fix: migrate from *.list to *.md5sums files for dpkg #9131
@@ -70,7 +70,7 @@
"PkgName": "libidn2-0",
"PkgIdentifier": {
"PURL": "pkg:deb/debian/libidn2-0@2.0.5-1?arch=amd64\u0026distro=debian-10.1",
- "UID": "473f5eb9e3d4a2f2"
+ "UID": "24f9b08969c58720"
I investigated this case:
The list file contains 2 files:
➜ docker run -it --rm --platform=linux/amd64 debian:12 cat var/lib/dpkg/info/libidn2-0\:amd64.list | grep libidn2.so.0
/usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8
/usr/lib/x86_64-linux-gnu/libidn2.so.0
But the md5sums file contains only one file:
➜ docker run -it --rm --platform=linux/amd64 debian:12 cat var/lib/dpkg/info/libidn2-0\:amd64.md5sums | grep libidn2.so.0
c745ba8b8dfd28a2aa7efb3081ca5eed usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8
libidn2.so.0 is a link to the libidn2.so.0.3.8 file:
➜ docker run -it --rm --platform=linux/amd64 debian:12 ls -hl /usr/lib/x86_64-linux-gnu | grep libidn2.so.0
lrwxrwxrwx 1 root root 16 Aug 28 2022 libidn2.so.0 -> libidn2.so.0.3.8
-rw-r--r-- 1 root root 195K Aug 28 2022 libidn2.so.0.3.8
That is why the md5sums file doesn't list libidn2.so.0.
Trivy doesn't currently support links - #5356
So this shouldn't be a problem.
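For reference, the difference described above can be reproduced with a small Go sketch. It is not code from this PR: the helper name `missingFromMd5sums` is hypothetical, and it assumes that `*.list` paths are absolute while `*.md5sums` paths are relative to `/`, as in the output shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// missingFromMd5sums returns the paths that appear in a *.list file but have
// no entry in the matching *.md5sums file. Hypothetical helper, for
// illustration only.
func missingFromMd5sums(listPaths, md5sumsLines []string) []string {
	hashed := make(map[string]struct{}, len(md5sumsLines))
	for _, line := range md5sumsLines {
		// Each md5sums line is "<md5>  <relative path>" (two spaces).
		_, p, ok := strings.Cut(line, "  ")
		if !ok {
			continue
		}
		// Assumption: *.list paths are absolute, *.md5sums paths are not.
		hashed["/"+p] = struct{}{}
	}
	var missing []string
	for _, p := range listPaths {
		if _, ok := hashed[p]; !ok {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	list := []string{
		"/usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8",
		"/usr/lib/x86_64-linux-gnu/libidn2.so.0",
	}
	md5sums := []string{
		"c745ba8b8dfd28a2aa7efb3081ca5eed  usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8",
	}
	// Only the symlink is reported, since it has no checksum entry.
	fmt.Println(missingFromMd5sums(list, md5sums))
}
```

On a complete `*.list` file this would also report directories, because `*.list` lists them and `*.md5sums` does not.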
Can you write a comment to the source code somewhere so that we can recall it when we add support for symlinks?
added in 97a8340
Pull Request Overview
This PR refactors the dpkg analyzer to read *.md5sums files instead of legacy *.list files, improving compatibility with distroless images.
- Replace .list parsing logic with .md5sums parsing in code and tests
- Update Required and isMd5SumsFile to detect only .md5sums files
- Refresh testdata and golden outputs to use tar.md5sums and new UIDs
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/fanal/analyzer/pkg/dpkg/dpkg.go | Switch parsing from .list to .md5sums, implement parseDpkgMd5sums, update file-detection logic |
| pkg/fanal/analyzer/pkg/dpkg/dpkg_test.go | Adapt tests to .md5sums files, update expected installed files and test cases |
| pkg/fanal/analyzer/pkg/dpkg/testdata/tar.md5sums | Add new md5sums-format testdata |
| pkg/fanal/analyzer/pkg/dpkg/testdata/tar.list | Remove obsolete .list testdata |
| integration/testdata/debian-buster-ignore-unfixed.json.golden | Update golden UID for package identifiers |
Comments suppressed due to low confidence (2)
pkg/fanal/analyzer/pkg/dpkg/dpkg.go:127
- [nitpick] Consider renaming the variable file to something like filePath for clarity, since it represents the extracted file path from the md5sums line.
_, file, ok := strings.Cut(current, "  ")
pkg/fanal/analyzer/pkg/dpkg/dpkg.go:119
- Add unit tests for malformed md5sums lines (e.g. missing delimiter) to verify that the parser returns the expected error.
func (a dpkgAnalyzer) parseDpkgMd5sums(scanner *bufio.Scanner) ([]string, error) {
_, file, ok := strings.Cut(current, "  ")
if !ok {
return nil, xerrors.Errorf("invalid md5sums line format: %s", current)
}
nit: I was concerned that there might be cases where there are three spaces instead of two, or where tabs are used, so I thought it might be better to implement it in a way that wouldn’t be affected by such differences. However, if it’s guaranteed that it will always be two spaces, I think the current implementation is fine.
- _, file, ok := strings.Cut(current, "  ")
- if !ok {
- return nil, xerrors.Errorf("invalid md5sums line format: %s", current)
- }
+ ss := strings.Fields(current)
+ if len(ss) != 2 {
+ return nil, xerrors.Errorf("invalid md5sums line format: %s", current)
+ }
Even old versions (I checked Ubuntu 12.04) use this format.
The documentation also clearly specifies 2 spaces - https://man7.org/linux/man-pages/man5/deb-md5sums.5.html
So I think we can leave it like this and fix it if we get feedback from users (after analyzing their case first).
I thought it might be safer to make the change, as it would handle more cases, unless the current code has a clear advantage in readability or lines of code. But we can leave it as is; I don't feel strongly about it.
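To make the trade-off in this thread concrete, here is a small Go sketch comparing the two approaches. The helper names `pathWithCut` and `pathWithFields` and the second and third sample lines are hypothetical; only the first line comes from this PR. It illustrates general `strings.Cut` and `strings.Fields` behavior, not a claim about what real packages ship.

```go
package main

import (
	"fmt"
	"strings"
)

// pathWithCut splits on the first occurrence of exactly two spaces.
func pathWithCut(line string) (string, bool) {
	_, file, ok := strings.Cut(line, "  ")
	return file, ok
}

// pathWithFields splits on any run of whitespace and expects two fields.
func pathWithFields(line string) (string, bool) {
	ss := strings.Fields(line)
	if len(ss) != 2 {
		return "", false
	}
	return ss[1], true
}

func main() {
	lines := []string{
		// Line from the investigation in this PR.
		"c745ba8b8dfd28a2aa7efb3081ca5eed  usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8",
		// Hypothetical path containing a space.
		"d41d8cd98f00b204e9800998ecf8427e  usr/share/doc/example/read me.txt",
		// Hypothetical tab-separated line.
		"d41d8cd98f00b204e9800998ecf8427e\tusr/bin/example",
	}
	for _, l := range lines {
		cutPath, cutOK := pathWithCut(l)
		fieldsPath, fieldsOK := pathWithFields(l)
		fmt.Printf("cut=(%q, %t) fields=(%q, %t)\n", cutPath, cutOK, fieldsPath, fieldsOK)
	}
}
```

The `strings.Fields` variant tolerates tabs and extra spaces between the two columns, but it rejects any path that itself contains whitespace, whereas the `strings.Cut` variant keeps such a path intact and only accepts the two-space delimiter.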
Since this affects Trivy's output (especially Distroless), it's not refactoring. The prefix should be feat or fix.
Yes, thanks. Updated.
Referenced by a Renovate Bot update of mirror.gcr.io/aquasec/trivy (source: https://github.com/aquasecurity/trivy) from `0.64.1` to `0.65.0` (Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1073). The v0.65.0 release notes (2025-07-30) list this change under Bug Fixes:
- migrate from `*.list` to `*.md5sums` files for `dpkg` (#9131) (f224de3)
Description
We currently use *.list files to detect the files of dpkg packages.
But distroless images don't have these files (see #9046).
So we migrate from *.list to **/info/*.md5sums (**/status.d/*.md5sums for distroless) files.
Example
before:
after:
Related issues
/var/lib/dpkg/*/<package>.md5sums to find list of system files #9046
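As an illustration of the two file patterns named in the description (`**/info/*.md5sums` and `**/status.d/*.md5sums`), here is a minimal Go sketch of a path check. The function name `isMd5sumsPath` is hypothetical and the matching rule is an assumption drawn from those patterns only; the actual `Required` and `isMd5SumsFile` logic in this PR may differ.

```go
package main

import (
	"fmt"
	"path"
)

// isMd5sumsPath reports whether filePath looks like a dpkg md5sums file:
// a "*.md5sums" file directly inside a directory named "info" or "status.d".
// Hypothetical helper, for illustration only.
func isMd5sumsPath(filePath string) bool {
	if path.Ext(filePath) != ".md5sums" {
		return false
	}
	switch path.Base(path.Dir(filePath)) {
	case "info", "status.d":
		return true
	}
	return false
}

func main() {
	candidates := []string{
		"var/lib/dpkg/info/libidn2-0:amd64.md5sums",
		"var/lib/dpkg/info/libidn2-0:amd64.list",
		"var/lib/dpkg/status.d/example.md5sums", // hypothetical distroless file name
	}
	for _, c := range candidates {
		fmt.Printf("%s -> %t\n", c, isMd5sumsPath(c))
	}
}
```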