
Conversation

DmitriyLewen
Contributor

@DmitriyLewen DmitriyLewen commented Jul 3, 2025

Description

We currently use *.list files to detect the files installed by dpkg packages.
But distroless images don't have these files (see #9046).

So we migrate from *.list to **/info/*.md5sums (**/status.d/*.md5sums for distroless) files.
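
For illustration, the two path patterns above could be matched roughly as follows. This is a minimal sketch, not the analyzer's actual detection code: the helper name isMd5sumsPath and the sample paths are assumptions made for the example.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isMd5sumsPath is a hypothetical helper (not Trivy's real function).
// It reports whether filePath ends in ".md5sums" and sits directly in an
// "info" directory (regular images) or a "status.d" directory (distroless).
func isMd5sumsPath(filePath string) bool {
	if !strings.HasSuffix(filePath, ".md5sums") {
		return false
	}
	dir := path.Base(path.Dir(filePath))
	return dir == "info" || dir == "status.d"
}

func main() {
	// Sample paths are illustrative only.
	for _, p := range []string{
		"var/lib/dpkg/info/libidn2-0:amd64.md5sums",
		"var/lib/dpkg/status.d/base-files.md5sums",
		"var/lib/dpkg/info/libidn2-0:amd64.list",
	} {
		fmt.Printf("%-45s %v\n", p, isMd5sumsPath(p))
	}
}
```

The sketch only inspects the immediate parent directory, so it accepts any prefix in front of info/ or status.d/, which mirrors the ** in the patterns.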

Example

before:

➜  trivy -q image -f json --list-all-pkgs --cache-backend memory gcr.io/distroless/nodejs20-debian12 | jq '.Results[].Packages[] | select(.Name=="base-files") | .InstalledFiles' 

null

after:

➜  ./trivy -q image -f json --list-all-pkgs --cache-backend memory gcr.io/distroless/nodejs20-debian12 | jq '.Results[].Packages[] | select(.Name=="base-files") | .InstalledFiles'

[
  "/usr/lib/os-release",
  "/usr/share/base-files/dot.bashrc",
  "/usr/share/base-files/dot.profile",
...
  "/usr/share/doc/base-files/README",
  "/usr/share/doc/base-files/README.FHS",
  "/usr/share/doc/base-files/changelog.gz",
  "/usr/share/doc/base-files/copyright",
  "/usr/share/lintian/overrides/base-files"
]

Related issues

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

@DmitriyLewen DmitriyLewen self-assigned this Jul 3, 2025
@DmitriyLewen DmitriyLewen changed the title from "refactor(dpkg): migrate from *.list to *.md5sums files" to "refactor: migrate from *.list to *.md5sums files for dpkg" Jul 3, 2025
@@ -70,7 +70,7 @@
 "PkgName": "libidn2-0",
 "PkgIdentifier": {
 "PURL": "pkg:deb/debian/libidn2-0@2.0.5-1?arch=amd64\u0026distro=debian-10.1",
-"UID": "473f5eb9e3d4a2f2"
+"UID": "24f9b08969c58720"
Contributor Author

I investigated this case.
The list file contains 2 matching entries:

➜  docker run -it --rm --platform=linux/amd64 debian:12 cat var/lib/dpkg/info/libidn2-0\:amd64.list | grep libidn2.so.0
/usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8
/usr/lib/x86_64-linux-gnu/libidn2.so.0

But the md5sums file contains only one of them:

➜  docker run -it --rm --platform=linux/amd64 debian:12 cat var/lib/dpkg/info/libidn2-0\:amd64.md5sums | grep libidn2.so.0
c745ba8b8dfd28a2aa7efb3081ca5eed  usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8

libidn2.so.0 is a symlink to the libidn2.so.0.3.8 file:

➜  docker run -it --rm --platform=linux/amd64 debian:12 ls -hl /usr/lib/x86_64-linux-gnu | grep libidn2.so.0
lrwxrwxrwx  1 root root   16 Aug 28  2022 libidn2.so.0 -> libidn2.so.0.3.8
-rw-r--r--  1 root root 195K Aug 28  2022 libidn2.so.0.3.8

That is why the md5sums file doesn't contain libidn2.so.0.

Trivy doesn't currently support symlinks (#5356), so this shouldn't be a problem.
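
To make the difference concrete, here is a small sketch that compares the two files and reports entries present in the list file but absent from the md5sums file. The function name missingFromMd5sums is hypothetical, and the inputs are the lines from the docker output above.

```go
package main

import (
	"fmt"
	"strings"
)

// missingFromMd5sums returns the entries of a *.list file that have no
// counterpart in the matching *.md5sums file. Hypothetical helper for
// illustration; it is not part of Trivy.
func missingFromMd5sums(list, md5sums string) []string {
	hashed := make(map[string]struct{})
	for _, line := range strings.Split(md5sums, "\n") {
		// Each line is "<md5>  <relative path>", separated by two spaces.
		if _, file, ok := strings.Cut(line, "  "); ok {
			// *.list paths are absolute, *.md5sums paths are relative.
			hashed["/"+file] = struct{}{}
		}
	}
	var missing []string
	for _, file := range strings.Split(list, "\n") {
		if file == "" {
			continue // skip blank lines such as the trailing one
		}
		if _, ok := hashed[file]; !ok {
			missing = append(missing, file)
		}
	}
	return missing
}

func main() {
	list := "/usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8\n" +
		"/usr/lib/x86_64-linux-gnu/libidn2.so.0\n"
	md5sums := "c745ba8b8dfd28a2aa7efb3081ca5eed  usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8\n"
	fmt.Println(missingFromMd5sums(list, md5sums))
}
```

With these inputs the only reported entry is the symlink, /usr/lib/x86_64-linux-gnu/libidn2.so.0.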

Collaborator

Can you add a comment somewhere in the source code so that we remember this when we add support for symlinks?

Contributor Author

added in 97a8340

@DmitriyLewen DmitriyLewen marked this pull request as ready for review July 3, 2025 12:16
@DmitriyLewen DmitriyLewen requested a review from knqyf263 as a code owner July 3, 2025 12:16
@DmitriyLewen DmitriyLewen requested a review from Copilot July 3, 2025 12:18
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR refactors the dpkg analyzer to read *.md5sums files instead of legacy *.list files, improving compatibility with distroless images.

  • Replace .list parsing logic with .md5sums parsing in code and tests
  • Update Required and isMd5SumsFile to detect only .md5sums files
  • Refresh testdata and golden outputs to use tar.md5sums and new UIDs

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| pkg/fanal/analyzer/pkg/dpkg/dpkg.go | Switch parsing from .list to .md5sums, implement parseDpkgMd5sums, update file-detection logic |
| pkg/fanal/analyzer/pkg/dpkg/dpkg_test.go | Adapt tests to .md5sums files, update expected installed files and test cases |
| pkg/fanal/analyzer/pkg/dpkg/testdata/tar.md5sums | Add new md5sums-format testdata |
| pkg/fanal/analyzer/pkg/dpkg/testdata/tar.list | Remove obsolete .list testdata |
| integration/testdata/debian-buster-ignore-unfixed.json.golden | Update golden UID for package identifiers |
Comments suppressed due to low confidence (2)

pkg/fanal/analyzer/pkg/dpkg/dpkg.go:127

  • [nitpick] Consider renaming the variable file to something like filePath for clarity, since it represents the extracted file path from the md5sums line.
		_, file, ok := strings.Cut(current, "  ")

pkg/fanal/analyzer/pkg/dpkg/dpkg.go:119

  • Add unit tests for malformed md5sums lines (e.g. missing delimiter) to verify that the parser returns the expected error.
func (a dpkgAnalyzer) parseDpkgMd5sums(scanner *bufio.Scanner) ([]string, error) {

Comment on lines +127 to 130
_, file, ok := strings.Cut(current, "  ")
if !ok {
	return nil, xerrors.Errorf("invalid md5sums line format: %s", current)
}
Collaborator

nit: I was concerned that there might be cases where there are three spaces instead of two, or where tabs are used, so I thought it might be better to implement it in a way that wouldn’t be affected by such differences. However, if it’s guaranteed that it will always be two spaces, I think the current implementation is fine.

Suggested change
-_, file, ok := strings.Cut(current, "  ")
-if !ok {
-	return nil, xerrors.Errorf("invalid md5sums line format: %s", current)
-}
+ss := strings.Fields(current)
+if len(ss) != 2 {
+	return nil, xerrors.Errorf("invalid md5sums line format: %s", current)
+}

Contributor Author

Even old versions (I checked Ubuntu 12.04) use this format.
Also, the documentation clearly specifies 2 spaces - https://man7.org/linux/man-pages/man5/deb-md5sums.5.html
So I think we can leave it as is and fix it if we get feedback from users (after analyzing their case first).

Collaborator

I thought it might be safer to make the change, as it would handle more cases, unless the current code has a clear advantage in terms of readability or lines of code. But we can leave it as is; I don't feel strongly about it.
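
The trade-off between the two variants discussed in this thread can be sketched as follows. The helper names pathWithCut and pathWithFields are hypothetical, and every input line except the first is a made-up case used only to show where the approaches diverge; whether such lines occur in real packages is not established here.

```go
package main

import (
	"fmt"
	"strings"
)

// pathWithCut splits on the first two-space separator, as in the merged code.
func pathWithCut(line string) (string, bool) {
	_, file, ok := strings.Cut(line, "  ")
	return file, ok
}

// pathWithFields splits on any whitespace run and requires exactly two
// fields, as in the suggested change.
func pathWithFields(line string) (string, bool) {
	ss := strings.Fields(line)
	if len(ss) != 2 {
		return "", false
	}
	return ss[1], true
}

func main() {
	sum := "c745ba8b8dfd28a2aa7efb3081ca5eed"
	lines := []string{
		sum + "  usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8",  // documented format
		sum + "   usr/lib/x86_64-linux-gnu/libidn2.so.0.3.8", // hypothetical: three spaces
		sum + "\tusr/lib/x86_64-linux-gnu/libidn2.so.0.3.8",  // hypothetical: tab separator
		sum + "  usr/share/doc/my pkg/README",                // hypothetical: space in path
	}
	for _, l := range lines {
		c, cok := pathWithCut(l)
		f, fok := pathWithFields(l)
		fmt.Printf("cut=(%q, %v) fields=(%q, %v)\n", c, cok, f, fok)
	}
}
```

In this sketch the Cut variant keeps a leading space on the three-space line and rejects the tab-separated line, while the Fields variant accepts both but rejects the path that contains a space.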

Collaborator

@knqyf263 knqyf263 left a comment

Since this affects Trivy's output (especially Distroless), it's not refactoring. The prefix should be feat or fix.

@DmitriyLewen DmitriyLewen changed the title from "refactor: migrate from *.list to *.md5sums files for dpkg" to "fix: migrate from *.list to *.md5sums files for dpkg" Jul 4, 2025
@DmitriyLewen
Contributor Author

Yes, thanks. Updated.
I always hesitate over things like this:
this is a fix for distroless, but the changes (and the PR title) are not directly related to that fix.

@DmitriyLewen DmitriyLewen added this pull request to the merge queue Jul 4, 2025
Merged via the queue into aquasecurity:main with commit f224de3 Jul 4, 2025
13 checks passed
@DmitriyLewen DmitriyLewen deleted the refactor/dpkg/migrate-to-md5sums branch July 4, 2025 08:54
@aqua-bot aqua-bot mentioned this pull request Jul 4, 2025
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Jul 31, 2025
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [mirror.gcr.io/aquasec/trivy](https://www.aquasec.com/products/trivy/) ([source](https://github.com/aquasecurity/trivy)) | minor | `0.64.1` -> `0.65.0` |

---

### Release Notes

<details>
<summary>aquasecurity/trivy (mirror.gcr.io/aquasec/trivy)</summary>

### [`v0.65.0`](https://github.com/aquasecurity/trivy/blob/HEAD/CHANGELOG.md#0650-2025-07-30)

[Compare Source](aquasecurity/trivy@v0.64.1...v0.65.0)

##### Features

- add graceful shutdown with signal handling ([#9242](aquasecurity/trivy#9242)) ([2c05882](aquasecurity/trivy@2c05882))
- add HTTP request/response tracing support ([#9125](aquasecurity/trivy#9125)) ([aa5b32a](aquasecurity/trivy@aa5b32a))
- **alma:** add AlmaLinux 10 support ([#9207](aquasecurity/trivy#9207)) ([861d51e](aquasecurity/trivy@861d51e))
- **flag:** add schema validation for `--server` flag ([#9270](aquasecurity/trivy#9270)) ([ed4640e](aquasecurity/trivy@ed4640e))
- **image:** add Docker context resolution ([#9166](aquasecurity/trivy#9166)) ([99cd4e7](aquasecurity/trivy@99cd4e7))
- **license:** observe pkg types option in license scanner ([#9091](aquasecurity/trivy#9091)) ([d44af8c](aquasecurity/trivy@d44af8c))
- **misconf:** add private ip google access attribute to subnetwork ([#9199](aquasecurity/trivy#9199)) ([263845c](aquasecurity/trivy@263845c))
- **misconf:** added logging and versioning to the gcp storage bucket ([#9226](aquasecurity/trivy#9226)) ([110f80e](aquasecurity/trivy@110f80e))
- **repo:** add git repository metadata to reports ([#9252](aquasecurity/trivy#9252)) ([f4b2cf1](aquasecurity/trivy@f4b2cf1))
- **report:** add CVSS vectors in sarif report ([#9157](aquasecurity/trivy#9157)) ([60723e6](aquasecurity/trivy@60723e6))
- **sbom:** add SHA-512 hash support for CycloneDX SBOM ([#9126](aquasecurity/trivy#9126)) ([12d6706](aquasecurity/trivy@12d6706))

##### Bug Fixes

- **alma:** parse epochs from rpmqa file ([#9101](aquasecurity/trivy#9101)) ([82db2fc](aquasecurity/trivy@82db2fc))
- also check `filepath` when removing duplicate packages ([#9142](aquasecurity/trivy#9142)) ([4d10a81](aquasecurity/trivy@4d10a81))
- **aws:** update amazon linux 2 EOL date ([#9176](aquasecurity/trivy#9176)) ([0ecfed6](aquasecurity/trivy@0ecfed6))
- **cli:** Add more non-sensitive flags to telemetry ([#9110](aquasecurity/trivy#9110)) ([7041a39](aquasecurity/trivy@7041a39))
- **cli:** ensure correct command is picked by telemetry ([#9260](aquasecurity/trivy#9260)) ([b4ad00f](aquasecurity/trivy@b4ad00f))
- **cli:** panic: attempt to get os.Args\[1] when len(os.Args) < 2 ([#9206](aquasecurity/trivy#9206)) ([adfa879](aquasecurity/trivy@adfa879))
- **license:** add missed `GFDL-NIV-1.1` and `GFDL-NIV-1.2` into Trivy mapping ([#9116](aquasecurity/trivy#9116)) ([a692f29](aquasecurity/trivy@a692f29))
- **license:** handle WITH operator for `LaxSplitLicenses` ([#9232](aquasecurity/trivy#9232)) ([b4193d0](aquasecurity/trivy@b4193d0))
- migrate from `*.list` to `*.md5sums` files for `dpkg` ([#9131](aquasecurity/trivy#9131)) ([f224de3](aquasecurity/trivy@f224de3))
- **misconf:** correctly adapt azure storage account ([#9138](aquasecurity/trivy#9138)) ([51aa022](aquasecurity/trivy@51aa022))
- **misconf:** correctly parse empty port ranges in google\_compute\_firewall ([#9237](aquasecurity/trivy#9237)) ([77bab7b](aquasecurity/trivy@77bab7b))
- **misconf:** fix log bucket in schema ([#9235](aquasecurity/trivy#9235)) ([7ebc129](aquasecurity/trivy@7ebc129))
- **misconf:** skip rewriting expr if attr is nil ([#9113](aquasecurity/trivy#9113)) ([42ccd3d](aquasecurity/trivy@42ccd3d))
- **nodejs:** don't use prerelease logic for compare npm constraints ([#9208](aquasecurity/trivy#9208)) ([fe96436](aquasecurity/trivy@fe96436))
- prevent graceful shutdown message on normal exit ([#9244](aquasecurity/trivy#9244)) ([6095984](aquasecurity/trivy@6095984))
- **rootio:** check full version to detect `root.io` packages ([#9117](aquasecurity/trivy#9117)) ([c2ddd44](aquasecurity/trivy@c2ddd44))
- **rootio:** fix severity selection ([#9181](aquasecurity/trivy#9181)) ([6fafbeb](aquasecurity/trivy@6fafbeb))
- **sbom:** merge in-graph and out-of-graph OS packages in scan results ([#9194](aquasecurity/trivy#9194)) ([aa944cc](aquasecurity/trivy@aa944cc))
- **sbom:** use correct field for licenses in CycloneDX reports ([#9057](aquasecurity/trivy#9057)) ([143da88](aquasecurity/trivy@143da88))
- **secret:** add UTF-8 validation in secret scanner to prevent protobuf marshalling errors ([#9253](aquasecurity/trivy#9253)) ([54832a7](aquasecurity/trivy@54832a7))
- **secret:** fix line numbers for multiple-line secrets ([#9104](aquasecurity/trivy#9104)) ([e579746](aquasecurity/trivy@e579746))
- **server:** add HTTP transport setup to server mode ([#9217](aquasecurity/trivy#9217)) ([1163b04](aquasecurity/trivy@1163b04))
- supporting .egg-info/METADATA in python.Packaging analyzer ([#9151](aquasecurity/trivy#9151)) ([e306e2d](aquasecurity/trivy@e306e2d))
- **terraform:** `for_each` on a map returns a resource for every key ([#9156](aquasecurity/trivy#9156)) ([153318f](aquasecurity/trivy@153318f))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1073
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
yutatokoi pushed a commit to yutatokoi/trivy that referenced this pull request Aug 12, 2025
Development

Successfully merging this pull request may close these issues.

feat(dpkg): check /var/lib/dpkg/*/<package>.md5sums to find list of system files
2 participants