
knqyf263
Collaborator

@knqyf263 knqyf263 commented Jul 28, 2025

Description

This PR adds git repository metadata to Trivy scan reports for both local and remote git repository scans. The metadata gives users context about what was scanned: the repository URL, the current branch and tags, and the hash, message, author, and committer of the HEAD commit.

Key Features

  1. Git Metadata Extraction

    • Automatically detects and extracts git repository information during scans
    • Works with both local filesystem scans and remote repository scans
    • Gracefully handles non-git directories without errors
  2. Comprehensive Metadata Collection

    • Repository URL (supports both origin and upstream remotes)
    • Current branch name
    • All tags pointing to the current commit (supports multiple tags)
    • Commit hash and message
    • Author and committer information
  3. Universal Git Detection

    • Git metadata is now collected for any directory that is a git repository
    • Not limited to explicit repository scans (trivy repo)
    • Works seamlessly with filesystem scans (trivy fs) when scanning git directories
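Two of the fields above involve a small decision: which remote supplies RepoURL, and how an identity is rendered for Author/Committer. The sketch below is hypothetical and uses only the standard library. It assumes the remote URLs and commit signature have already been read from the repository (for example with a git library such as go-git), and it assumes `upstream` takes precedence over `origin`, which the description does not state explicitly. `Signature` and `preferredRemoteURL` are illustrative names, not Trivy's code.

```go
package main

import "fmt"

// Signature is a hypothetical stand-in for a git author/committer identity.
type Signature struct {
	Name  string
	Email string
}

// String renders the identity as "Name <email>", the format shown in the
// Author and Committer fields of the example report.
func (s Signature) String() string {
	return fmt.Sprintf("%s <%s>", s.Name, s.Email)
}

// preferredRemoteURL picks the repository URL from the configured remotes.
// Assumption: "upstream" wins over "origin", and the first URL of the chosen
// remote is used. It returns "" when neither remote is configured.
func preferredRemoteURL(remotes map[string][]string) string {
	for _, name := range []string{"upstream", "origin"} {
		if urls := remotes[name]; len(urls) > 0 {
			return urls[0]
		}
	}
	return ""
}

func main() {
	remotes := map[string][]string{
		"origin":   {"https://github.com/example/fork"},
		"upstream": {"https://github.com/knqyf263/sou"},
	}
	fmt.Println(preferredRemoteURL(remotes))
	fmt.Println(Signature{Name: "knqyf263", Email: "knqyf263@gmail.com"})
}
```

Returning an empty string rather than an error keeps a repository without remotes from failing the scan, in line with the "missing remotes" edge case listed below.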

Implementation Details

  • Non-intrusive Design: The feature automatically detects git repositories without requiring user configuration
  • Performance Optimized: Git operations are performed once during artifact initialization to avoid redundant calls
  • Error Resilient: Handles various edge cases including dirty repos, missing remotes, and non-git directories
  • Backwards Compatible: Uses omitzero JSON tags to ensure empty metadata doesn't clutter reports

Example Output

Remote Repository Scan

$ trivy repo --format json github.com/knqyf263/sou
{
  "Metadata": {
    "RepoURL": "https://github.com/knqyf263/sou",
    "Branch": "main",
    "Tags": ["v0.2.0", "latest"],
    "Commit": "378cf9606fe23bdb47639e29a4fb525ed7645e09",
    "CommitMsg": "Migrate logging to structured logging with slog...",
    "Author": "knqyf263 <knqyf263@gmail.com>",
    "Committer": "knqyf263 <knqyf263@gmail.com>"
  }
}
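Because the fields are plain keys under `Metadata`, downstream tooling can read them with a standard JSON decoder. The Go sketch below is hypothetical (the `repoMetadata` type and `parseRepoMetadata` helper are not part of Trivy) and assumes only the key names shown in the example output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// repoMetadata mirrors the git-related keys under "Metadata" in the example
// report. encoding/json ignores any other keys (ImageConfig, Results, ...).
type repoMetadata struct {
	RepoURL   string
	Branch    string
	Tags      []string
	Commit    string
	CommitMsg string
	Author    string
	Committer string
}

// parseRepoMetadata extracts the git metadata from a JSON report. For a
// scan of a non-git directory the fields are simply left empty.
func parseRepoMetadata(report []byte) (repoMetadata, error) {
	var doc struct {
		Metadata repoMetadata
	}
	if err := json.Unmarshal(report, &doc); err != nil {
		return repoMetadata{}, fmt.Errorf("decode report: %w", err)
	}
	return doc.Metadata, nil
}

func main() {
	report := []byte(`{"ArtifactType":"repository","Metadata":{"RepoURL":"https://github.com/knqyf263/sou","Branch":"main","Tags":["v0.2.0"],"Commit":"378cf9606fe23bdb47639e29a4fb525ed7645e09"}}`)
	m, err := parseRepoMetadata(report)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s @ %s on %s, tags %v\n", m.RepoURL, m.Commit, m.Branch, m.Tags)
}
```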

Related Issues

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change)

knqyf263 added 6 commits July 24, 2025 18:01
- Add RepoMetadata struct with repository URL, branch, tag, commit info
- Extract git metadata for repository artifacts in local filesystem scanner
- Include git metadata fields in scan report output
- Support both local and remote git repositories
- Combine gitCommitHash() and extractGitMetadata() into single extractGitInfo() function
- Store git metadata in Artifact struct during construction to avoid duplicate git operations
- Add comprehensive unit tests for git metadata extraction functionality
- Test scenarios: clean/dirty repos, upstream/origin remotes, tagged commits, non-git directories
…ommitHash field

- Replace redundant `commitHash string` field with `isClean bool` in Artifact struct
- Update extractGitInfo function to return (bool, RepoMetadata, error) instead of (string, RepoMetadata, error)
- Separate concerns: isClean for cache decisions, repoMetadata.Commit for actual hash value
- Update cache logic to use `a.isClean && a.repoMetadata.Commit != ""` pattern
- Update TestExtractGitInfo to use wantClean instead of wantHash
- Eliminate code duplication while maintaining all existing functionality
… omitzero tags

This change removes the type check that only populated RepoMetadata for
TypeRepository artifacts. Now git metadata is populated for any directory
that happens to be a git repository, regardless of scan type.

Also updates JSON struct tags from omitempty to omitzero to better handle
empty git metadata fields in reports.
…ository infrastructure

- Replace programmatic git repository creation with existing test-repo
- Simplify test cases from 5 complex scenarios to 2 focused scenarios
- Use internal/gittest/testdata/test-repo as recommended
- Remove unnecessary imports (time, go-git packages)
- Fix compilation errors with boolean return values in extractGitInfo
@knqyf263 knqyf263 self-assigned this Jul 28, 2025
@knqyf263 knqyf263 changed the title feat: add comprehensive git repository metadata to trivy reports feat: add git repository metadata to reports Jul 28, 2025
@knqyf263 knqyf263 changed the title feat: add git repository metadata to reports feat(repo): add git repository metadata to reports Jul 28, 2025
knqyf263 added 2 commits July 28, 2025 13:41
- Fix gci formatting in RepoMetadata struct field alignment
- Update integration test golden file to include git metadata fields
knqyf263 added 6 commits July 28, 2025 18:19
The multiple_lockfiles test scans a local directory without git information,
so it should not expect git metadata in the results. Added an override
function to clear the metadata for this specific test case.

This resolves the test failure where the golden file was updated with git
metadata for a different test (TestClientServer/scan_remote_repository),
causing conflicts when both tests share the same golden file.
Changed Tag field to Tags []string in both RepoMetadata and Metadata structs
to properly handle cases where multiple tags point to the same commit.

- Updated artifact.RepoMetadata to use Tags []string
- Updated types.Metadata to use Tags []string
- Modified extractGitInfo to collect all tags pointing to HEAD
- Updated tests to handle the new Tags field
- Updated service.go to pass Tags array directly
Updated TestArtifact_Inspect test expectations to include the RepoMetadata
that is now populated for git repositories. The tests were failing because
they expected empty metadata, but our implementation now extracts git
information for all repository scans.

Updated test cases:
- remote_repo: expects metadata from cloned remote repository
- local_repo: expects metadata from local test repository
- dirty_repository: expects metadata even when repository has uncommitted changes
Removed TestExtractGitInfo from fs_test.go as it's redundant.
The git repository functionality is properly tested in
pkg/fanal/artifact/repo/git_test.go.
- Use direct assignment with multiple return values
- Consolidate artifact.Reference initialization
- Remove intermediate variable for metadata assignment
- Consolidate error handling for repo.Tags() into a single if statement
- Explicitly ignore the return value of tags.ForEach()
- Fix variable reference in debug log message
@knqyf263 knqyf263 added the autoready Automatically mark PR as ready for review when all checks pass label Jul 29, 2025
- Add brief explanation of automatic git metadata extraction
- List types of information included without implementation details
- Direct users to JSON output for detailed field information
- Keep documentation flexible for future changes
@github-actions github-actions bot marked this pull request as ready for review July 29, 2025 09:03
@github-actions github-actions bot removed the autoready Automatically mark PR as ready for review when all checks pass label Jul 29, 2025
@github-actions github-actions bot requested a review from DmitriyLewen as a code owner July 29, 2025 09:03
Contributor

@DmitriyLewen DmitriyLewen left a comment

I wanted to check client/server mode, but it looks like we have a bug (from #8278):

➜  trivy -d repo --server 0.0.0.0:8080 --format json github.com/knqyf263/sou 
2025-07-29T15:49:24+06:00       DEBUG   Default config file "file_path=trivy.yaml" not found, using built in values
2025-07-29T15:49:24+06:00       DEBUG   Cache dir       dir="/Users/dmitriy/Library/Caches/trivy"
2025-07-29T15:49:24+06:00       DEBUG   Cache dir       dir="/Users/dmitriy/Library/Caches/trivy"
2025-07-29T15:49:24+06:00       DEBUG   Parsed severities       severities=[UNKNOWN LOW MEDIUM HIGH CRITICAL]
2025-07-29T15:49:24+06:00       DEBUG   Ignore statuses statuses=[]
2025-07-29T15:49:24+06:00       WARN    Trivy runs in client/server mode, but misconfiguration and license scanning will be done on the client side, see https://trivy.dev/v0.64/docs/references/modes/client-server
2025-07-29T15:49:24+06:00       DEBUG   [pkg] Package types     types=[library]
2025-07-29T15:49:24+06:00       DEBUG   [pkg] Package relationships     relationships=[unknown root workspace direct indirect]
2025-07-29T15:49:24+06:00       INFO    [vuln] Vulnerability scanning is enabled
2025-07-29T15:49:24+06:00       INFO    [secret] Secret scanning is enabled
2025-07-29T15:49:24+06:00       INFO    [secret] If your scanning is slow, please try '--scanners vuln' to disable secret scanning
2025-07-29T15:49:24+06:00       INFO    [secret] Please see also https://trivy.dev/v0.64/docs/scanner/secret#recommendation for faster secret detection
2025-07-29T15:49:24+06:00       DEBUG   [notification] Running version check
2025-07-29T15:49:24+06:00       DEBUG   [notification] Version check completed  latest_version="0.64.1"
Enumerating objects: 39, done.
Counting objects: 100% (39/39), done.
Compressing objects: 100% (34/34), done.
Total 39 (delta 6), reused 24 (delta 3), pack-reused 0 (from 0)
2025-07-29T15:49:31+06:00       DEBUG   [secret] No secret config detected      config_path="trivy-secret.yaml"
2025-07-29T15:49:31+06:00       DEBUG   [repo] Analyzing...     root="/var/folders/8m/p1341n2941jbyc5gm7357x5c0000gn/T/trivy-remote-repo3088390424" original="github.com/knqyf263/sou"
2025-07-29T15:49:31+06:00       DEBUG   [repo] Using the latest commit hash for calculating cache key   commit_hash="378cf9606fe23bdb47639e29a4fb525ed7645e09"
2025-07-29T15:49:31+06:00       FATAL   Fatal error
  - run error:
    github.com/aquasecurity/trivy/pkg/commands/artifact.Run
        github.com/aquasecurity/trivy/pkg/commands/artifact/run.go:411
  - repo scan error:
    github.com/aquasecurity/trivy/pkg/commands/artifact.run
        github.com/aquasecurity/trivy/pkg/commands/artifact/run.go:449
  - scan error:
    github.com/aquasecurity/trivy/pkg/commands/artifact.(*runner).scanArtifact
        github.com/aquasecurity/trivy/pkg/commands/artifact/run.go:288
  - scan failed:
    github.com/aquasecurity/trivy/pkg/commands/artifact.(*runner).scan
        github.com/aquasecurity/trivy/pkg/commands/artifact/run.go:678
  - failed analysis:
    github.com/aquasecurity/trivy/pkg/scan.Service.ScanArtifact
        github.com/aquasecurity/trivy/pkg/scan/service.go:166
  - unable to get missing blob:
    github.com/aquasecurity/trivy/pkg/fanal/artifact/local.Artifact.Inspect
        github.com/aquasecurity/trivy/pkg/fanal/artifact/local/fs.go:144
  - unable to fetch missing layers:
    github.com/aquasecurity/trivy/pkg/cache.RemoteCache.MissingBlobs
        github.com/aquasecurity/trivy/pkg/cache/remote.go:81
  - twirp error internal: could not build request: parse "0.0.0.0:8080/twirp/trivy.cache.v1.Cache/MissingBlobs": first path segment in URL cannot contain colon

Co-authored-by: DmitriyLewen <91113035+DmitriyLewen@users.noreply.github.com>
@knqyf263
Collaborator Author

@DmitriyLewen Thanks for catching the bug! Do you think we should fix it in this PR?

@DmitriyLewen
Contributor

No problem, we can fix it in another PR since the bug is not related to these changes.

@knqyf263
Collaborator Author

You may have missed http:// in --server. It works for me.

$ trivy -d repo --server http://localhost:4954 --format json github.com/knqyf263/sou
{
  "SchemaVersion": 2,
  "CreatedAt": "2025-07-29T15:45:06.095541+04:00",
  "ArtifactName": "github.com/knqyf263/sou",
  "ArtifactType": "repository",
  "Metadata": {
    "ImageConfig": {
      "architecture": "",
      "created": "0001-01-01T00:00:00Z",
      "os": "",
      "rootfs": {
        "type": "",
        "diff_ids": null
      },
      "config": {}
    },
    "RepoURL": "https://github.com/knqyf263/sou",
    "Branch": "main",
    "Tags": [
      "v0.2.0"
    ],
    "Commit": "378cf9606fe23bdb47639e29a4fb525ed7645e09",
    "CommitMsg": "Migrate logging to structured logging with slog\n\n- Replace legacy log package with Go's standard library slog\n- Configure centralized logging with JSON output to a cache directory\n- Remove manual log file initialization in multiple packages\n- Enhance error handling and logging across the application\n- Standardize debug logging with slog methods",
    "Author": "knqyf263 \u003cknqyf263@gmail.com\u003e",
    "Committer": "knqyf263 \u003cknqyf263@gmail.com\u003e"
  },
  "Results": [
    {
      "Target": "go.mod",
      "Class": "lang-pkgs",
      "Type": "gomod"
    }
  ]
}

@knqyf263
Collaborator Author

It's documented. We may want to show a more user-friendly message when the URL scheme is missing.
https://trivy.dev/latest/docs/references/modes/client-server/
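A friendlier message needs an explicit scheme check, because Go's `url.Parse` alone does not catch every mistake: `0.0.0.0:8080` fails with "first path segment in URL cannot contain colon" (the error in the log above), while `localhost:4954` parses successfully with `localhost` treated as the scheme. The sketch below is a hypothetical `validateServerAddr` helper, not the validation that later shipped in #9270.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateServerAddr checks that a client/server address is an absolute
// http(s) URL with a host, and says so when it is not.
func validateServerAddr(addr string) error {
	u, err := url.Parse(addr)
	if err != nil {
		return fmt.Errorf("invalid server address %q (expected http://host:port): %w", addr, err)
	}
	// url.Parse lowercases the scheme, so "HTTP://..." is accepted too.
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("server address %q must start with http:// or https://", addr)
	}
	if u.Host == "" {
		return fmt.Errorf("server address %q has no host", addr)
	}
	return nil
}

func main() {
	for _, addr := range []string{"0.0.0.0:8080", "localhost:4954", "http://localhost:4954"} {
		fmt.Printf("%-24s %v\n", addr, validateServerAddr(addr))
	}
}
```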

@DmitriyLewen
Contributor

DmitriyLewen commented Jul 29, 2025

I was a bit absent-minded there. Thanks for catching the typo.

@knqyf263
Collaborator Author

No worries. If even maintainers can slip up with --server, users will likely run into it as well, so this is a good opportunity to improve the UX. I'll take a look.

@knqyf263 knqyf263 added this pull request to the merge queue Jul 29, 2025
Merged via the queue into aquasecurity:main with commit f4b2cf1 Jul 29, 2025
18 checks passed
@knqyf263 knqyf263 deleted the feat/git-repo-metadata branch July 29, 2025 12:08
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Jul 31, 2025
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [mirror.gcr.io/aquasec/trivy](https://www.aquasec.com/products/trivy/) ([source](https://github.com/aquasecurity/trivy)) | minor | `0.64.1` -> `0.65.0` |

---

### Release Notes

<details>
<summary>aquasecurity/trivy (mirror.gcr.io/aquasec/trivy)</summary>

### [`v0.65.0`](https://github.com/aquasecurity/trivy/blob/HEAD/CHANGELOG.md#0650-2025-07-30)

[Compare Source](aquasecurity/trivy@v0.64.1...v0.65.0)

##### Features

- add graceful shutdown with signal handling ([#9242](aquasecurity/trivy#9242)) ([2c05882](aquasecurity/trivy@2c05882))
- add HTTP request/response tracing support ([#9125](aquasecurity/trivy#9125)) ([aa5b32a](aquasecurity/trivy@aa5b32a))
- **alma:** add AlmaLinux 10 support ([#9207](aquasecurity/trivy#9207)) ([861d51e](aquasecurity/trivy@861d51e))
- **flag:** add schema validation for `--server` flag ([#9270](aquasecurity/trivy#9270)) ([ed4640e](aquasecurity/trivy@ed4640e))
- **image:** add Docker context resolution ([#9166](aquasecurity/trivy#9166)) ([99cd4e7](aquasecurity/trivy@99cd4e7))
- **license:** observe pkg types option in license scanner ([#9091](aquasecurity/trivy#9091)) ([d44af8c](aquasecurity/trivy@d44af8c))
- **misconf:** add private ip google access attribute to subnetwork ([#9199](aquasecurity/trivy#9199)) ([263845c](aquasecurity/trivy@263845c))
- **misconf:** added logging and versioning to the gcp storage bucket ([#9226](aquasecurity/trivy#9226)) ([110f80e](aquasecurity/trivy@110f80e))
- **repo:** add git repository metadata to reports ([#9252](aquasecurity/trivy#9252)) ([f4b2cf1](aquasecurity/trivy@f4b2cf1))
- **report:** add CVSS vectors in sarif report ([#9157](aquasecurity/trivy#9157)) ([60723e6](aquasecurity/trivy@60723e6))
- **sbom:** add SHA-512 hash support for CycloneDX SBOM ([#9126](aquasecurity/trivy#9126)) ([12d6706](aquasecurity/trivy@12d6706))

##### Bug Fixes

- **alma:** parse epochs from rpmqa file ([#9101](aquasecurity/trivy#9101)) ([82db2fc](aquasecurity/trivy@82db2fc))
- also check `filepath` when removing duplicate packages ([#9142](aquasecurity/trivy#9142)) ([4d10a81](aquasecurity/trivy@4d10a81))
- **aws:** update amazon linux 2 EOL date ([#9176](aquasecurity/trivy#9176)) ([0ecfed6](aquasecurity/trivy@0ecfed6))
- **cli:** Add more non-sensitive flags to telemetry ([#9110](aquasecurity/trivy#9110)) ([7041a39](aquasecurity/trivy@7041a39))
- **cli:** ensure correct command is picked by telemetry ([#9260](aquasecurity/trivy#9260)) ([b4ad00f](aquasecurity/trivy@b4ad00f))
- **cli:** panic: attempt to get os.Args\[1] when len(os.Args) < 2 ([#9206](aquasecurity/trivy#9206)) ([adfa879](aquasecurity/trivy@adfa879))
- **license:** add missed `GFDL-NIV-1.1` and `GFDL-NIV-1.2` into Trivy mapping ([#9116](aquasecurity/trivy#9116)) ([a692f29](aquasecurity/trivy@a692f29))
- **license:** handle WITH operator for `LaxSplitLicenses` ([#9232](aquasecurity/trivy#9232)) ([b4193d0](aquasecurity/trivy@b4193d0))
- migrate from `*.list` to `*.md5sums` files for `dpkg` ([#9131](aquasecurity/trivy#9131)) ([f224de3](aquasecurity/trivy@f224de3))
- **misconf:** correctly adapt azure storage account ([#9138](aquasecurity/trivy#9138)) ([51aa022](aquasecurity/trivy@51aa022))
- **misconf:** correctly parse empty port ranges in google\_compute\_firewall ([#9237](aquasecurity/trivy#9237)) ([77bab7b](aquasecurity/trivy@77bab7b))
- **misconf:** fix log bucket in schema ([#9235](aquasecurity/trivy#9235)) ([7ebc129](aquasecurity/trivy@7ebc129))
- **misconf:** skip rewriting expr if attr is nil ([#9113](aquasecurity/trivy#9113)) ([42ccd3d](aquasecurity/trivy@42ccd3d))
- **nodejs:** don't use prerelease logic for compare npm constraints ([#9208](aquasecurity/trivy#9208)) ([fe96436](aquasecurity/trivy@fe96436))
- prevent graceful shutdown message on normal exit ([#9244](aquasecurity/trivy#9244)) ([6095984](aquasecurity/trivy@6095984))
- **rootio:** check full version to detect `root.io` packages ([#9117](aquasecurity/trivy#9117)) ([c2ddd44](aquasecurity/trivy@c2ddd44))
- **rootio:** fix severity selection ([#9181](aquasecurity/trivy#9181)) ([6fafbeb](aquasecurity/trivy@6fafbeb))
- **sbom:** merge in-graph and out-of-graph OS packages in scan results ([#9194](aquasecurity/trivy#9194)) ([aa944cc](aquasecurity/trivy@aa944cc))
- **sbom:** use correct field for licenses in CycloneDX reports ([#9057](aquasecurity/trivy#9057)) ([143da88](aquasecurity/trivy@143da88))
- **secret:** add UTF-8 validation in secret scanner to prevent protobuf marshalling errors ([#9253](aquasecurity/trivy#9253)) ([54832a7](aquasecurity/trivy@54832a7))
- **secret:** fix line numbers for multiple-line secrets ([#9104](aquasecurity/trivy#9104)) ([e579746](aquasecurity/trivy@e579746))
- **server:** add HTTP transport setup to server mode ([#9217](aquasecurity/trivy#9217)) ([1163b04](aquasecurity/trivy@1163b04))
- supporting .egg-info/METADATA in python.Packaging analyzer ([#9151](aquasecurity/trivy#9151)) ([e306e2d](aquasecurity/trivy@e306e2d))
- **terraform:** `for_each` on a map returns a resource for every key ([#9156](aquasecurity/trivy#9156)) ([153318f](aquasecurity/trivy@153318f))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1073
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
yutatokoi pushed a commit to yutatokoi/trivy that referenced this pull request Aug 12, 2025
Co-authored-by: knqyf263 <knqyf263@users.noreply.github.com>
Co-authored-by: DmitriyLewen <91113035+DmitriyLewen@users.noreply.github.com>
Successfully merging this pull request may close these issues.

Add git repository metadata to reports