knqyf263
Collaborator

@knqyf263 knqyf263 commented Jul 28, 2025

Description

This PR adds UTF-8 validation to the secret scanner to prevent protobuf marshalling errors when scanning files with invalid UTF-8 content, such as translation files.

The issue occurred when scanning repositories like juice-shop that contain files with invalid UTF-8 sequences. The secret scanner would fail with 'string field contains invalid UTF-8' when marshalling results to protobuf format.

$ trivy repo https://github.com/juice-shop/juice-shop --server http://localhost:4954
...
2025-07-28T10:13:33+04:00       FATAL   Fatal error     repo scan error: scan error: scan failed: failed analysis: remote repository error: failed to store blob (sha256:e3e730cdeff2f1c66082bc550dcdd2226b8070382f6451040a13d203e1ff5325) in cache: unable to store cache on the server: twirp error internal: failed to marshal proto request: string field contains invalid UTF-8

Changes

Added UTF-8 validation using strings.ToValidUTF8() in the secret scanner
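As a rough illustration of that change, here is a minimal sketch of sanitizing file bytes before they become a string. The helper name `dropInvalidUTF8` and the sample input are assumptions made for this example, not Trivy's actual code; only `strings.ToValidUTF8`, `utf8.Valid`, and `utf8.ValidString` are real standard-library calls.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// dropInvalidUTF8 is a hypothetical helper (not Trivy's code). It converts raw
// file bytes to a string that is guaranteed to be valid UTF-8, so the value is
// safe to place in a protobuf string field.
func dropInvalidUTF8(raw []byte) string {
	if utf8.Valid(raw) {
		return string(raw) // fast path: nothing to fix
	}
	// Each run of invalid bytes is replaced by the second argument.
	// An empty replacement simply drops the invalid bytes.
	return strings.ToValidUTF8(string(raw), "")
}

func main() {
	raw := []byte("token=abc\xff\xfedef") // \xff\xfe is not valid UTF-8
	clean := dropInvalidUTF8(raw)
	fmt.Printf("valid before: %t, valid after: %t, result: %q\n",
		utf8.Valid(raw), utf8.ValidString(clean), clean)
}
```

Checking `utf8.Valid` first keeps the common case (already valid content) free of any extra work beyond the string conversion.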

Related issues

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change)

knqyf263 added 2 commits July 28, 2025 07:15
…alling errors

When scanning files with invalid UTF-8 content (such as translation files
or binary content), the secret scanner would fail with 'string field contains
invalid UTF-8' when marshalling results to protobuf format.

This commit adds UTF-8 validation before converting file bytes to strings:
- Added sanitizeUTF8String() helper function with warning logs
- Updated findLocation() to validate UTF-8 in match lines and code content
- Modified function signatures to pass logger context through call chain

Fixes the 'twirp marshal protobuf got error UTF-8' issue when scanning
repositories like juice-shop that contain files with invalid UTF-8 sequences.
- Convert sanitizeUTF8String to Scanner method using member logger
- Implement sync.OnceFunc to limit UTF-8 warnings to once per scan
- Remove logger parameter passing through call chain
- Maintain [secret] prefix in warning logs
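The "warn once per scan" idea from this commit could look roughly like the sketch below. The `Scanner` type, its fields, the constructor, and the log wording are all assumptions for illustration; only the method name `sanitizeUTF8String` and the use of `sync.OnceFunc` (available since Go 1.21) come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"unicode/utf8"
)

// Scanner is a hypothetical stand-in for the secret scanner.
type Scanner struct {
	warnings int    // counts emitted warnings, only so the sketch can be checked
	warnOnce func() // wraps the warning so it fires at most once
}

// NewScanner is an assumed constructor; the real one may look different.
func NewScanner() *Scanner {
	s := &Scanner{}
	s.warnOnce = sync.OnceFunc(func() {
		s.warnings++
		// Hypothetical log text; the real message is not shown in the PR.
		fmt.Println("[secret] invalid UTF-8 sequences found, sanitizing output")
	})
	return s
}

// sanitizeUTF8String returns data as a valid UTF-8 string and warns the
// first time invalid input is seen.
func (s *Scanner) sanitizeUTF8String(data []byte) string {
	if utf8.Valid(data) {
		return string(data)
	}
	s.warnOnce() // no-op after the first call
	return strings.ToValidUTF8(string(data), "")
}

func main() {
	s := NewScanner()
	for _, b := range [][]byte{[]byte("a\xffb"), []byte("c\xfed"), []byte("ok")} {
		fmt.Printf("%q\n", s.sanitizeUTF8String(b))
	}
	fmt.Println("warnings emitted:", s.warnings)
}
```

Creating the once-wrapped function per scanner ties the limit to a scanner's lifetime; a package-level variable would instead warn once per process.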
@knqyf263 knqyf263 self-assigned this Jul 28, 2025
knqyf263 added 2 commits July 28, 2025 12:36
- Move warnUTF8Once to package level to simplify Scanner struct
- Replace hardcoded "secret" string with log.PrefixSecret constant
- Convert toFinding and findLocation from methods to functions
- Remove unused context parameter from sanitizeUTF8String
- Move sanitizeUTF8String function to end of file for better organization

This ensures proper handling of invalid UTF-8 sequences in scanned content
while maintaining a cleaner code structure.
- Add test case to verify invalid UTF-8 sequences are properly sanitized
- Create test data file with invalid UTF-8 sequences in GitHub PAT
- Ensure secrets are still detected even with invalid UTF-8 in the content
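A simplified view of what such a test checks: matching runs on the raw bytes, and only the reported strings are sanitized, so invalid bytes around a token do not hide it. Everything in this sketch is hypothetical, including the `finding` type, `scanContent`, the regular expression, and the obviously fake token; Trivy's real rules and test data are not reproduced here.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// patPattern is an illustrative rule only; the scanner's real rules may differ.
var patPattern = regexp.MustCompile(`ghp_[0-9a-zA-Z]{36}`)

// finding is a hypothetical result type with string fields that must be valid UTF-8.
type finding struct {
	Match string
	Line  string
}

// scanContent matches on raw bytes line by line and sanitizes the line text
// only when building the finding.
func scanContent(content []byte) []finding {
	var findings []finding
	for _, line := range bytes.Split(content, []byte("\n")) {
		m := patPattern.Find(line)
		if m == nil {
			continue
		}
		findings = append(findings, finding{
			Match: string(m),
			Line:  strings.ToValidUTF8(string(line), string(utf8.RuneError)),
		})
	}
	return findings
}

func main() {
	// Fake token surrounded by invalid bytes, plus an unrelated invalid line.
	content := []byte("lang=\xe9\nGITHUB_PAT=\xff\xfeghp_0123456789abcdefghijklmnopqrstuvwxyz\xff\n")
	for _, f := range scanContent(content) {
		fmt.Printf("match=%q line=%q valid=%t\n", f.Match, f.Line, utf8.ValidString(f.Line))
	}
}
```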
@knqyf263 knqyf263 added the autoready label (Automatically mark PR as ready for review when all checks pass) Jul 28, 2025
- Align variable declarations in var block
- Remove trailing whitespace
@github-actions github-actions bot marked this pull request as ready for review July 28, 2025 10:01
@github-actions github-actions bot removed the autoready label (Automatically mark PR as ready for review when all checks pass) Jul 28, 2025
@github-actions github-actions bot requested a review from DmitriyLewen as a code owner July 28, 2025 10:01
@knqyf263 knqyf263 changed the title fix: add UTF-8 validation in secret scanner to prevent protobuf marshalling errors fix(secret): add UTF-8 validation in secret scanner to prevent protobuf marshalling errors Jul 28, 2025
…scanner

Replace invalid UTF-8 sequences with the standard Unicode replacement
character (U+FFFD) using utf8.RuneError instead of empty string.
This provides better visibility of where invalid sequences occurred
in the output while following Go conventions.
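To make the difference concrete, the sketch below contrasts the two replacement choices. The function names `stripInvalid` and `markInvalid` and the sample line are assumptions for this example; `strings.ToValidUTF8` and `utf8.RuneError` are the real standard-library pieces the commit message refers to.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// stripInvalid drops invalid byte runs entirely (empty replacement).
func stripInvalid(s string) string {
	return strings.ToValidUTF8(s, "")
}

// markInvalid replaces each run of invalid bytes with a single U+FFFD,
// the Unicode replacement character exposed in Go as utf8.RuneError.
func markInvalid(s string) string {
	return strings.ToValidUTF8(s, string(utf8.RuneError))
}

func main() {
	line := "greeting: caf\xe9 au lait" // \xe9 is Latin-1 "é", invalid as UTF-8
	fmt.Println("stripped:", stripInvalid(line))
	fmt.Println("marked:  ", markInvalid(line))
}
```

With the empty replacement the damaged spot disappears silently, while U+FFFD leaves a visible marker at the position where bytes were replaced. Note that a run of several consecutive invalid bytes collapses into one marker.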
@knqyf263 knqyf263 enabled auto-merge July 28, 2025 14:08
@knqyf263 knqyf263 added this pull request to the merge queue Jul 28, 2025
Merged via the queue into aquasecurity:main with commit 54832a7 Jul 28, 2025
13 checks passed
@knqyf263 knqyf263 deleted the fix/secret-utf8-validation branch July 28, 2025 14:43
alexlebens pushed a commit to alexlebens/infrastructure that referenced this pull request Jul 31, 2025
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [mirror.gcr.io/aquasec/trivy](https://www.aquasec.com/products/trivy/) ([source](https://github.com/aquasecurity/trivy)) | minor | `0.64.1` -> `0.65.0` |

---

### Release Notes

<details>
<summary>aquasecurity/trivy (mirror.gcr.io/aquasec/trivy)</summary>

### [`v0.65.0`](https://github.com/aquasecurity/trivy/blob/HEAD/CHANGELOG.md#0650-2025-07-30)

[Compare Source](aquasecurity/trivy@v0.64.1...v0.65.0)

##### Features

- add graceful shutdown with signal handling ([#9242](aquasecurity/trivy#9242)) ([2c05882](aquasecurity/trivy@2c05882))
- add HTTP request/response tracing support ([#9125](aquasecurity/trivy#9125)) ([aa5b32a](aquasecurity/trivy@aa5b32a))
- **alma:** add AlmaLinux 10 support ([#9207](aquasecurity/trivy#9207)) ([861d51e](aquasecurity/trivy@861d51e))
- **flag:** add schema validation for `--server` flag ([#9270](aquasecurity/trivy#9270)) ([ed4640e](aquasecurity/trivy@ed4640e))
- **image:** add Docker context resolution ([#9166](aquasecurity/trivy#9166)) ([99cd4e7](aquasecurity/trivy@99cd4e7))
- **license:** observe pkg types option in license scanner ([#9091](aquasecurity/trivy#9091)) ([d44af8c](aquasecurity/trivy@d44af8c))
- **misconf:** add private ip google access attribute to subnetwork ([#9199](aquasecurity/trivy#9199)) ([263845c](aquasecurity/trivy@263845c))
- **misconf:** added logging and versioning to the gcp storage bucket ([#9226](aquasecurity/trivy#9226)) ([110f80e](aquasecurity/trivy@110f80e))
- **repo:** add git repository metadata to reports ([#9252](aquasecurity/trivy#9252)) ([f4b2cf1](aquasecurity/trivy@f4b2cf1))
- **report:** add CVSS vectors in sarif report ([#9157](aquasecurity/trivy#9157)) ([60723e6](aquasecurity/trivy@60723e6))
- **sbom:** add SHA-512 hash support for CycloneDX SBOM ([#9126](aquasecurity/trivy#9126)) ([12d6706](aquasecurity/trivy@12d6706))

##### Bug Fixes

- **alma:** parse epochs from rpmqa file ([#9101](aquasecurity/trivy#9101)) ([82db2fc](aquasecurity/trivy@82db2fc))
- also check `filepath` when removing duplicate packages ([#9142](aquasecurity/trivy#9142)) ([4d10a81](aquasecurity/trivy@4d10a81))
- **aws:** update amazon linux 2 EOL date ([#9176](aquasecurity/trivy#9176)) ([0ecfed6](aquasecurity/trivy@0ecfed6))
- **cli:** Add more non-sensitive flags to telemetry ([#9110](aquasecurity/trivy#9110)) ([7041a39](aquasecurity/trivy@7041a39))
- **cli:** ensure correct command is picked by telemetry ([#9260](aquasecurity/trivy#9260)) ([b4ad00f](aquasecurity/trivy@b4ad00f))
- **cli:** panic: attempt to get os.Args\[1] when len(os.Args) < 2 ([#9206](aquasecurity/trivy#9206)) ([adfa879](aquasecurity/trivy@adfa879))
- **license:** add missed `GFDL-NIV-1.1` and `GFDL-NIV-1.2` into Trivy mapping ([#9116](aquasecurity/trivy#9116)) ([a692f29](aquasecurity/trivy@a692f29))
- **license:** handle WITH operator for `LaxSplitLicenses` ([#9232](aquasecurity/trivy#9232)) ([b4193d0](aquasecurity/trivy@b4193d0))
- migrate from `*.list` to `*.md5sums` files for `dpkg` ([#9131](aquasecurity/trivy#9131)) ([f224de3](aquasecurity/trivy@f224de3))
- **misconf:** correctly adapt azure storage account ([#9138](aquasecurity/trivy#9138)) ([51aa022](aquasecurity/trivy@51aa022))
- **misconf:** correctly parse empty port ranges in google\_compute\_firewall ([#9237](aquasecurity/trivy#9237)) ([77bab7b](aquasecurity/trivy@77bab7b))
- **misconf:** fix log bucket in schema ([#9235](aquasecurity/trivy#9235)) ([7ebc129](aquasecurity/trivy@7ebc129))
- **misconf:** skip rewriting expr if attr is nil ([#9113](aquasecurity/trivy#9113)) ([42ccd3d](aquasecurity/trivy@42ccd3d))
- **nodejs:** don't use prerelease logic for compare npm constraints ([#9208](aquasecurity/trivy#9208)) ([fe96436](aquasecurity/trivy@fe96436))
- prevent graceful shutdown message on normal exit ([#9244](aquasecurity/trivy#9244)) ([6095984](aquasecurity/trivy@6095984))
- **rootio:** check full version to detect `root.io` packages ([#9117](aquasecurity/trivy#9117)) ([c2ddd44](aquasecurity/trivy@c2ddd44))
- **rootio:** fix severity selection ([#9181](aquasecurity/trivy#9181)) ([6fafbeb](aquasecurity/trivy@6fafbeb))
- **sbom:** merge in-graph and out-of-graph OS packages in scan results ([#9194](aquasecurity/trivy#9194)) ([aa944cc](aquasecurity/trivy@aa944cc))
- **sbom:** use correct field for licenses in CycloneDX reports ([#9057](aquasecurity/trivy#9057)) ([143da88](aquasecurity/trivy@143da88))
- **secret:** add UTF-8 validation in secret scanner to prevent protobuf marshalling errors ([#9253](aquasecurity/trivy#9253)) ([54832a7](aquasecurity/trivy@54832a7))
- **secret:** fix line numbers for multiple-line secrets ([#9104](aquasecurity/trivy#9104)) ([e579746](aquasecurity/trivy@e579746))
- **server:** add HTTP transport setup to server mode ([#9217](aquasecurity/trivy#9217)) ([1163b04](aquasecurity/trivy@1163b04))
- supporting .egg-info/METADATA in python.Packaging analyzer ([#9151](aquasecurity/trivy#9151)) ([e306e2d](aquasecurity/trivy@e306e2d))
- **terraform:** `for_each` on a map returns a resource for every key ([#9156](aquasecurity/trivy#9156)) ([153318f](aquasecurity/trivy@153318f))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://gitea.alexlebens.dev/alexlebens/infrastructure/pulls/1073
Co-authored-by: Renovate Bot <renovate-bot@alexlebens.net>
Co-committed-by: Renovate Bot <renovate-bot@alexlebens.net>
yutatokoi pushed a commit to yutatokoi/trivy that referenced this pull request Aug 12, 2025
…uf marshalling errors (aquasecurity#9253)

Co-authored-by: knqyf263 <knqyf263@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Secret scanner fails with UTF-8 marshalling error when scanning files with invalid UTF-8 content