@spiffcs spiffcs commented May 9, 2025

This PR makes the following changes:

  • detects when license values are really full license text, attempts to identify the license ID, migrates the contents to the pkg.License.Content field, and fills in the ID in pkg.License.Value
  • adds a new license.content configuration, deprecating the existing license.include-unknown-license-content configuration. The configuration allows for the following values:
    • unknown (default): only include contents for licenses that cannot be identified
    • none: never include license contents
    • all: include license contents wherever possible
  • replaces the license.license-coverage configuration with license.coverage
# .syft.yaml configuration summary
license:
  # include the content of licenses in the SBOM for a given syft scan; valid values are: [all unknown none] (env: SYFT_LICENSE_CONTENT)
  content: 'none'

  # deprecated: please use 'license-content' instead (env: SYFT_LICENSE_INCLUDE_UNKNOWN_LICENSE_CONTENT)
  include-unknown-license-content:

  # adjust the percent as a fraction of the total text, in normalized words, that
  # matches any valid license for the given inputs, expressed as a percentage across all of the licenses matched. (env: SYFT_LICENSE_COVERAGE)
  coverage: 75

  # deprecated: please use 'coverage' instead (env: SYFT_LICENSE_LICENSE_COVERAGE)
  license-coverage:

Development details

The current license constructors have been deprecated. The new constructors are copies of the old ones, but they have WithContext in their names and accept a context.Context as their first argument.

By refactoring these constructors we can now access the license scanner during license construction. This allows all catalogers that have a file.ReadCloser to query the license scanner while constructing licenses.
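
To illustrate the pattern described above, here is a minimal sketch of carrying a scanner through a context.Context and consulting it inside a constructor. Every name in it (Scanner, withScanner, scannerFrom, newLicenseWithContext, keywordScanner) is hypothetical and chosen for illustration; it is not syft's actual API, and the toy scanner only matches a single MIT phrase.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Scanner is a hypothetical stand-in for a license scanner.
type Scanner interface {
	// Identify returns a license ID for the text, or "" when unknown.
	Identify(text string) string
}

// scannerKey is an unexported key type, which avoids context key collisions.
type scannerKey struct{}

// withScanner returns a child context that carries the scanner.
func withScanner(ctx context.Context, s Scanner) context.Context {
	return context.WithValue(ctx, scannerKey{}, s)
}

// scannerFrom retrieves the scanner and reports whether one was set.
func scannerFrom(ctx context.Context) (Scanner, bool) {
	s, ok := ctx.Value(scannerKey{}).(Scanner)
	return s, ok
}

// License holds a short value (ideally an ID) and the optional full text.
type License struct {
	Value   string
	Content string
}

// newLicenseWithContext (hypothetical) asks the scanner in ctx to identify
// the value. On a match the ID becomes Value and the original text moves to
// Content; otherwise the value is kept unchanged.
func newLicenseWithContext(ctx context.Context, value string) License {
	if s, ok := scannerFrom(ctx); ok {
		if id := s.Identify(value); id != "" {
			return License{Value: id, Content: value}
		}
	}
	return License{Value: value}
}

// keywordScanner is a toy scanner that recognizes one MIT phrase.
type keywordScanner struct{}

func (keywordScanner) Identify(text string) string {
	if strings.Contains(text, "Permission is hereby granted, free of charge") {
		return "MIT"
	}
	return ""
}

// mitSnippet is an abbreviated excerpt used only for the demo.
const mitSnippet = "Permission is hereby granted, free of charge, to any person obtaining a copy"

// newDemoContext builds a context carrying the toy scanner.
func newDemoContext() context.Context {
	return withScanner(context.Background(), keywordScanner{})
}

func main() {
	lic := newLicenseWithContext(newDemoContext(), mitSnippet)
	fmt.Println(lic.Value)
}
```

Passing the scanner through the context keeps constructor signatures stable for callers that have no scanner: they still get a License, just without identification.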

This enables #3088 to be solved.

The new builder (under the ctx license constructors) has better logic for detecting whether a metadata value is a license ID, a custom license title string, or the full license contents. By detecting this at construction time and running the scanner against metadata values that are likely license contents, we can prevent license.Value from being populated with the full license text.
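
As a rough sketch of the three-way distinction described above, the following classifies a metadata value as an ID, a title, or full text. The heuristic (a newline or a length above an arbitrary threshold means full text), the tiny ID table, and all names are assumptions made for illustration; the PR's actual detection logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// valueKind describes what a license metadata value appears to be.
type valueKind int

const (
	kindID       valueKind = iota // a recognized license ID
	kindTitle                     // a short custom title
	kindFullText                  // likely the full license contents
)

func (k valueKind) String() string {
	switch k {
	case kindID:
		return "id"
	case kindTitle:
		return "title"
	default:
		return "full-text"
	}
}

// knownIDs is a tiny illustrative subset of license IDs, not a real list.
var knownIDs = map[string]bool{
	"MIT":          true,
	"Apache-2.0":   true,
	"BSD-3-Clause": true,
}

// maxTitleLen is an assumed threshold; longer values are treated as text.
const maxTitleLen = 64

// classify applies the assumed heuristic to a single metadata value.
func classify(value string) valueKind {
	v := strings.TrimSpace(value)
	switch {
	case knownIDs[v]:
		return kindID
	case strings.Contains(v, "\n") || len(v) > maxTitleLen:
		return kindFullText
	default:
		return kindTitle
	}
}

func main() {
	for _, v := range []string{"MIT", "My Company License", "Line one\nLine two"} {
		fmt.Printf("%q -> %s\n", v, classify(v))
	}
}
```

Only values classified as full text would need to go to the scanner, which keeps the more expensive identification step off the common path of short IDs and titles.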

All catalogers now return the full content of the licenses they discover. These contents are then dropped from the licenses in a post-processing step, depending on configuration.

Users can choose whether license contents appear in their SBOM, using one of the following settings:

None
Unknown
All
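
The post-processing step and the three settings above can be sketched as follows. The type and function names (ContentMode, parseContentMode, applyContentMode) and the use of an SPDXExpression field to mean "identified" are assumptions for illustration, not syft's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// ContentMode mirrors the three documented license.content values.
type ContentMode string

const (
	ContentNone    ContentMode = "none"
	ContentUnknown ContentMode = "unknown"
	ContentAll     ContentMode = "all"
)

// parseContentMode validates a user-supplied configuration value.
func parseContentMode(s string) (ContentMode, error) {
	m := ContentMode(strings.ToLower(strings.TrimSpace(s)))
	switch m {
	case ContentNone, ContentUnknown, ContentAll:
		return m, nil
	default:
		return "", fmt.Errorf("invalid license content mode %q (valid: all, unknown, none)", s)
	}
}

// License is a simplified, hypothetical license record.
type License struct {
	SPDXExpression string // empty when the license could not be identified
	Value          string
	Content        string
}

// applyContentMode returns a copy of the licenses with Content dropped
// according to the mode. The input slice is left unmodified.
func applyContentMode(licenses []License, mode ContentMode) []License {
	out := make([]License, len(licenses))
	copy(out, licenses)
	for i := range out {
		switch mode {
		case ContentAll:
			// keep contents wherever they are available
		case ContentUnknown:
			// keep contents only for licenses that were not identified
			if out[i].SPDXExpression != "" {
				out[i].Content = ""
			}
		default:
			// "none": never include contents
			out[i].Content = ""
		}
	}
	return out
}

func main() {
	licenses := []License{
		{SPDXExpression: "MIT", Value: "MIT", Content: "mit text"},
		{Value: "Custom", Content: "custom text"},
	}
	for _, l := range applyContentMode(licenses, ContentUnknown) {
		fmt.Printf("%s: content kept = %t\n", l.Value, l.Content != "")
	}
}
```

Having catalogers always return contents and trimming them afterwards keeps the configuration decision in one place instead of in every cataloger.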

SPDX OtherLicenses

Given the changes this PR makes to how values and IDs are set during license construction, the SPDX OtherLicenses logic has been moved so that it runs during SPDX package creation in syft's format modules.

While the concluded and declared license fields of SPDX packages are being decorated, the SPDX format code now uses the available license information to maintain a set of spdx.OtherLicense entries that is returned with the packages.

This differs from our previous approach, where spdx.OtherLicense entries were recomputed after the packages had already been assembled. The new approach is easier to test, given that the licenses under packages are compared with the spdx.OtherLicense entries when external validator tools are run against an SPDX-formatted SBOM.
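
A minimal sketch of the idea described above: build the set of other licenses while package license fields are being decorated, so every reference used by a package has a matching entry. The types, the decorate function, and the hash-based LicenseRef naming are assumptions made for illustration and are not taken from syft's SPDX format code.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// License is a simplified, hypothetical license record.
type License struct {
	SPDXExpression string // empty when the license could not be identified
	Value          string
	Content        string
}

// Package is a simplified package holding its licenses.
type Package struct {
	Name     string
	Licenses []License
}

// OtherLicense models an extra license entry referenced by packages.
type OtherLicense struct {
	ID   string
	Name string
	Text string
}

// licenseRef derives a stable reference ID from the value (assumed scheme).
func licenseRef(value string) string {
	return fmt.Sprintf("LicenseRef-%x", sha256.Sum256([]byte(value)))
}

// decorate returns the license identifiers per package together with the
// deduplicated set of other licenses gathered along the way.
func decorate(pkgs []Package) (map[string][]string, []OtherLicense) {
	declared := make(map[string][]string)
	seen := make(map[string]OtherLicense)
	for _, p := range pkgs {
		for _, l := range p.Licenses {
			id := l.SPDXExpression
			if id == "" {
				// Unidentified license: reference it and record its text.
				id = licenseRef(l.Value)
				text := l.Content
				if text == "" {
					text = l.Value
				}
				seen[id] = OtherLicense{ID: id, Name: l.Value, Text: text}
			}
			declared[p.Name] = append(declared[p.Name], id)
		}
	}
	others := make([]OtherLicense, 0, len(seen))
	for _, o := range seen {
		others = append(others, o)
	}
	// Sort for deterministic output.
	sort.Slice(others, func(i, j int) bool { return others[i].ID < others[j].ID })
	return declared, others
}

func main() {
	pkgs := []Package{
		{Name: "a", Licenses: []License{{SPDXExpression: "MIT", Value: "MIT"}}},
		{Name: "b", Licenses: []License{{Value: "Custom License"}}},
	}
	declared, others := decorate(pkgs)
	fmt.Println(len(declared), len(others))
}
```

Because references and entries are produced in the same pass, a test can assert directly that each reference on a package appears in the returned set.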

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Documentation (updates the documentation)

Checklist

  • I have added unit tests that cover changed behavior
  • I have tested my code in common scenarios and confirmed there are no regressions
  • I have added comments to my code, particularly in hard-to-understand sections

spiffcs added 3 commits May 9, 2025 13:36
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
@spiffcs spiffcs changed the title 3088 new license ctx scaner 3088 - update license constructors to use scanner through ctx and pass values to content with shasum May 9, 2025
spiffcs added 4 commits May 9, 2025 17:12
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
@spiffcs spiffcs changed the title 3088 - update license constructors to use scanner through ctx and pass values to content with shasum 3088 - update license constructors to use scanner through ctx; configure syft to drop content post scan May 12, 2025
spiffcs added 2 commits May 12, 2025 12:44
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
@spiffcs spiffcs changed the title 3088 - update license constructors to use scanner through ctx; configure syft to drop content post scan 3088 - update license constructors to use license scanner via ctx; configure syft to drop license contents post scan May 12, 2025
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
@spiffcs spiffcs force-pushed the 3088-new-license-ctx-scaner branch from f70743a to d569180 Compare May 12, 2025 21:29
spiffcs added 4 commits May 12, 2025 19:03
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
* main:
  Translate Portage license strings to SPDX expressions (#1763)
  fix: stop emitting redis redis CPE for PHP PECL redis (#3881)
  feat: Add PURL list input/output format (#3853)
  chore(deps): update CPE dictionary index (#3877)
  chore(deps): update tools to latest versions (#3878)
  do not search binary contents for version for go package (#3874)
  fix: remove race when writing errors in generic cataloger (#3875)
  clear devel version for go packages (#3873)

Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
@spiffcs spiffcs marked this pull request as ready for review May 13, 2025 03:52
wagoodman and others added 5 commits May 13, 2025 10:56
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
@spiffcs spiffcs force-pushed the 3088-new-license-ctx-scaner branch from 2604d47 to e5f45f0 Compare May 13, 2025 17:10
wagoodman and others added 2 commits May 13, 2025 13:21
…scaner

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Christopher Phillips <32073428+spiffcs@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@wagoodman wagoodman changed the title 3088 - update license constructors to use license scanner via ctx; configure syft to drop license contents post scan Detect license ID from full text when incidentally provided as a value May 13, 2025
wagoodman added 4 commits May 13, 2025 14:49
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@spiffcs spiffcs merged commit f77d503 into main May 13, 2025
13 checks passed
@spiffcs spiffcs deleted the 3088-new-license-ctx-scaner branch May 13, 2025 20:37
Development

Successfully merging this pull request may close these issues.

Detect whether full license text or a license name has been provided
2 participants