
feat: dpkg license improvement for non SPDX licenses #3090

@spiffcs

Description


What happened:
Sometimes syft encounters a dpkg copyright file whose license the regular expressions used to match its contents cannot identify, and the package ends up with no license at all.

In the following example, we should find licenses such as:

NVIDIA Software License Agreement and CUDA Supplement to Software License Agreement
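One possible direction for non-SPDX licenses is to capture license title lines verbatim. The sketch below assumes that a proprietary license name appears on its own line ending in "License Agreement". `licenseTitlePattern` and `findLicenseTitles` are hypothetical names, not syft APIs. Whether a title joined by "and" should be split into separate licenses is left open.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// licenseTitlePattern is a hypothetical heuristic: it matches a whole line that
// ends with "License Agreement" (case-insensitive), optionally followed by "." or ":".
// It can also match prose sentences that happen to end with that phrase.
var licenseTitlePattern = regexp.MustCompile(`(?i)^\s*(.*\bLicense Agreement)\s*[.:]?\s*$`)

// findLicenseTitles returns every matching line, trimmed, in order of appearance.
func findLicenseTitles(text string) []string {
	var titles []string
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		if m := licenseTitlePattern.FindStringSubmatch(scanner.Text()); m != nil {
			titles = append(titles, strings.TrimSpace(m[1]))
		}
	}
	return titles
}

func main() {
	sample := "NVIDIA Software License Agreement and CUDA Supplement to Software License Agreement\n" +
		"This line is not a license title.\n"
	for _, t := range findLicenseTitles(sample) {
		fmt.Println(t)
	}
}
```

Running it prints only the first line of the sample, because the second line does not end in "License Agreement".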

Reads the contents of the copyright file:

```go
func fetchCopyrightContents(resolver file.Resolver, dbLocation file.Location, m pkg.DpkgDBEntry) (io.ReadCloser, *file.Location) {
	if resolver == nil {
		return nil, nil
	}

	// look for /usr/share/doc/NAME/copyright files
	copyrightPath := path.Join(docsPath, m.Package, "copyright")
	location := resolver.RelativeFileByPath(dbLocation, copyrightPath)

	// we may not have a copyright file for each package, ignore missing files
	if location == nil {
		return nil, nil
	}

	reader, err := resolver.FileContentsByLocation(*location)
	if err != nil {
		log.Warnf("failed to fetch deb copyright contents (package=%s): %s", m.Package, err)
		return nil, nil
	}

	// the caller owns the reader and must close it once parsing is done
	l := location.WithAnnotation(pkg.EvidenceAnnotationKey, pkg.SupportingEvidenceAnnotation)

	return reader, &l
}
```
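A function that returns an `io.ReadCloser` should not also defer its `Close`. The deferred call runs as the function returns, so the caller receives a reader that has already been closed. The sketch below uses a hypothetical `trackingReader` that refuses reads after `Close`. Whether a given syft resolver's reader tolerates reads after `Close` depends on its implementation, so this illustrates the ownership rule rather than reproducing the bug.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// trackingReader is a hypothetical io.ReadCloser that fails reads after Close,
// like a real file handle would.
type trackingReader struct {
	r      io.Reader
	closed bool
}

func (t *trackingReader) Read(p []byte) (int, error) {
	if t.closed {
		return 0, errors.New("read from closed reader")
	}
	return t.r.Read(p)
}

func (t *trackingReader) Close() error {
	t.closed = true
	return nil
}

// openAndDeferClose has the problematic shape: the deferred Close runs as the
// function returns, so the caller gets an already-closed reader.
func openAndDeferClose(content string) io.ReadCloser {
	rc := &trackingReader{r: strings.NewReader(content)}
	defer rc.Close()
	return rc
}

// openForCaller hands ownership of Close to the caller instead.
func openForCaller(content string) io.ReadCloser {
	return &trackingReader{r: strings.NewReader(content)}
}

// readAll reads everything and then closes the reader, as a caller should.
func readAll(rc io.ReadCloser) (string, error) {
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}

func main() {
	_, err := readAll(openAndDeferClose("License: MIT"))
	fmt.Println("deferred close:", err)

	s, err := readAll(openForCaller("License: MIT"))
	fmt.Println("caller close:", s, err)
}
```

The first call reports `read from closed reader`. The second returns the full contents with a nil error.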

Sends the contents for parsing:

```go
licenseStrs := parseLicensesFromCopyright(copyrightReader)
for _, licenseStr := range licenseStrs {
	p.Licenses.Add(pkg.NewLicenseFromLocations(licenseStr, copyrightLocation.WithoutAnnotations()))
}
// keep a record of the file where this was discovered
p.Locations.Add(*copyrightLocation)
```

Searches each line for a license clause:

```go
func parseLicensesFromCopyright(reader io.Reader) []string {
	findings := strset.New()
	scanner := bufio.NewScanner(reader)

	for scanner.Scan() {
		line := scanner.Text()
		if value := findLicenseClause(licensePattern, "license", line); value != "" {
			findings.Add(value)
		}
		if value := findLicenseClause(commonLicensePathPattern, "license", line); value != "" {
			findings.Add(value)
		}
	}

	results := findings.List()
	sort.Strings(results)

	return results
}
```

What you expected to happen:
When a copyright file is found, syft should create SOME license information for the package. Producing no licenses at all is a bug.
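One way to meet this expectation is a fallback. When no clause matches but the copyright file has content, emit its first non-empty line as an unidentified license name. `licensesWithFallback` is a hypothetical helper, not part of syft, and the first-line choice is an assumption. A real implementation might instead keep the full text or a content hash.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// licensesWithFallback is hypothetical: it returns the parsed licenses when any
// were found, otherwise the first non-empty line of the copyright text.
func licensesWithFallback(parsed []string, copyrightText string) []string {
	if len(parsed) > 0 {
		return parsed
	}
	scanner := bufio.NewScanner(strings.NewReader(copyrightText))
	for scanner.Scan() {
		if line := strings.TrimSpace(scanner.Text()); line != "" {
			return []string{line}
		}
	}
	// a file with no non-empty lines really has nothing to report
	return nil
}

func main() {
	text := "\nNVIDIA Software License Agreement and CUDA Supplement to Software License Agreement\nmore text\n"
	fmt.Println(licensesWithFallback(nil, text))
	fmt.Println(licensesWithFallback([]string{"MIT"}, text))
}
```

The first call falls back to the NVIDIA title line. The second keeps the parsed `[MIT]` unchanged.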

Steps to reproduce the issue:

```shell
syft -o json nvidia/cuda:12.5.1-cudnn-runtime-ubuntu20.04 | grant list -o json | jq -r '.results[] | [.license.license_id, .license.name] | @csv' | sed 's/"//g'
```

- Output of `syft version`: devel (tip of main)
- OS (e.g. `cat /etc/os-release` or similar): OSX

Metadata

Labels: bug (Something isn't working)

Status: Done