Description
In REUSE spec version 3.1, we want to introduce support for in-line SPDX snippets (fsfe/reuse-docs#107). However, the tool does not support this yet.
Currently, the tool lacks two features:
- It does not detect the `SPDX-SnippetCopyrightText` tag.
- It only reads the first 4 KB of a file (IIRC). As quite a few files are larger than that, the tool may miss snippets further down the file.
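To illustrate the second problem, here is a minimal Python sketch of how a fixed read cutoff hides a tag placed further down a file. The 4096-byte `HEADER_LIMIT` follows the "4 KB (IIRC)" above and is an assumption, as are the helper names `build_sample` and `visible_in_header`. This is not the REUSE tool's actual code.

```python
# Hypothetical cutoff; the issue only recalls "the first 4 KB" (unverified).
HEADER_LIMIT = 4096


def build_sample(padding: int) -> str:
    """Build a file body: a header tag, `padding` filler characters, then a snippet tag."""
    header = "// SPDX-FileCopyrightText: 2019 Jane Doe <jane@example.com>\n"
    filler = "x" * padding + "\n"
    snippet = "// SPDX-SnippetCopyrightText: Hacker\n"
    return header + filler + snippet


def visible_in_header(text: str, tag: str, limit: int = HEADER_LIMIT) -> bool:
    """Return True if `tag` occurs within the first `limit` characters.

    The sample is pure ASCII, so characters and bytes coincide here.
    """
    return tag in text[:limit]


sample = build_sample(padding=5000)
print(visible_in_header(sample, "SPDX-FileCopyrightText"))     # True
print(visible_in_header(sample, "SPDX-SnippetCopyrightText"))  # False: past the cutoff
print(visible_in_header(sample, "SPDX-SnippetCopyrightText", limit=len(sample)))  # True
```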
Check out this file, which demonstrates both problems with two snippets separated by ~4 KB of text:
- The "global" copyright and license are detected.
- From the first snippet by John (lines 14-20), only the MIT license is detected, not the copyright.
- From the second snippet by Hacker (lines 24-30), nothing is detected.
```
# SUMMARY

* Bad licenses:
* Deprecated licenses:
* Licenses without file extension:
* Missing licenses: MIT
* Unused licenses:
* Used licenses: CC-BY-4.0, CC0-1.0, GPL-3.0-or-later, MIT
* Read errors: 0
* Files with copyright information: 6 / 6
* Files with license information: 6 / 6
```
```
FileName: ./src/main.c
SPDXID: SPDXRef-63cc816bd7ec90aac6513f4273306341
FileChecksum: SHA1: a3ae3b0459e15ade0a6ecf992eef1ab222f508c8
LicenseConcluded: NOASSERTION
LicenseInfoInFile: GPL-3.0-or-later
LicenseInfoInFile: MIT
FileCopyrightText: <text>SPDX-FileCopyrightText: 2019 Jane Doe <jane@example.com></text>
```
Possible solutions
Problem 1 shouldn't be hard to solve. As REUSE works at the file level, we could treat the `SPDX-SnippetCopyrightText` tag the same as `SPDX-FileCopyrightText`, so that it appears in `reuse spdx` and, hopefully soon, in `reuse lint --json`, for instance.
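A minimal sketch of what "treat the snippet tag the same as the file tag" could mean in practice: a single pattern that accepts both tags and collects their statements into one list. The regex, the `find_copyright_statements` helper and the sample text (including John's e-mail address) are hypothetical illustrations, not the REUSE tool's actual implementation; the `SPDX-SnippetBegin`/`SPDX-SnippetEnd` markers follow the SPDX snippet convention.

```python
import re

# Hypothetical pattern: match either tag and capture the rest of the line,
# trimming surrounding spaces/tabs. Not the REUSE tool's real regex.
COPYRIGHT_TAG = re.compile(
    r"SPDX-(?:File|Snippet)CopyrightText:[ \t]*(?P<statement>[^\n]+?)[ \t]*$",
    re.MULTILINE,
)


def find_copyright_statements(text: str) -> list[str]:
    """Collect copyright statements from file-level and snippet-level tags alike."""
    return [match.group("statement") for match in COPYRIGHT_TAG.finditer(text)]


# Hypothetical sample resembling the file described above.
sample = """\
// SPDX-FileCopyrightText: 2019 Jane Doe <jane@example.com>
// SPDX-License-Identifier: GPL-3.0-or-later

// SPDX-SnippetBegin
// SPDX-SnippetCopyrightText: John <john@example.com>
// SPDX-License-Identifier: MIT
int add(int a, int b) { return a + b; }
// SPDX-SnippetEnd
"""

print(find_copyright_statements(sample))
# ['2019 Jane Doe <jane@example.com>', 'John <john@example.com>']
```

Since REUSE reports at the file level anyway, merging both tags into one list keeps the change small; distinguishing snippet ranges would only matter if the output ever needs per-snippet entries.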
Problem 2 is harder. We decided to only consider the first few kilobytes of each file because we assume that the interesting content is in the comment header, which is usually not longer than that. With snippets, however, relevant data might just as well sit at the very end of a very long file, so we would actually have to read the whole file.
However, this might have serious performance implications; I haven't run any tests with real repositories yet. We could make reading the whole file optional, either disabled or enabled by default.
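One way to make full-file reading optional is a flag on whatever function loads a file's contents for scanning. The sketch below assumes a hypothetical `read_for_scan` helper with a `read_whole_file` parameter; neither exists in the REUSE tool, and the 4096-byte default only mirrors the "4 KB (IIRC)" cutoff mentioned in the description.

```python
import tempfile
from pathlib import Path

# Hypothetical default cutoff; the issue recalls roughly 4 KB.
HEADER_LIMIT = 4096


def read_for_scan(path: Path, read_whole_file: bool = False,
                  limit: int = HEADER_LIMIT) -> str:
    """Return the text to scan for REUSE/SPDX tags.

    By default only the first `limit` bytes are read (fast, header only).
    With read_whole_file=True the entire file is read, so snippets further
    down are not missed, at the cost of more I/O on large files.
    """
    with path.open("rb") as handle:
        data = handle.read() if read_whole_file else handle.read(limit)
    # A byte cutoff may split a multi-byte UTF-8 character; replace it
    # instead of raising UnicodeDecodeError.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "main.c"
        sample.write_text(
            "// SPDX-FileCopyrightText: 2019 Jane Doe <jane@example.com>\n"
            + "x" * 5000 + "\n"
            + "// SPDX-SnippetCopyrightText: Hacker\n",
            encoding="utf-8",
        )
        print("SPDX-SnippetCopyrightText" in read_for_scan(sample))  # False
        print("SPDX-SnippetCopyrightText" in read_for_scan(sample, read_whole_file=True))  # True
```

Whether the flag defaults to on or off is exactly the performance trade-off described above, and would need measuring on real repositories before deciding.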