Help, I got here because I got a PendingDeprecationWarning
PendingDeprecationWarning: Copyright and licensing information for 'my-project/foo.py' has been found in both 'my-project/foo.py' and in the DEP5 file located at '.reuse/dep5'. The information for these two sources has been aggregated. In the future this behaviour will change, and you will need to explicitly enable aggregation. See #779. You need do nothing yet. Run with --suppress-deprecation to hide this warning.
You can get rid of this warning by upgrading to >=4.0.0 of reuse, where the above behaviour is defined in REUSE Specification v3.2.
You're getting this warning because of the following scenario. You have a file my-project/foo.py which contains the following header:
# SPDX-FileCopyrightText: 2023 Jane Doe
#
# SPDX-License-Identifier: GPL-3.0-or-later
But you also have a .reuse/dep5 file which contains the following section:
Files: my-project/*.py
Copyright: 2020 Example NGO
License: 0BSD
The problem: Under which licence is the file? Who are the copyright holders?
Prior to version 4.0.0, we erred on the side of caution, and just aggregated the results. The answer to both questions was 'both', as far as the tool was concerned.
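As a rough illustration of that aggregation, the sketch below merges the two sources from the example above. ReuseInfo and aggregate are hypothetical names chosen for this sketch, not the tool's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReuseInfo:
    """Copyright notices and licence identifiers found in one source."""

    copyright_notices: frozenset = frozenset()
    licenses: frozenset = frozenset()


def aggregate(*sources: ReuseInfo) -> ReuseInfo:
    """Union the information from every source; nothing is discarded."""
    notices = frozenset().union(*(s.copyright_notices for s in sources))
    licenses = frozenset().union(*(s.licenses for s in sources))
    return ReuseInfo(notices, licenses)


# The header of my-project/foo.py and the matching .reuse/dep5 section.
header = ReuseInfo(frozenset({"2023 Jane Doe"}), frozenset({"GPL-3.0-or-later"}))
dep5 = ReuseInfo(frozenset({"2020 Example NGO"}), frozenset({"0BSD"}))

combined = aggregate(header, dep5)
print(sorted(combined.licenses))
print(sorted(combined.copyright_notices))
```

Both licences and both copyright holders end up attached to the file, which is the "both" answer described above.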
However, that behaviour was not actually specified in the REUSE Specification v3.0, and there was a consensus among the maintainers of REUSE that this behaviour wasn't great. So we wanted to change it.
In REUSE Specification v3.2, we added a new file format, REUSE.toml, which allows you to specify the order of precedence of licensing information. The method of aggregation described above is now explicitly defined as the order of precedence for .reuse/dep5.
Find below the historical contents of this issue.
A naïve proposal + some history
We want to define an order of precedence for copyright and licensing information. Here is a concrete proposal:
Copyright and Licensing Information is considered according to the following order of precedence:
- Information defined in the .license file.
- Information defined in the Commentable File.
- Information defined in .reuse/dep5.
There is no merging of information from different sources. Only the source with the highest precedence is considered.
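A minimal sketch of this rule, assuming each source has already been parsed into a dictionary (or is None when the source does not exist). The function name resolve_strict and the dictionary layout are made up for illustration.

```python
from typing import Optional

Info = dict  # e.g. {"copyright": [...], "license": [...]}


def resolve_strict(
    license_file: Optional[Info],
    commentable_file: Optional[Info],
    dep5: Optional[Info],
) -> Optional[Info]:
    """Return the information from the highest-precedence source that exists.

    Sources are never merged: the first one found wins outright.
    """
    for source in (license_file, commentable_file, dep5):
        if source:  # skip missing (None) and empty sources
            return source
    return None


in_file = {"copyright": ["2023 Jane Doe"], "license": ["GPL-3.0-or-later"]}
dep5_section = {"copyright": ["2020 Example NGO"], "license": ["0BSD"]}

# No .license file exists, so the header inside the file wins and DEP5 is ignored.
result = resolve_strict(None, in_file, dep5_section)
print(result["license"])
```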
In fact, this proposal is so concrete that—for a few hours—it was in REUSE Specification 3.1 and tool version 2.0.0! However, because of quick negative feedback, this update to the specification was promptly reverted, and tool version 2.0.0 was yanked. A little embarrassing on our part, but we're thankful for the constructive feedback.
Copied from the change log:
While the intention of the breaking change was sound (don't mix information sources; define a single source of truth), there were legitimate use-cases that were broken as a result of this.
The legitimate use-case is the following scenario: You copy a project Foo wholesale into your own project as a static dependency. Foo is not REUSE-compliant, but does contain copyright statements in some code headers. You write a section into .reuse/dep5 broadly declaring that static/Foo/* is under its declared licence, and attribute The Foo Authors as the copyright holders. However, because the DEP5 file is now no longer applied to the files that contain copyright statements, REUSE will complain that these files do not have a declared licence.
Within the restrictions of the above proposal, there is no good workable solution to this use-case. You could manually edit the headers (not great, especially when Foo is big, or you regularly need to update it), or you could manually add .license files, which may be a huge task.
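To make the breakage concrete, here is a small sketch comparing the two behaviours on a vendored file whose header names a copyright holder but no licence. The helper names, the file name and the MIT licence for Foo are assumptions for illustration only.

```python
def strict(in_file: dict, dep5: dict) -> dict:
    """Yanked 2.0.0 behaviour: in-file information, if any, hides DEP5."""
    has_in_file_info = bool(in_file["copyright"] or in_file["license"])
    return in_file if has_in_file_info else dep5


def aggregated(in_file: dict, dep5: dict) -> dict:
    """Aggregation: combine both sources, dropping duplicates."""
    return {
        key: sorted(set(in_file[key]) | set(dep5[key]))
        for key in ("copyright", "license")
    }


def is_compliant(info: dict) -> bool:
    """A file needs at least one copyright notice and one licence."""
    return bool(info["copyright"]) and bool(info["license"])


# static/Foo/bar.c (hypothetical): a copyright statement, but no licence tag.
header = {"copyright": ["2019 The Foo Authors"], "license": []}
# The broad static/Foo/* section in .reuse/dep5 (the licence is a placeholder).
dep5_section = {"copyright": ["The Foo Authors"], "license": ["MIT"]}

print(is_compliant(strict(header, dep5_section)))      # False
print(is_compliant(aggregated(header, dep5_section)))  # True
```

Under strict precedence the header shadows the DEP5 section entirely, so the licence declared there never reaches the file.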
An actual but not-so-concrete proposal
We still want to define an order of precedence. But we must provide a way to force aggregation (current behaviour) or to hard-code precedence (e.g., prefer .reuse/dep5 over the file contents).
There does not yet exist a concrete way of doing this, but you may think of it like this. Given the example .reuse/dep5 section at the start of this issue, we could instead write this:
Files: my-project/*.py
Copyright: 2020 Example NGO
License: 0BSD
Precedence: [file (default)|aggregate|dep5]
The problem, however, is that DEP5 does not support this field, and we don't want to make it support this field.
So we want to pivot away from DEP5 and adopt a different configuration method. We've been brainstorming this since 2021 (volunteer projects aren't very fast), and we're internally referring to it as REUSE.yaml (although the YAML part is a bit up in the air).
An actual for-real-this-time concrete proposal
Find below a real and actual concrete proposal:
# The version of the TOML schema. A simple integer should be fine.
# Mandatory.
version = 1
[[annotations]]
# The path (or paths) that are covered, relative to REUSE.toml's directory. Mandatory.
path = [
    ".bumpversion.cfg",
    "setup.cfg"
]
# A string that defines the precedence of copyright and licensing information.
# The choices are:
#
# - "override" -> Treat the information in this file as the ultimate authority of
#   the copyright and licensing of the covered files. If multiple nested
#   REUSE.tomls have this precedence for the same file, then the topmost REUSE.toml
#   is authoritative.
# - "closest" -> Use the information closest to the file (including inside the file)
#   if available. If no such information is found, then the information inside this
#   REUSE.toml is applied to that file. TODO: what if there is only partial information
#   inside the file?
# - "aggregate" -> Aggregate the information from this file with the information
#   inside of the covered files.
#
# Not mandatory. Defaults to "closest".
precedence = "override"
# The copyright notice (or notices) that apply to the above paths. Mandatory.
SPDX-FileCopyrightText = "2017 Free Software Foundation Europe e.V. <https://fsfe.org>"
# The license expression (or expressions) that apply to the above paths.
# Mandatory.
SPDX-License-Identifier = "GPL-3.0-or-later"
# Subsequent tables override previous tables. This does NOT interact with the
# 'precedence' key.
[[annotations]]
# Can contain globs.
path = "docs/reuse*.rst"
precedence = "override"
SPDX-FileCopyrightText = [
    "2017 Free Software Foundation Europe e.V. <https://fsfe.org>",
    "2023 Jane Doe",
]
# These SHOULD be joined with AND, but files support multiple separate
# SPDX-License-Identifier tags, so let's support it here as well.
SPDX-License-Identifier = [
    "CC-BY-SA-4.0",
    "GPL-3.0-or-later"
]
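The rule that subsequent tables override previous tables could be implemented as in the sketch below. It assumes the file has already been parsed into a list of dictionaries with path given as a list, and it uses Python's fnmatch as a stand-in for whatever glob semantics the specification ends up defining (fnmatch's * also matches /, which may differ). find_annotation and the first docs/*.rst table are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional


def find_annotation(annotations: list, rel_path: str) -> Optional[dict]:
    """Return the last table whose path patterns match rel_path, if any."""
    for table in reversed(annotations):  # later tables take priority
        if any(fnmatchcase(rel_path, pattern) for pattern in table["path"]):
            return table
    return None


annotations = [
    {
        "path": ["docs/*.rst"],
        "SPDX-License-Identifier": ["GPL-3.0-or-later"],
    },
    {
        "path": ["docs/reuse*.rst"],
        "SPDX-License-Identifier": ["CC-BY-SA-4.0", "GPL-3.0-or-later"],
    },
]

# Both tables match this path; the later one is the one that applies.
match = find_annotation(annotations, "docs/reuse-spec.rst")
print(match["SPDX-License-Identifier"])
```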
Some notes on the implementation:
- I chose TOML. I had previously been partial to YAML, but reading over the discussions in the linked issues, and having worked a little more with TOML recently, it's a lot more fool-proof to write, especially as concerns indentation. It's not very good for nesting of data, but we're not doing that, so it's fine. We could bikeshed this choice further, but I propose that we just go with it.
- path, SPDX-FileCopyrightText, and SPDX-License-Identifier can be either a single string or a list of strings. This (partially) matches DEP5 behaviour, making it easy to port. It's also convenient to not mandate lists; we'll probably convert string values into single-value lists in the under-the-hood implementation (edit: that is exactly what I did).
- I chose to use the full SPDX-[...] key names. This works better in TOML than in YAML (because the colon in YAML messes with this tool's parsing). It's a bit more annoying to type, but it's also very consistent, and means that the user has to memorise less.
- The version key doesn't do much of anything presently. I'm not sure if it'll ever become important, but if it does, it'll be good to have.
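The string-or-list handling mentioned above might look like the following sketch. It works on a plain dictionary such as a TOML parser would return for one [[annotations]] table; as_list and normalise_table are hypothetical names, not the tool's real functions.

```python
from typing import Union

LIST_KEYS = ("path", "SPDX-FileCopyrightText", "SPDX-License-Identifier")


def as_list(value: Union[str, list]) -> list:
    """Wrap a lone string in a list; copy lists so the input is not aliased."""
    if isinstance(value, str):
        return [value]
    return list(value)


def normalise_table(table: dict) -> dict:
    """Return a copy of the table where the string-or-list keys are lists."""
    result = dict(table)
    for key in LIST_KEYS:
        if key in result:
            result[key] = as_list(result[key])
    # "closest" is the proposed default when precedence is omitted.
    result.setdefault("precedence", "closest")
    return result


table = {
    "path": "docs/reuse*.rst",
    "precedence": "override",
    "SPDX-FileCopyrightText": [
        "2017 Free Software Foundation Europe e.V. <https://fsfe.org>",
        "2023 Jane Doe",
    ],
    "SPDX-License-Identifier": "GPL-3.0-or-later",
}
normalised = normalise_table(table)
print(normalised["path"])
print(normalised["SPDX-License-Identifier"])
```

After normalisation the rest of the code only ever has to deal with lists, while users keep the convenience of writing a bare string.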
Some notes about the file itself:
- I intend to support exclusively REUSE.toml, and NOT .REUSE.toml. People will be peeved by this choice (they don't like random tools littering their clean workspace), but I propose that we stand by this choice. By allowing dotfiles, we would run the risk that the licensing information is hidden on some computers. Licensing information should not be hidden, ergo let's not do dotfiles.
- REUSE.toml files can be nested! You can place them anywhere in the project, not just in the root directory. They use the same closest/aggregate/override precedence system. closest resolves to the file itself OR to the nearest REUSE.toml (can be self). aggregate just aggregates the REUSE.toml's information always, and then behaves like closest. override is an aggressive "ignore everything underneath me; I am the ultimate authority here" precedence setting. The topmost REUSE.toml with override is authoritative.
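One possible reading of the nested precedence rules, sketched in Python. This is an interpretation rather than a specified algorithm: the names Info, Annotation and resolve are hypothetical, the sample copyright holders and licences are placeholders, and "no information in the file" is taken to mean neither a copyright notice nor a licence, which sidesteps the open question about partial information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Info:
    copyright_notices: frozenset = frozenset()
    licenses: frozenset = frozenset()

    def is_empty(self) -> bool:
        return not (self.copyright_notices or self.licenses)

    def merge(self, other: "Info") -> "Info":
        return Info(
            self.copyright_notices | other.copyright_notices,
            self.licenses | other.licenses,
        )


@dataclass(frozen=True)
class Annotation:
    precedence: str  # "closest", "aggregate" or "override"
    info: Info


def resolve(in_file: Info, annotations: list) -> Info:
    """Resolve the information for one file.

    annotations holds the matching table from each REUSE.toml that covers
    the file, ordered from the topmost directory down to the nearest one.
    """
    # The topmost "override" is authoritative and ignores everything below it.
    for annotation in annotations:
        if annotation.precedence == "override":
            return annotation.info

    closest = in_file  # the file itself is the closest possible source
    extra = Info()     # everything contributed by "aggregate" tables
    for annotation in reversed(annotations):  # nearest REUSE.toml first
        if annotation.precedence == "aggregate":
            extra = extra.merge(annotation.info)
        elif closest.is_empty():
            closest = annotation.info
    return closest.merge(extra)


header = Info(frozenset({"2023 Jane Doe"}), frozenset({"GPL-3.0-or-later"}))
root = Annotation("aggregate", Info(frozenset({"2020 Example NGO"}), frozenset({"0BSD"})))
nested = Annotation("closest", Info(frozenset({"2021 Someone"}), frozenset({"MIT"})))

# The header beats the nested "closest" table; the root "aggregate" is added on top.
print(sorted(resolve(header, [root, nested]).licenses))
# Without a header, the nested "closest" table fills in instead.
print(sorted(resolve(Info(), [root, nested]).licenses))
```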
Related issues
Here are some issues of relevance (in order of relevance, feel free to reference more):
- Define syntax and format of REUSE.yaml reuse-docs#81
- Define precedence of information with REUSE.yaml reuse-docs#70
- Document 2.0.0 as yanked #775
- .license-file and/or in-file information should take precedence of dep5 #572
- Declare subprojects and support them #661
This issue will exist as a sort of meta issue to refer back to and track work in other issues.