Define an order of precedence + provide a means to override it (REUSE.yaml) #779

@carmenbianca

Help, I got here because I got a PendingDeprecationWarning

PendingDeprecationWarning: Copyright and licensing information for 'my-project/foo.py' has been found in both 'my-project/foo.py' and in the DEP5 file located at '.reuse/dep5'. The information for these two sources has been aggregated. In the future this behaviour will change, and you will need to explicitly enable aggregation. See #779. You need do nothing yet. Run with --suppress-deprecation to hide this warning.

You can get rid of this warning by upgrading reuse to version 4.0.0 or later, where the above behaviour is defined by REUSE Specification v3.2.

You're getting this warning because of the following scenario. You have a file my-project/foo.py which contains the following header:

# SPDX-FileCopyrightText: 2023 Jane Doe
#
# SPDX-License-Identifier: GPL-3.0-or-later

But you also have a .reuse/dep5 file which contains the following section:

Files: my-project/*.py
Copyright: 2020 Example NGO
License: 0BSD

The problem: Under which licence is the file? Who are the copyright holders?

Prior to version 4.0.0, we erred on the side of caution, and just aggregated the results. The answer to both questions was 'both', as far as the tool was concerned.
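
To make "both" concrete, here is a minimal sketch of what aggregating the two sources from the example amounts to. The `Info` class and its field names are made up for this illustration; this is not the reuse tool's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Info:
    """Hypothetical container for one source's information (illustration only)."""

    copyrights: frozenset[str] = frozenset()
    licenses: frozenset[str] = frozenset()

    def union(self, other: Info) -> Info:
        """Merge two sources without preferring either one."""
        return Info(
            self.copyrights | other.copyrights,
            self.licenses | other.licenses,
        )


# The two conflicting sources from the example above.
header = Info(frozenset({"2023 Jane Doe"}), frozenset({"GPL-3.0-or-later"}))
dep5 = Info(frozenset({"2020 Example NGO"}), frozenset({"0BSD"}))

aggregated = header.union(dep5)
print(sorted(aggregated.copyrights))  # ['2020 Example NGO', '2023 Jane Doe']
print(sorted(aggregated.licenses))    # ['0BSD', 'GPL-3.0-or-later']
```

Neither source wins: the file ends up with two copyright holders and two licences, which is the ambiguity this issue is about.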

However, that behaviour was not actually specified in the REUSE Specification v3.0, and there was a consensus among the maintainers of REUSE that this behaviour wasn't great. So we wanted to change it.

In REUSE Specification v3.2, we added a new file format, REUSE.toml, which allows you to specify the order of precedence of licensing information. The method of aggregation described above is now explicitly defined as the order of precedence for .reuse/dep5.


Find below the historical contents of this issue.

A naïve proposal + some history

We want to define an order of precedence for copyright and licensing information. Here is a concrete proposal:

Copyright and Licensing Information is considered according to the
following order of precedence:

  1. Information defined in the .license file.
  2. Information defined in the Commentable File.
  3. Information defined in .reuse/dep5.

There is no merging of information from different sources. Only the
source with the highest precedence is considered.
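
A rough sketch of that rule, for illustration only. The function name `resolve_strict` and the `Info` tuple are hypothetical, and a source counts as "present" as soon as it contains anything at all, which is an assumption of this sketch.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Info(NamedTuple):
    """Hypothetical record of what one source says about a file."""

    copyrights: tuple[str, ...]
    licenses: tuple[str, ...]


def resolve_strict(
    license_file: Optional[Info],
    in_file: Optional[Info],
    dep5: Optional[Info],
) -> Optional[Info]:
    """Return the first available source in order of precedence, unmerged."""
    for source in (license_file, in_file, dep5):
        if source is not None:
            return source
    return None


header = Info(("2023 Jane Doe",), ("GPL-3.0-or-later",))
dep5_section = Info(("2020 Example NGO",), ("0BSD",))

# The header outranks .reuse/dep5, so the DEP5 section is ignored entirely.
assert resolve_strict(None, header, dep5_section) == header
# A file without a header or .license file falls through to .reuse/dep5.
assert resolve_strict(None, None, dep5_section) == dep5_section
```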

In fact, this proposal is so concrete that—for a few hours—it was in REUSE Specification 3.1 and tool version 2.0.0! However, because of quick negative feedback, this update to the specification was promptly reverted, and tool version 2.0.0 was yanked. A little embarrassing on our part, but we're thankful for the constructive feedback.

Copied from the change log:

While the intention of the breaking change was sound (don't mix information sources; define a single source of truth), there were legitimate use-cases that were broken as a result of this.

The legitimate use-case is the following scenario: You copy a project Foo wholesale into your own project as a static dependency. Foo is not REUSE-compliant, but does contain copyright statements in some code headers. You write a section into .reuse/dep5 broadly declaring that static/Foo/* is under its declared licence, and attribute The Foo Authors as the copyright holders. However, because the DEP5 file is now no longer applied to the files that contain copyright statements, REUSE will complain that these files do not have a declared licence.

Within the restrictions of the above proposal, there is no good workable solution to this use-case. You could manually edit the headers (not great, especially when Foo is big, or you regularly need to update it), or you could manually add .license files, which may be a huge task.

An actual but not-so-concrete proposal

We still want to define an order of precedence. But we must provide a way to force aggregation (the current behaviour) or to hard-code precedence (e.g., prefer .reuse/dep5 over the file contents).

There does not yet exist a concrete way of doing this, but you may think of it like this. Given the example .reuse/dep5 section at the start of this issue, we could instead write this:

Files: my-project/*.py
Copyright: 2020 Example NGO
License: 0BSD
Precedence: [file (default)|aggregate|dep5]

The problem, however, is that DEP5 does not support this field, and we don't want to make it support this field.

So we want to pivot away from DEP5 and adopt a different configuration method. We've been brainstorming this since 2021 (volunteer projects aren't very fast), and we're internally referring to it as REUSE.yaml (although the YAML part is a bit up in the air).

An actual for-real-this-time concrete proposal

Find below a real and actual concrete proposal:

# The version of the TOML schema. A simple integer should be fine.
# Mandatory.
version = 1

[[annotations]]
# The path (or paths) that are covered, relative to REUSE.toml's directory. Mandatory.
path = [
    ".bumpversion.cfg",
    "setup.cfg"
]
# A string that defines the precedence of copyright and licensing information.
# The choices are:
#
# - "override" -> Treat the information in this file as the ultimate authority of
#   the copyright and licensing of the covered files. If multiple nested
#   REUSE.tomls have this precedence for the same file, then the topmost REUSE.toml
#   is authoritative.
# - "closest" -> Use the information closest to the file (including inside the file)
#   if available. If no such information is found, then the information inside this
#   REUSE.toml is applied to that file. TODO: what if there is only partial information
#   inside the file?
# - "aggregate" -> Aggregate the information from this file with the information
#   inside of the covered files.
#
# Not mandatory. Defaults to "closest".
precedence = "override"
# The copyright notice (or notices) that apply to the above paths. Mandatory.
SPDX-FileCopyrightText = "2017 Free Software Foundation Europe e.V. <https://fsfe.org>"
# The license expression (or expressions) that apply to the above paths.
# Mandatory.
SPDX-License-Identifier = "GPL-3.0-or-later"

# Subsequent tables override previous tables. This does NOT interact with the
# 'precedence' key.
[[annotations]]
# Can contain globs.
path = "docs/reuse*.rst"
precedence = "override"
SPDX-FileCopyrightText = [
    "2017 Free Software Foundation Europe e.V. <https://fsfe.org>",
    "2023 Jane Doe",
]
# These SHOULD be joined with AND, but files support multiple separate
# SPDX-License-Identifier tags, so let's support it here as well.
SPDX-License-Identifier = [
    "CC-BY-SA-4.0",
    "GPL-3.0-or-later"
]

Some notes on the implementation:

  • I chose TOML. I had previously been partial to YAML, but reading over the discussions in the linked issues, and having worked a little more with TOML recently, it's a lot more fool-proof to write, especially as concerns indentation. It's not very good for nesting of data, but we're not doing that, so it's fine. We could bikeshed this choice further, but I propose that we just go with it.
  • path, SPDX-FileCopyrightText, and SPDX-License-Identifier can be either a single string or a list of strings. This (partially) matches DEP5 behaviour, making it easy to port. It's also convenient to not mandate lists; we'll probably convert string values into single-value lists in the under-the-hood implementation (edit: that is exactly what I did).
  • I chose to use the full SPDX-[...] key names. This works better in TOML than in YAML (because the colon in YAML messes with this tool's parsing). It's a bit more annoying to type, but it's also very consistent, and means that the user has to memorise less.
  • The version key doesn't do much of anything presently. I'm not sure if it'll ever become important, but if it does, it'll be good to have.
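
The string-or-list convenience mentioned above can be sketched roughly like this. The helper name `as_list` and the plain dict standing in for a parsed [[annotations]] table are assumptions for the example, not the tool's real internals.

```python
from __future__ import annotations

from typing import Union

StrOrList = Union[str, list]

# Keys that may hold either a single string or a list of strings.
MULTI_VALUE_KEYS = ("path", "SPDX-FileCopyrightText", "SPDX-License-Identifier")


def as_list(value: StrOrList) -> list:
    """Wrap a lone string in a list; copy an existing list unchanged."""
    if isinstance(value, str):
        return [value]
    return list(value)


# A dict shaped like one parsed [[annotations]] table (hand-written here).
annotation = {
    "path": "docs/reuse*.rst",
    "precedence": "override",
    "SPDX-FileCopyrightText": [
        "2017 Free Software Foundation Europe e.V. <https://fsfe.org>",
        "2023 Jane Doe",
    ],
    "SPDX-License-Identifier": "CC-BY-SA-4.0",
}

normalised = {key: as_list(annotation[key]) for key in MULTI_VALUE_KEYS}
print(normalised["path"])  # ['docs/reuse*.rst']
```

After this step the rest of the code only ever has to deal with lists, whichever form the user wrote.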

Some notes about the file itself:

  • I intend to support exclusively REUSE.toml, and NOT .REUSE.toml. People will be peeved by this choice (they don't like random tools littering their clean workspace), but I propose that we stand by this choice. By allowing dotfiles, we would run the risk that the licensing information is hidden on some computers. Licensing information should not be hidden, ergo let's not do dotfiles.
  • REUSE.toml files can be nested! You can place them anywhere in the project, not just in the root directory. They use the same closest/aggregate/override precedence system. closest resolves to the file itself OR to the nearest REUSE.toml (can be self). aggregate just aggregates the REUSE.toml's information always, and then behaves like closest. override is an aggressive "ignore everything underneath me; I am the ultimate authority here" precedence setting. The topmost REUSE.toml with override is authoritative.
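
Here is one possible reading of the nested precedence rules as a sketch. It is not the tool's implementation. `Info`, `Annotation`, and `resolve` are hypothetical names, the list of applicable annotations is assumed to be ordered from the topmost REUSE.toml down to the nearest one, and any information at all (even partial) counts as "found", so the partial-information question is left open.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Info:
    """Hypothetical container for copyright and licensing information."""

    copyrights: frozenset[str] = frozenset()
    licenses: frozenset[str] = frozenset()

    def __bool__(self) -> bool:
        return bool(self.copyrights or self.licenses)

    def __or__(self, other: Info) -> Info:
        return Info(
            self.copyrights | other.copyrights,
            self.licenses | other.licenses,
        )


@dataclass(frozen=True)
class Annotation:
    """One matching [[annotations]] table: its precedence and its information."""

    precedence: str  # "closest", "aggregate" or "override"
    info: Info


def resolve(file_info: Info, annotations_top_down: list[Annotation]) -> Info:
    """Combine in-file information with matching annotations (assumed semantics)."""
    # "override": the topmost REUSE.toml that claims the file is authoritative.
    for annotation in annotations_top_down:
        if annotation.precedence == "override":
            return annotation.info
    # Otherwise walk outwards from the file, nearest REUSE.toml first.
    found = file_info
    for annotation in reversed(annotations_top_down):
        if annotation.precedence == "aggregate":
            found = found | annotation.info  # always added
        elif annotation.precedence == "closest" and not found:
            found = annotation.info  # only used if nothing nearer exists
    return found


header = Info(frozenset({"2023 Jane Doe"}), frozenset({"GPL-3.0-or-later"}))
ngo = Info(frozenset({"2020 Example NGO"}), frozenset({"0BSD"}))

assert resolve(header, [Annotation("closest", ngo)]) == header
assert resolve(header, [Annotation("aggregate", ngo)]) == header | ngo
assert resolve(header, [Annotation("override", ngo)]) == ngo
```

Checking for override in a separate first pass keeps the "topmost wins" rule apart from the nearest-first walk used for closest and aggregate.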

Related issues

Here are some issues of relevance (in order of relevance, feel free to reference more):

This issue will exist as a sort of meta issue to refer back to and track work in other issues.
