Use legally-correct copyright notices. #1661

Travis-Snoozy · 2015-10-20T05:50:36Z

The existing copyright support uses the (legally-unrecognized) (C), and does not support year lists due to the literal-matching nature of copyrightText.

This change fixes the above problems by adding a new {copyright} variable for use with copyrightText. The default value for copyrightText is also updated to use {copyright} instead of the hard-coded "Copyright (C)" text.

{copyright} provides the following features:

- When suggesting a fix, automatically expands to "Copyright © " followed by the current year.
- When checking copyright headers, {copyright} matches "Copyright © " followed by a comma-separated list of four-digit years, and/or year-ranges (two four-digit years separated with a hyphen).
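As a rough illustration of the two behaviors described for {copyright}, here is a minimal Python sketch. It is not the code from this pull request (which is C#); the names `COPYRIGHT_PATTERN`, `expand_for_fix`, and `matches_header` are hypothetical, and allowing an optional space after each comma is an assumption, since the description does not specify the separator spacing.

```python
import re
from datetime import date

# Hypothetical building blocks; the pull request's real implementation may differ.
YEAR = r"\d{4}"                          # a four-digit year
YEAR_OR_RANGE = rf"{YEAR}(?:-{YEAR})?"   # a year, or two years joined by a hyphen

# "Copyright © " followed by a comma-separated list of years and/or ranges.
# Assumption: a single optional space may follow each comma.
COPYRIGHT_PATTERN = re.compile(
    rf"Copyright © {YEAR_OR_RANGE}(?:, ?{YEAR_OR_RANGE})*"
)


def expand_for_fix(today=None):
    """Text a suggested fix would insert: the notice prefix plus the current year."""
    today = today or date.today()
    return f"Copyright © {today.year}"


def matches_header(text):
    """True if the header text starts with a notice in the accepted form."""
    return COPYRIGHT_PATTERN.match(text) is not None
```

With this sketch, `matches_header("Copyright © 2012, 2014-2015 Our Company")` is accepted, while a header that begins with `Copyright (C) 2015` is not.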

This change adds new unit tests to verify this new functionality, and updates the existing tests to work correctly with the new behaviors.


codecov-io · 2015-10-20T05:50:50Z

Current coverage is `79.14%`

Merging #1661 into master will decrease coverage by -0.02% as of 2a54be2

@@            master   #1661   diff @@
======================================
  Files          539     539       
  Stmts        31799   31815    +16
  Branches      8928    8934     +6
  Methods                          
======================================
+ Hit          25175   25181     +6
- Partial       5178    5188    +10
  Missed        1446    1446

Review entire Coverage Diff as of 2a54be2

Powered by Codecov. Updated on successful CI builds.

sharwell · 2015-10-20T14:51:59Z

sharwell · 2015-10-20T14:59:37Z

📝 Note that the default header's use of Copyright (c) complies with the recommended form of notice described under Visually Perceptible Copies in Copyright Notice by the US Copyright Office. It is the word 'Copyright' as opposed to the symbol '(c)' which meets this condition. The year of first publication is intentionally omitted from the default notice; additional discussion on this can be found in File headers and Copyright statements.

sharwell · 2015-10-20T15:24:19Z

I've marked this as needs discussion. The primary new feature added by this pull request is support for year lists. This is related to #1357 (#1357 appears to be a subset of this proposal).

The main areas of concern I have with this proposal are the following:

The US Copyright Office does not suggest the use of or prescribe a form for copyright notices listing multiple years. This means the form used for multiple years is established on a per-company or per-legal-team basis, and thus only applies to a subset of our potential users which are still using headers with year(s) listed.
The copyright years cannot be validated for correctness by StyleCop Analyzers. Maintenance of the proposed form imposes the maximum burden for development teams, and I would prefer to discourage its use as much as possible, in as many places as possible. My hope is that as more projects adopt notices which intentionally omit the copyright year(s), the entire software industry will benefit.

The OSS license used for this repository of analyzers does permit users to create a new analyzer which specifically handles copyright headers in a custom format and use that analyzer within their project. As long as the diagnostic IDs for file header diagnostics are changed, the custom analyzer and StyleCop.Analyzers could be used together in the same repository.

Travis-Snoozy · 2015-10-20T16:33:15Z

Yes, Copyright is valid, but (C) is meaningless; perpetuating its use is just silly, so fix it or drop it. Omission of a year is the primary flaw here, regardless. The implementation being tied to the placement of Copyright was intentional for multiple reasons; I'll elaborate further once I have a real keyboard.

http://www.copyright.gov/circs/circ01.pdf

Page 4, "Form of Notice for Visually Perceptible Copies" specifies the year as one of three critical points.

I'm perfectly happy to not handle multi-years, but ONE year is at minimum required.

Travis

sharwell · 2015-10-20T16:58:36Z

Yes, Copyright is valid, but (C) is meaningless; perpetuating its use is just silly, so fix it or drop it.

The current default text is the closest we can get to © while only using characters which have the same binary representation in Windows-1252 and UTF-8. If any change is made to the default, I expect it to be the result of ongoing discussions in the .NET Foundation. Note that you can currently customize the copyrightText value in stylecop.json, and if you use © there it will enforce the use of that mark in source files as well.

I'm perfectly happy to not handle multi-years, but ONE year is at minimum required.

We use this at my office by creating a variable which holds a fixed year (currently 2015). It also uses a © symbol instead of (c).

{
  "$schema": "https://raw.githubusercontent.com/DotNetAnalyzers/StyleCopAnalyzers/master/StyleCop.Analyzers/StyleCop.Analyzers/Settings/stylecop.schema.json",
  "settings": {
    "documentationRules": {
      "companyName": "Our Company",
      "copyrightText": "Copyright © {year} {companyName}. All Rights Reserved.",
      "xmlHeader": false,
      "variables": {
        "year": "2015"
      }
    }
  }
}

I think you can make a great case for expanding on this. A special variable (name TBD) which could be used in the copyrightText value with the following semantics:

- When suggesting a fix, automatically expands to the current year.
- When checking copyright headers, matches any four-digit year.
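The semantics proposed above could be sketched roughly as follows in Python. This is only an illustration, not StyleCop Analyzers code: the placeholder name `{anyYear}` and the helper functions are hypothetical (the comment leaves the variable name TBD), and the sketch treats everything outside the placeholder as literal text.

```python
import re
from datetime import date

# Hypothetical placeholder name; the thread leaves the actual name undecided.
YEAR_TOKEN = "{anyYear}"


def expand_template(template, today=None):
    """Fix mode: replace the placeholder with the current year."""
    today = today or date.today()
    return template.replace(YEAR_TOKEN, str(today.year))


def template_to_regex(template):
    """Check mode: literal text must match exactly; the placeholder matches any four digits."""
    literal_parts = [re.escape(part) for part in template.split(YEAR_TOKEN)]
    return re.compile(r"\d{4}".join(literal_parts))


def header_matches(template, header_line):
    """True if the whole header line fits the template."""
    return template_to_regex(template).fullmatch(header_line) is not None
```

Under these assumptions, a file created in 2014 and one created in 2015 would both pass the check against the same template, while a newly generated header would carry the current year.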

Travis-Snoozy · 2015-10-20T18:18:44Z

Still not at a kb, so bear with me --

The placeholder needs to match a valid Copyright symbol as the prev. token. Thus

{copyright} => ((Copyright)|©) [12]\d{3}

or similar. Trying to date-match in isolation is an ultimately useless activity, as we can't tell a date from any other sort of 4-digit # -- it's the direct proximity to the magic Copyright word that gives this sequence of digits importance, and the Copyright word in turn needs those digits. I considered trying to do full regex support as the ultimate generic way to solve this, but that plan fell over when suggested fixes had to be considered.

Re. encoding, that shouldn't be a concern: either tag the necessary files with a UTF-8 "BOM", or set up the HTTP server to serve the appropriate file encoding in the headers, and recode as appropriate to the output file's encoding. That said, AFAIK, all .NET languages need to support Unicode source files anyway*. If it's a Super Big Deal, then just drop the (C) altogether. Being squeamish about Unicode vs. legacy text seems oddly quaint, though. Very 90s. :)

Cheers,

Travis

* I've noticed the analyzers seemed to choke when I used © in the name of a C# variable, so I presume there are bugs hiding in the code. I'll probably poke at that a bit later.

vweijsters · 2015-10-20T19:35:22Z

I think that the original test cases should not have been modified. Those were valid test cases and should not be modified for an optional feature.

vweijsters · 2015-10-20T19:38:53Z

❓ What is the point of supporting comma-separated list of years or year ranges?

Travis-Snoozy · 2015-10-20T21:09:25Z

50% sure I only touched tests that failed after my change (a shocking number did), but it's entirely possible some got dragged along for the ride. The refactoring was a rather boring autopilot haze.

Regardless, none of the tests' underlying flavor should have changed. If you have specific cases you'd like to try reverting, feel free to point them out.

Travis

sharwell · 2015-10-20T22:29:02Z

50% sure I only touched tests that failed after my change (a shocking number did)...

This was caused by changing the default value of the copyrightText configuration property. Even if we update the code to support a special replacement variable with a "flexible" definition for the year, the default value for copyrightText should not be changed to use it. Once the default value is restored, the previously created tests should work again without changes.

GregReddick · 2015-10-20T22:32:28Z

In addition, Circular 3 gives more information about the proper form of copyright notices. http://www.copyright.gov/circs/circ03.pdf The (c) has no meaning. There was a court case where the C was enclosed in an octagon instead of a circle and it was ruled invalid. The year should be present. In the United States since 1 March 1989, not having a copyright notice does not mean that the work is uncopyrighted; however, it does cause problems. (Not registering the copyright with the Library of Congress limits the damages that can be collected.)

Travis-Snoozy · 2015-10-21T00:26:09Z

Now that I'm at a keyboard... wall of text engage!

I would prefer the text Copyright © appear as literal text in the copyrightText property of stylecop.json, and the new placeholder only apply to the year ranges.

[...]

I think you can make a great case for expanding on this. A special variable (name TBD) which could be used in the copyrightText value with the following semantics:

When suggesting a fix, automatically expands to the current year.

When checking copyright headers, matches any four-digit year.

From what I can tell, the year should come immediately after either "Copyright", "Copr.", or "©". Thus, a generic year-only match is not a semantically equivalent substitute (although it would technically work, essentially you could put {year} tags anywhere, and it would not guarantee the enforcement of a full, legally recognized copyright statement).

A full regex/pattern system would be the logical extension as a fully-flexible and data-driven solution, but that would require a lot of plumbing, and lead to interesting requirements in terms of interacting with the suggested fixes code. It's a lot of work just to support a single templating scenario.

What I could do is put in another configuration knob that lets the customer choose an appropriate value (e.g., copyrightPhrase set by an enum like Full, Abbreviated, Symbol), and have that do the special expansion that checks for a word.

Sadly, this can't be an implicit rule that requires a four-digit year-like thing after any instance of "Copyright", "Copr.", or "©", because those words/symbols could potentially show up in the copyright header and not actually be the intended statement of copyright. For example: "This program is protected by U.S. Copyright law" would trigger improperly. The user has to be able to convey the explicit semantic intent of "this is a copyright notice," and that means, as far as I can tell, the whole notice needs to be handled as a single, explicitly-stated token.
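To make the false positive concrete, here is a small Python sketch of the implicit rule that the paragraph above rules out. The function and pattern names are hypothetical, and the rule is deliberately naive: it flags any header line that contains one of the copyright words without a four-digit year directly after it.

```python
import re

# Any of the three recognized forms of the copyright word.
COPYRIGHT_WORD = re.compile(r"Copyright|Copr\.|©")
# The same word immediately followed by a space and a four-digit year.
WORD_THEN_YEAR = re.compile(r"(?:Copyright|Copr\.|©) \d{4}")


def implicit_rule_violations(header):
    """Return header lines that mention a copyright word with no year right after it."""
    return [
        line
        for line in header.splitlines()
        if COPYRIGHT_WORD.search(line) and not WORD_THEN_YEAR.search(line)
    ]


header = (
    "Copyright 2015 Our Company\n"
    "This program is protected by U.S. Copyright law"
)
# The second line is ordinary prose, yet the implicit rule reports it.
print(implicit_rule_violations(header))
```

The actual notice on the first line passes, but the prose sentence on the second line is reported, which is why the notice has to be declared explicitly rather than inferred from the presence of the word.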

The year of first publication is intentionally omitted from the default notice[...]

[...]

My hope is that as more projects adopt notices which intentionally omit the copyright year(s), the entire software industry will benefit.

[...]

The US Copyright Office does not suggest the use of or prescribe a form for copyright notices listing multiple years.

[...]

What is the point of supporting comma-separated list of years or year ranges?

Guidance given by the copyright office specifies a single year after the "Copyright", "Copr.", or "©". This represents, at least in the US, the "best practice," and that is exactly what StyleCop is supposed to be helping us conform to. By the same reasoning, I intend to remove the multi-year support from this change.

The existing copyright support uses the (legally-unrecognized) (C)[...]

[...]

It is the word 'Copyright' as opposed to the symbol '(c)' which meets this condition.

[...]

The (c) has no meaning.

Best practice would seem to be to not have (C) -- everyone seems to acknowledge it's not a legally recognized sequence. It's cruft. Kill the cruft. :)

Even if we update the code to support a special replacement variable with a "flexible" definition for the year, the default value for copyrightText should not be changed to use it.

[...]

If any change is made to the default, I expect it to be the result of ongoing discussions in the .NET Foundation.

[...]

In the United States since 1 March 1989, not having a copyright notice does not mean that the work is uncopyrighted, however it does cause problems. (Not registering the copyright with the Library of Congress limits the damages that can be collected.)

Omitting copyright years may seem like a trivial choice, but as Greg points out, there are real legal ramifications of making this choice. Upon learning this, I would consider it to be not only incorrect, but professionally unethical to continue with a default that does not enforce a year-labeled copyright notice.

We use this at my office by creating a variable which holds a fixed year (currently 2015).

This was the first option I reached for when I tried to get my codebase going... sadly, it leads to StyleCop errors in the source code unless all the files use the current year. Since it is improper and incorrect to change all the files to the latest year (especially if the code in the files hasn't changed), this is sadly a non-solution to the bigger problem. Again, we need an unambiguous semantic declaration of "this is a copyright notice here," and the engine needs to be smart enough to treat it properly.

sharwell · 2015-10-21T04:32:02Z

I'm going to go a bit out of order on the responses.

This represents, at least in the US, the "best practice," and that is exactly what StyleCop is supposed to be helping us conform to.

The suggested guidance is the recommended practice to follow in order to maximize the ability to obtain statutory damages following an infringement. In my experience (and IANAL), source code files tend to derive only marginal benefits from this practice because they fall into one of two categories:

- Open source software, where the work tends to not be for (direct) profit, and the original author is less likely to seek statutory damages as the result of an infringement.
- Closed source software, where the source code files are not publicly distributed.

Rather than focus on a hypothetical outcome of obtaining statutory damages, the default behavior of the copyright headers in this project focuses on behavior which can be easily adopted and is appropriate at minimum for a majority of open source projects. The default header is essentially as easy to maintain as not having a header at all (maximum developer efficiency), but does provide a notice regarding the copyright holder for the work. Individual projects may expand on this by customizing the copyright text; as you can see in this project we added brief information regarding the license the source code is provided under and the location where the complete license text may be obtained.

From what I can tell, the year should come immediately after either "Copyright", "Copr.", or "©". Thus, a generic year-only match is not a semantically equivalent substitute (although it would technically work, essentially you could put {year} tags anywhere, and it would not guarantee the enforcement of a full, legally recognized copyright statement).

Copyright laws vary by country. By using simple replacement variables and customizable copyright text, we maximize the ability of the code to meet the needs of a varied audience. If a generic year could be written as {creationYear} (for example), then the copyright notice recommended by the US Copyright Office could be configured by the following line in stylecop.json. Further restrictions on the use of creationYear in the text (e.g. by requiring it appear after 'Copyright') only serve to limit its overall capability.

"copyrightText": "© {creationYear} {companyName}. All rights reserved."

Omitting copyright years may seem like a trivial choice, but as Greg points out, there are real legal ramifications of making this choice.

IMO, the long-term benefits to the developer community as a whole outweigh the marginal legal benefits provided by the more complete notice. This sentiment appears to be backed by de-facto use of notices without years in projects recently published by major organizations (including Microsoft), and is now being considered as a more official recommendation coming from the .NET Foundation itself.

If you can get the .NET Foundation to change its recommendation such that it includes a date in the copyright notice, then it would be much more compelling evidence that the StyleCop Analyzers default is not suitable for most .NET projects.

Travis-Snoozy · 2015-10-21T16:35:01Z

Two-and-a-half things first...

First: I'm disappearing until next week (this was supposed to be a small one-night change to take care of a snag in getting a project spun up). Don't be alarmed.

Half: The way the current patch works relies on "Copyright ©" being rare/unique enough in actual header text that it can be back-converted into a token. If that is straight-out unacceptable, reject this pull request.

Second: I can do a year-only solution. However, it will be quite difficult, and pretty much be 100% based on regexes. It will be much slower than the line-by-line ordinal comparison presently used. Before I embark, I want some level of guarantee that if I code that up, get it to work, document it, and write passing tests for it, that I'm not going to get a bunch of philosophical, ideological, or theoretical-performance pushback. :)

Onward to ideological debate...

IMO, the long-term benefits to the developer community as a whole outweigh the marginal legal benefits provided by the more complete notice.

Name one benefit of not having a year.

None of the benefits listed on the link provided are valid for years. Point by point:

Keeping the headers "as brief as possible [...] to make it easy to apply the right header" is bunk if you're using StyleCop (save for the fact that 2015 StyleCop doesn't handle dates right... but that's a catch-22 ;)).
"Minimize [...] polluting git history" is bunk: you should set the copyright header only once, and not be updating the year; this is more-relevant to not having each file copyrighted to a different person.
"[...] file header being incorrect by getting out of date over time as code is refactored" is partially valid, but weak; it can be worked around by following best-practices for one class per file, file named as the class. Again, more relevant to per-person attribution.
"Make sure we give proper credit [...]" and...
"clearly identify [...] who originally submitted a contribution" again has to do with dropping per-person attribution and consolidating that in a single CONTRIBUTORS file.
"[...]properly attributing other open source included in our projects[...]" has more to do with the LICENSE file, and (if code is included as source), the copyright notice should absolutely not be tampered with (there are legal implications surrounding that).

The only explicit reference to removing the copyright year states it's done solely "to avoid unnecessary churn in the code base." That argument is completely invalid if single-year notices are used. When the file is created, the copyright notice should be in it: there should be exactly 0 churn.

This sentiment appears to be backed by de-facto use of notices without years in projects recently published by major organizations (including Microsoft), and is now being considered as a more official recommendation coming from the .NET Foundation itself.

Now, as in right now, as in two days ago, the same day I submitted this patch. It's not a standard, it's only just been proposed, and it has made exactly zero good points for actually removing the year. Trying to dodge to "de facto" over an actual codified legal standard also doesn't seem like a particularly convincing point. Refusing to look at the consequences of the choice because they are hypothetical also doesn't strengthen this position: if the hypothetical consequences are unimportant, we shouldn't even put a copyright notice in. Copyright is automatic in the US, and even then, copyrights are only important in the hypothetical situation that somebody tries to violate our license. :)

Microsoft, as a large corporation, has its own agenda. It isn't going to be interested in chasing down infringement of its sample code or other such items it releases to the wild. Microsoft is also a predominately closed-source operation, and it is less-likely (though still possible, with so many people internally having access to so much code) that single file(s) will be stolen and then used. If infringement happens, and they care, they're only going to care because huge profits were reaped, or Microsoft's huge profits were eaten into -- statutory damages are absolute chump change to begin with for Microsoft, and they would never opt to take them. This policy makes perfect sense for a company like Microsoft.

For open-source work, where actual damages to the copyright holder may be zero (and where the profits of the infringer may be small by comparison to the whole product the infringement took place in), statutory damages may be the only actual value a "little guy" can get. I refer you to BusyBox, who out of half a dozen settlements, actually won the single case that went to court... with statutory damages only. This code was embedded into hundreds, if not thousands of real devices that went out, but BusyBox still went for statutory rather than actual damages+profits... hardware is, after all, a low-margin industry.

sharwell · 2015-10-21T17:53:13Z

Based on information in Circular 3, we see that the impact of including inexact dates is the following:

- If the date is earlier than the actual publication date, then the notice is valid but the copyright protection period is potentially shorter than it otherwise would be.
- If the date is later than the actual publication date, then the notice is equivalent to omitted.

During certain operations such as refactoring which can create new files or move content from one existing file to another, it may be difficult to determine if the content of the new file is original content which was not previously created, or simply relocation of content which was already created (and potentially published). Failure to be exactly correct when choosing a date for the header could result in the invalidation of the notice. The safest way to ensure indisputable protection for all content in the project is to use a copyright notice on all source files which is dated in the year that work on the project was started. For example, this repository would contain a notice dated 2014 on all files, regardless of the time when they were added. The protection would then extend to the year 2109.

Since this strategy uses fixed dates, it is already supported by StyleCop Analyzers. Considering that C# 1.0 was released in the year 2000, we can assume that projects using StyleCop Analyzers have an earliest date of publication for C# code no earlier than the year 2000. As of today, the discussions in this issue therefore only impact the ability to seek statutory damages for copyright infringements which occur between 2095 (at the earliest) and 2110. I feel like this buys us a little bit of time to really think about a solution which protects developers while minimizing the amount of work required to maintain accurate copyright notices.
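The year arithmetic in the two paragraphs above can be checked with a short Python sketch. The 95-year term is an assumption inferred from the figures in this comment (a notice dated 2014 extending to 2109); the function name is hypothetical, and nothing here is legal advice.

```python
# Assumption: a protection term of 95 years counted from the year in the notice,
# inferred from the 2014 -> 2109 example in the comment above.
TERM_YEARS = 95


def protection_end_year(notice_year, term_years=TERM_YEARS):
    """Last year covered if the term is counted from the notice year."""
    return notice_year + term_years


print(protection_end_year(2014))  # notice dated at this repository's start: 2109
print(protection_end_year(2000))  # earliest plausible C# publication year: 2095
print(protection_end_year(2015))  # notice dated in the year of this discussion: 2110
```

On these assumptions, the choice of notice year only changes the outcome for infringements occurring in the window between the two end years, which is the 2095 to 2110 range mentioned above.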

Before I embark, I want some level of guarantee that if I code that up, get it to work, document it, and write passing tests for it, that I'm not going to get a bunch of philosophical, ideological, or theoretical-performance pushback.

The best you can get on this front is filing a proposal describing the specific behavior you want to support and the manner in which you would expose it (e.g. changes to semantics of stylecop.json, etc.), and then waiting for it to be approved before starting the implementation. This covers the "...philosophical, ideological..." concerns. In addition, there are several active participants in the project who expressed a great deal of interest in finding ways to incorporate approved functionality in a manner that does not negatively impact performance, including but definitely not limited to myself, @pdelvo, and @vweijsters. That said, if things don't work out here then remember two things:

It's not just you. I've had three of my own proposals (Place text in paragraphs #601, Use block-level elements consistently #602, Use block-level elements consistently across elements of the same kind #603) get rejected after I took the time to prepare, implement, and test them - and they weren't simple rules to implement (Implement new block-level documentation diagnostics #610).
The license used by code in this repository allows you to create another analyzer which behaves differently. Roslyn allows you to install and use many different analyzer packages in the same project, and you can use a single rule set file to control everything. Several people who use StyleCop.Analyzers have other analyzers installed as part of a complete solution tailored to their product. If your changes don't get merged here, it definitely doesn't mean they were a waste of time.

As a final note, please consider writing up some of your concerns in the .NET Foundation thread on this topic. So far the conversation has been quite one-sided.

Travis-Snoozy · 2015-10-28T00:15:58Z

Thanks for putting in a reference back to this thread on the .NET foundation discussion -- I don't have an account over there, and really I'd rather be coding than adding more accounts to my keyring. :)

Would it be preferred to file a fresh issue with the "proposal" tag, or pile on to #1357?

sharwell · 2015-10-28T04:54:19Z

Either way works. 😄

sharwell added enhancement code review needs discussion and removed code review labels Oct 20, 2015

Travis-Snoozy closed this Oct 28, 2015

sharwell mentioned this pull request Apr 11, 2016

Question: Adding the year interval in the copyright #2120

Closed

sharwell mentioned this pull request Jul 20, 2016

Add an iterative version of the ParseTreeWalker antlr/antlr4#1231

Merged

jbduncan mentioned this pull request Dec 25, 2016

Is the license header for Spotless itself correct? diffplug/spotless#57

Closed

sharwell mentioned this pull request Jan 6, 2020

Allow file/copyright header template to have wildcard characters #3101

Closed
