
Conversation

luke-jr (Member) commented Jul 27, 2023

There have been at least a few instances where someone tried to contribute LLM-generated content, but such content has a dubious copyright status.

Our contributing policy already implicitly rules out such contributions, but being more explicit here might help.

DrahtBot (Contributor) commented Jul 27, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type          Reviewers
ACK           kevkevinpal
Concept NACK  ryanofsky, glozow
Concept ACK   jonatack, russeree, Sjors, petertodd

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

jonatack (Member)

Concept ACK, makes sense, though IANAL.

portlandhodl (Contributor) commented Jul 28, 2023

Concept ACK.

Regarding the proposed line

  but not limited to, ChatGPT, GitHub Copilot, and Meta LLaMA

two thoughts come to mind:

  1. This section could become a cat-and-mouse game of tracking the various models and which of them will still have relevance in the future.

  2. LLMs are currently the benchmark for text-to-text models, but there are other types of models as well, for example NLP and RNN models, so this language could become obsolete over time.

This post was written by GPT4 ... Just Kidding.

kevkevinpal (Contributor) commented Jul 28, 2023

Concept ACK 08f9f62

One of the mentioned PRs was mine: #28101 (comment)

It would make sense to have this in CONTRIBUTING.md.

ariard left a comment

To be honest, after looking at some LLM terms of service and with basic knowledge of copyright law, there is uncertainty about the status of LLM output. It seems LLM or AI platform operators do not claim in their terms of service that they own the intellectual property of the LLM output, and even if they did, it would probably be an unfounded claim. A user might add an "original" or "creative" element by composing an individual prompt, a determining factor in any assignment of intellectual property rights.

To the best of my knowledge there has been no legal precedent on the matter in any major jurisdiction. However, there are ongoing proposals to rework the legal frameworks around data use and AI (at least in the EU), and these will probably change the picture.

My personal opinion would be to leave the contributing rules unchanged for now and look again in 24–36 months, when there may be more clarity on the matter, if any.

See also the recent page https://en.wikipedia.org/wiki/Wikipedia:Large_language_models_and_copyright

If you do not know where the work comes from and/or its license terms, it may
not be contributed until that is resolved. In particular, anything generated by
AI or LLMs derived from undisclosed or otherwise non-MIT-compatible inputs
(including, but not limited to, ChatGPT, GitHub Copilot, and Meta LLaMA) cannot

I would recommend dropping any reference to a corporate entity, or one of its products, in Bitcoin Core documentation, to avoid mischaracterizing what they're doing (whatever one's personal opinion).

We already mention GitHub a lot in CONTRIBUTING.md, though only as the technical platform where contributions happen, without taking a stance on any of its products.

luke-jr (Member, Author)

I mentioned these specifically because:

  1. ChatGPT is the most popularly known, and most likely to be searched for if someone is considering using it.
  2. GitHub promotes use of Copilot heavily, and we are using GitHub.
  3. Meta is falsely advertising LLaMA as open source, and many people are just believing that without verifying. (The source code is not available, and the license is not permissive)

Member

I think it's fine to mention these examples.


I think (a) there is no certainty that ChatGPT / LLaMA will be the most popular frameworks 12–18 months from now, and I don't think we're going to update the contributing rules every time that changes; and (b) Meta is a registered trademark of a commercial entity, and it's better not to give the appearance that the Bitcoin Core project is supportive of, supported by, or linked to Meta in any way.

ariard commented Jul 28, 2023

Sent a mail to Jess from the Bitcoin Legal Defense Fund to collect more legal opinions, with the context. luke-jr (luke@dashjr.org) and jonatack (jon@atack.com) are cc'ed.

Sjors (Member) commented Jul 29, 2023

Concept ACK, but happy to wait for legal opinions. Hopefully they clarify the risks in two separate categories:

  1. Content the AI obtained somewhere else without permission (i.e. claims from the original author)

  2. Content the AI generated itself, potentially owned by some corporation that didn't give the person making the pull request permission to MIT-license it.

When it comes to (1) I'm more worried about snippets of fresh code than e.g. suggested refactorings. I don't see how one can claim copyright over e.g. the use of std::map over std::vector.
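
(A hypothetical sketch of the kind of change meant here; the names and types are invented for illustration and are not from Bitcoin Core or this PR:)

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // The same lookup expressed two ways. A suggestion to switch from the
    // vector scan to the map is exactly the sort of container-choice
    // refactoring over which a copyright claim seems implausible.
    int FeeFromVector(const std::vector<std::pair<std::string, int>>& fees,
                      const std::string& txid)
    {
        for (const auto& [id, fee] : fees) {
            if (id == txid) return fee; // linear scan
        }
        return -1;
    }

    int FeeFromMap(const std::map<std::string, int>& fees,
                   const std::string& txid)
    {
        const auto it = fees.find(txid); // logarithmic lookup
        return it != fees.end() ? it->second : -1;
    }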

When it comes to (2), in a way it's not a new risk. A contributor could already have a consultant standing next to them telling them what to write. It could then turn out the code belongs to that consultant (who didn't license it MIT) and not to the contributor (who at least implicitly did). But these AI companies have a lot more budget to actually legally harass the project over such claims.

petertodd (Contributor)

ACK

The copyright lobby is pretty strong, and stands to lose a lot from AI. I think there's a significant chance that AI copyright gets resolved in favor of copyright owners in a way that is disastrous for AI. Just look at how the copyright lobby managed to keep extending the duration of copyrights worldwide to ludicrous, economically irrational lengths until very recently.

Also, AI poses unknown security threats. It frequently hallucinates incorrect answers. Bitcoin Core is a type of software that needs to meet very stringent reliability standards, so much so that review is usually more work than writing the code. Saving time writing the code doesn't help much, and poses as-yet-unknown risks.

ryanofsky (Contributor) commented Jul 31, 2023

NACK from me, because I think legal questions like this are essentially political questions, and you make absurd legal outcomes more likely to happen by expecting them to happen, and by writing official documentation which gives them credence.

If the risk is that OpenAI or Microsoft could claim copyright over parts of the Bitcoin codebase, that would be absurd, because their usage agreements assign away their rights to the output and say it can be used for any purpose.

If the risk is that someone else could claim copyright over parts of the Bitcoin codebase, as in the SCO case (https://en.wikipedia.org/wiki/SCO%E2%80%93Linux_disputes), that would also be an absurd outcome, one with bigger repercussions beyond this project, and it could happen about as easily without an LLM involved.

ariard commented Jul 31, 2023

I had feedback from the Bitcoin Legal Defense Fund: they need more time to analyze the issue, though they consider it an important one.

I additionally cc'ed Andrew Chow and fanquake as maintainers on the mail thread, for open-source transparency.

I think, whatever one's personal political opinion on copyright or tolerance of legal risk, the fact is we already have dubious copyright litigation affecting the project, so I think it's reasonable to wait for clarification of the risks before changing the contributing rules on the usage of AI / LLM tooling.

fanquake (Member) commented Aug 1, 2023

I mostly agree with @ryanofsky.

The reality is that going forward it'll be essentially impossible to avoid contributions that may include output from AI/LLMs, just because (in almost all cases) it'll be impossible to tell, unless the author makes it apparent.

We certainly don't want to end up in some situation where contributors are trying to "guess" or point out these types of contributions, or end up with reversion PRs (incorrectly) trying to remove certain content.

If we end up with an opinion from the BLDF then maybe we can consider making an addition to our license, if necessary.

ryanofsky (Contributor)

 If we end up with an opinion from the BLDF then maybe we can consider making an addition to our license, if necessary.

+1. If we have professional advice to change the license or add a separate policy document or agreement like a CLA, we should consider doing that. But we shouldn't freelance and add legal speculation to developer documentation.

In this case and in general I think a good strategy is to:

  • First, focus on doing the right thing morally. If a contribution includes content that seems plagiarized, not credited properly, or unfair to someone, we should not include it.
  • Second, try not to make political mistakes. Avoid doing things that would be broadly unpopular or would offend a particular group of people and provoke an attack. Avoid waving meat in front of hyenas and taking actions that could give credibility to absurd legal claims.
  • Third, try not to innovate. Have a software license, follow professional advice, maybe participate in a patent network. Avoid doing things that are speculative or new and not obvious wins.

mzumsande (Contributor) left a comment

 The reality is that going forward it'll be essentially impossible to avoid contributions that may include output from AI/LLMs, just because (in almost all cases) it'll be impossible to tell, unless the author makes it apparent.

Maybe making it apparent is part of the problem. There is no requirement to state publicly which technical tools were involved in a contribution, so for now it might be best if everyone just used their favourite LLM helpers silently (as, I am sure, many contributors already do!).

fanquake marked this pull request as draft August 3, 2023 09:55
fanquake (Member) commented Aug 3, 2023

Moved to draft for now, as there's no consensus to merge as-is, and in any case this is waiting on further legal opinions.

glozow (Member) commented Dec 21, 2023

Agree with the intent of avoiding legal problems, but NACK on adding this text. Unless we have some kind of legal guidance saying this text would protect us beyond what our existing license-related docs say, I don't see any reason to discourage specific tools in the contributing guidelines. I agree with the above that trying to speculate/innovate can do more harm than good.

I think we should close this for now and reconsider if/when a lawyer advises us to do something like this.

DrahtBot requested a review from jonatack December 21, 2023 11:34
fanquake (Member) commented Jan 5, 2024

Closing this for now.

fanquake closed this Jan 5, 2024
bitcoin locked and limited conversation to collaborators Jan 4, 2025