CONTRIBUTING: Caution against using AI/LLMs (ChatGPT, Copilot, etc) #28175
Conversation
Concept ACK, makes sense, though IANAL.
Concept ACK. The two thoughts that come to mind are that
This post was written by GPT4 ... Just Kidding.
Concept ACK 08f9f62. Mine was one of the mentioned PRs: #28101 (comment). It would make sense to have this in CONTRIBUTING.md.
To be honest, after looking at some LLM terms of service and with a basic knowledge of copyright law, there is uncertainty about the status of LLM output. It seems that LLM or AI operating platforms do not claim in their terms of service that they own the intellectual property of the LLM output, and even if they did, it would probably be an unfounded claim. A user might mix in an “original” or “creative” element by sending an individual prompt, which is a determining factor in any matter of intellectual property rights assignment.
To the best of my knowledge there has been no legal precedent on the matter in any major jurisdiction. However, there are ongoing proposals to rework the legal framework around data use and AI (at least in the EU), and this will probably change the question.
My personal opinion would be to leave the contributing rules unchanged for now and look again in 24 / 36 months, when there is more clarity on the matter, if any.
See also the recent page https://en.wikipedia.org/wiki/Wikipedia:Large_language_models_and_copyright
If you do not know where the work comes from and/or its license terms, it may
not be contributed until that is resolved. In particular, anything generated by
AI or LLMs derived from undisclosed or otherwise non-MIT-compatible inputs
(including, but not limited to, ChatGPT, GitHub Copilot, and Meta LLaMA) cannot
I would recommend dropping any reference to a corporate entity, or one of its products, in Bitcoin Core documentation, to avoid mischaracterizing what they’re doing (whatever one’s personal opinion).
We already mention GitHub a lot in CONTRIBUTING.md, though only as the technical platform where contributions happen, not to take a stance on one of their products.
I mentioned these specifically because:
- ChatGPT is the most popularly known, and most likely to be searched for if someone is considering using it.
- GitHub promotes use of Copilot heavily, and we are using GitHub.
- Meta is falsely advertising LLaMA as open source, and many people are just believing that without verifying. (The source code is not available, and the license is not permissive)
I think it's fine to mention these examples.
I think a) there is no certainty ChatGPT / LLaMA will be the most popular frameworks 12 / 18 months from now, and I don’t think we’re going to update the contributing rules every time, and b) Meta is a registered trademark of a commercial entity, and I think it’s better not to give the appearance that Bitcoin Core, the project, is supportive of, supported by, or linked to Meta in any way.
Sent a mail to Jess from the Bitcoin Defense Legal Fund to collect more legal opinions, with the context. Normally luke-jr (luke@dashjr.org) and jonatack (jon@atack.com) are cc'd.
Concept ACK, but happy to wait for legal opinions. Hopefully they clarify the risks in two separate categories:
When it comes to (1) I'm more worried about snippets of fresh code than e.g. suggested refactorings. I don't see how one can claim copyright over e.g. the use of
When it comes to (2), in a way it's not a new risk. A contributor could already have a consultant standing next to them telling them what to write. It could then turn out the code belongs to that consultant (who didn't license it MIT) and not to the contributor (who at least implicitly did). But these AI companies have a lot more budget to actually legally harass the project over such claims.
ACK
The copyright lobby is pretty strong, and stands to lose a lot from AI. I think there's a significant chance that AI copyright gets resolved in favor of copyright owners in a way that is disastrous for AI. Just look at how the copyright lobby managed to keep extending the duration of copyrights worldwide to ludicrous, economically irrational lengths until very recently.
Also, AI poses unknown security threats. It frequently hallucinates incorrect answers. Bitcoin Core is a type of software that needs to meet very stringent reliability standards, so much so that review is usually more work than writing the code. Saving time writing the code doesn't help much, and poses as yet unknown risks.
NACK from me, because I think legal questions like this are essentially political questions, and you make absurd legal outcomes more likely to happen by expecting them to happen, and by writing official documentation which gives them credence.
If the risk is that OpenAI or Microsoft could claim copyright over parts of the Bitcoin codebase, that would be absurd, because their usage agreements assign away their rights to the output and say it can be used for any purpose. If the risk is that someone else could claim copyright over parts of the Bitcoin codebase, as in the SCO case (https://en.wikipedia.org/wiki/SCO%E2%80%93Linux_disputes), that would also be an absurd outcome, which would have bigger repercussions beyond this project, and could happen about as easily without an LLM involved.
I had feedback from the Bitcoin Defense Legal Fund: they need more time to analyze the issue, though they consider it an important one. I additionally cc'd Andrew Chow and Fanquake as maintainers on the mail thread, for open-source transparency. Whatever one's individual political opinion on copyright or tolerance of legal risk, the fact is we already have dubious copyright litigation affecting the project, so I think it's reasonable to wait for risk clarification before making a change to the contributing rules on the usage of AI / LLM tooling.
I mostly agree with @ryanofsky. The reality is that going forward it'll be essentially impossible to avoid contributions that may include output from AI/LLMs, just because (in almost all cases) it'll be impossible to tell, unless the author makes it apparent. We certainly don't want to end up in some situation where contributors are trying to "guess" or point out these types of contributions, or end up with reversion PRs (incorrectly) trying to remove certain content.
If we end up with an opinion from the BLDF then maybe we can consider making an addition to our license, if necessary.
+1. If we have professional advice to change the license or add a separate policy document or agreement like a CLA, we should consider doing that. But we shouldn't freelance and add legal speculation to developer documentation. In this case and in general I think a good strategy is to:
> The reality is that going forward it'll be essentially impossible to avoid contributions that may include output from AI/LLMs, just because (in almost all cases) it'll be impossible to tell, unless the author makes it apparent.
Maybe making it apparent is part of the problem. There is no requirement to state publicly which technical tools were involved in a contribution, so for now it might be best if everyone would just use their favourite LLM helpers silently (as, I am sure, many contributors already do!).
Moved to draft for now, as there's no consensus to merge as-is, and in any case, this is waiting on further legal opinions.
Agree with the intent of avoiding legal problems, but NACK on adding this text. Unless we have some kind of legal guidance saying this text would protect us beyond what our existing license-related docs say, I don't see any reason to discourage specific tools in the contributing guidelines. I agree with the above that trying to speculate/innovate can do more harm than good. I think we should close this for now and reconsider if/when a lawyer advises us to do something like this.
Closing this for now.
There have been at least a few instances where someone tried to contribute LLM-generated content, but such content has a dubious copyright status.
Our contributing policy already implicitly rules out such contributions, but being more explicit here might help.