
Conversation

luke-jr (Member) commented Jul 27, 2023

There have been at least a few instances where someone tried to contribute LLM-generated content, but such content has a dubious copyright status.

Our contributing policy already implicitly rules out such contributions, but being more explicit here might help.

DrahtBot (Contributor) commented Jul 27, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type          Reviewers
ACK           kevkevinpal
Concept NACK  ryanofsky, glozow
Concept ACK   jonatack, russeree, Sjors, petertodd

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

jonatack (Member)

Concept ACK, makes sense, though IANAL.

portlandhodl (Contributor) commented Jul 28, 2023

Concept ACK.

Regarding the proposed line

  but not limited to, ChatGPT, GitHub Copilot, and Meta LLaMA

two thoughts come to mind:

  1. This section could become a cat-and-mouse game of tracking the various models and which of them will still have relevance in the future.

  2. LLMs are currently the benchmark for text-to-text models, but there are other types of models as well, for example NLP and RNN models, so this language could become obsolete over time.

This post was written by GPT4 ... Just Kidding.

kevkevinpal (Contributor) commented Jul 28, 2023

Concept ACK 08f9f62

One of the mentioned PRs was mine: #28101 (comment)

It would make sense to have this in CONTRIBUTING.md.

ariard left a comment

To be honest, after looking at some LLM terms of service and with basic knowledge of copyright law, there is uncertainty about the status of LLM output. It seems LLM or AI platform operators do not claim in their terms of service that they own the intellectual property of the LLM output, and even if they did, it would probably be an unfounded claim. A user might add an "original" or "creative" element by composing an individual prompt, a determining factor in any assignment of intellectual property rights.

To the best of my knowledge there has been no legal precedent on the matter in any major jurisdiction. However, there are ongoing proposals to rework the legal frameworks around data use and AI (at least in the EU), and these will probably change the picture.

My personal opinion would be to leave the contributing rules unchanged for now and look again in 24–36 months, when there may be more clarity on the matter, if any.

See also the recent page https://en.wikipedia.org/wiki/Wikipedia:Large_language_models_and_copyright

If you do not know where the work comes from and/or its license terms, it may
not be contributed until that is resolved. In particular, anything generated by
AI or LLMs derived from undisclosed or otherwise non-MIT-compatible inputs
(including, but not limited to, ChatGPT, GitHub Copilot, and Meta LLaMA) cannot

I would recommend dropping any reference to a corporate entity, or one of its products, in Bitcoin Core documentation, to avoid mischaracterizing what they're doing (whatever one's personal opinion).

We already mention GitHub a lot in CONTRIBUTING.md, though only as the technical platform where contributions happen, without taking a stance on any of its products.

luke-jr (Member, Author)

I mentioned these specifically because:

  1. ChatGPT is the most popularly known, and most likely to be searched for if someone is considering using it.
  2. GitHub promotes use of Copilot heavily, and we are using GitHub.
  3. Meta is falsely advertising LLaMA as open source, and many people are just believing that without verifying. (The source code is not available, and the license is not permissive)

Member

I think it's fine to mention these examples.


I think (a) there is no certainty that ChatGPT / LLaMA will be the most popular frameworks 12–18 months from now, and I don't think we're going to update the contributing rules every time that changes; and (b) Meta is a registered trademark of a commercial entity, and it's better not to give the appearance that the Bitcoin Core project is supportive of, supported by, or linked to Meta in any way.

ariard commented Jul 28, 2023

Sent a mail to Jess from the Bitcoin Legal Defense Fund to collect more legal opinions, with the context. luke-jr (luke@dashjr.org) and jonatack (jon@atack.com) are cc'ed.

Sjors (Member) commented Jul 29, 2023

Concept ACK, but happy to wait for legal opinions. Hopefully they clarify the risks in two separate categories:

  1. Content the AI obtained somewhere else without permission (i.e. claims from the original author)

  2. Content the AI generated itself, potentially owned by some corporation that didn't give the person making the pull request permission to MIT-license it.

When it comes to (1) I'm more worried about snippets of fresh code than e.g. suggested refactorings. I don't see how one can claim copyright over e.g. the use of std::map over std::vector.
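
(A hypothetical sketch of the kind of change meant here; the names and types are invented for illustration and are not from Bitcoin Core or this PR:)

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // The same lookup expressed two ways. A suggestion to switch from the
    // vector scan to the map is exactly the sort of container-choice
    // refactoring over which a copyright claim seems implausible.
    int FeeFromVector(const std::vector<std::pair<std::string, int>>& fees,
                      const std::string& txid)
    {
        for (const auto& [id, fee] : fees) {
            if (id == txid) return fee; // linear scan
        }
        return -1;
    }

    int FeeFromMap(const std::map<std::string, int>& fees,
                   const std::string& txid)
    {
        const auto it = fees.find(txid); // logarithmic lookup
        return it != fees.end() ? it->second : -1;
    }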

When it comes to (2), in a way it's not a new risk. A contributor could already have a consultant standing next to them telling them what to write. It could then turn out the code belongs to that consultant (who didn't license it MIT) and not to the contributor (who at least implicitly did). But these AI companies have a lot more budget to actually legally harass the project over such claims.

petertodd (Contributor)

ACK

The copyright lobby is pretty strong, and stands to lose a lot from AI. I think there's a significant chance that AI copyright gets resolved in favor of copyright owners in a way that is disastrous for AI. Just look at how the copyright lobby managed to keep extending the duration of copyrights worldwide to ludicrous, economically irrational lengths until very recently.

Also, AI poses unknown security threats. It frequently hallucinates incorrect answers. Bitcoin Core is a type of software that needs to meet very stringent reliability standards, so much so that review is usually more work than writing the code. Saving time writing the code doesn't help much, and poses as-yet-unknown risks.

ryanofsky (Contributor) commented Jul 31, 2023

NACK from me, because I think legal questions like this are essentially political questions, and you make absurd legal outcomes more likely to happen by expecting them to happen, and by writing official documentation which gives them credence.

If the risk is that OpenAI or Microsoft could claim copyright over parts of the Bitcoin codebase, that would be absurd, because their usage agreements assign away their rights to the output and say it can be used for any purpose.

If the risk is that someone else could claim copyright over parts of the Bitcoin codebase, as in the SCO case (https://en.wikipedia.org/wiki/SCO%E2%80%93Linux_disputes), that would also be an absurd outcome, one with bigger repercussions beyond this project, and it could happen about as easily without an LLM involved.

ariard commented Jul 31, 2023

I had feedback from the Bitcoin Legal Defense Fund: they need more time to analyze the issue, though they consider it an important one.

I additionally cc'ed Andrew Chow and fanquake as maintainers on the mail thread, for open-source transparency.

I think, whatever one's personal political opinion on copyright or tolerance of legal risk, the fact is we already have dubious copyright litigation affecting the project, so I think it's reasonable to wait for clarification of the risks before changing the contributing rules on the usage of AI / LLM tooling.

fanquake (Member) commented Aug 1, 2023

I mostly agree with @ryanofsky.

The reality is that going forward it'll be essentially impossible to avoid contributions that may include output from AI/LLMs, just because (in almost all cases) it'll be impossible to tell, unless the author makes it apparent.

We certainly don't want to end up in some situation where contributors are trying to "guess" or point out these types of contributions, or end up with reversion PRs (incorrectly) trying to remove certain content.

If we end up with an opinion from the BLDF then maybe we can consider making an addition to our license, if necessary.

ryanofsky (Contributor)

 If we end up with an opinion from the BLDF then maybe we can consider making an addition to our license, if necessary.

+1. If we have professional advice to change the license or add a separate policy document or agreement like a CLA, we should consider doing that. But we shouldn't freelance and add legal speculation to developer documentation.

In this case and in general I think a good strategy is to:

  • First, focus on doing the right thing morally. If a contribution includes content that seems plagiarized, not credited properly, or unfair to someone, we should not include it.
  • Second, try not to make political mistakes. Avoid doing things that would be broadly unpopular or would offend a particular group of people and provoke an attack. Avoid waving meat in front of hyenas and taking actions that could give credibility to absurd legal claims.
  • Third, try not to innovate. Have a software license, follow professional advice, maybe participate in a patent network. Avoid doing things that are speculative or new and not obvious wins.

mzumsande (Contributor) left a comment

 The reality is that going forward it'll be essentially impossible to avoid contributions that may include output from AI/LLMs, just because (in almost all cases) it'll be impossible to tell, unless the author makes it apparent.

Maybe making it apparent is part of the problem. There is no requirement to state publicly which technical tools were involved in a contribution, so for now it might be best if everyone just used their favourite LLM helpers silently (as, I am sure, many contributors already do!).

fanquake marked this pull request as draft August 3, 2023 09:55
fanquake (Member) commented Aug 3, 2023

Moved to draft for now, as there's no consensus to merge as-is, and in any case this is waiting on further legal opinions.

glozow (Member) commented Dec 21, 2023

Agree with the intent of avoiding legal problems, but NACK on adding this text. Unless we have some kind of legal guidance saying this text would protect us beyond what our existing license-related docs say, I don't see any reason to discourage specific tools in the contributing guidelines. I agree with the above that trying to speculate/innovate can do more harm than good.

I think we should close this for now and reconsider if/when a lawyer advises us to do something like this.

DrahtBot requested a review from jonatack December 21, 2023 11:34
fanquake (Member) commented Jan 5, 2024

Closing this for now.

fanquake closed this Jan 5, 2024
bitcoin locked and limited conversation to collaborators Jan 4, 2025