-
-
Notifications
You must be signed in to change notification settings - Fork 16.6k
CONTRIBUTING: Clarify commit separation convention #400934
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for improving this section of the docs. Commit conventions are by far the roughest part of the contribution process.
CONTRIBUTING.md
Outdated
@@ -491,19 +491,27 @@ To get a sense for what changes are considered mass rebuilds, see [previously me | |||
## Commit conventions | |||
[commit-conventions]: #commit-conventions | |||
|
|||
- Create a commit for each logical unit. | |||
- A commit should _only_ be split into more commits if it's not independent and understandable by itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be the opposite?
- A commit should _only_ be split into more commits if it's not independent and understandable by itself. | |
- A commit should _only_ be split into more commits if each split commit is independent and understandable by itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to push on this a little bit: is a "maintainers: add ..." commit like this really "understandable by itself"? It sort of feels like adding dead code to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SigmaSquadron No that's intentional. The idea is that one should be able to judge each commit separately, decide whether it's independent and understandable by itself, and only if it clearly isn't, request it to be split. Once split, each of the split commits need to satisfy this condition by themselves too of course.
With the given suggestion however, commits can't be judged separately, and whether to split seems to instead depend on the outcome of the split, which is somewhat self-referential and kind of implying that authors are expected to split commits if possible, which is what I'm trying to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the "understandable by itself" is the problem with the wording. It's intended to prevent commits that are way too large and do too many things, making it really hard to review. It's not intended for commits that are small but somewhat "disconnected" from other commits in the same PR. Suggestions welcome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the wording problem is rather that “you should only ask someone to split a commit if …” is very different from “a commit should only be split if …”. The latter sounds like it’s forbidding making particularly atomic and tidy commits, rather than only forbidding demanding people do that when it’s not an egregious case. (As I said on Matrix, while I’d never expect a contributor to produce a commit history like https://github.com/NixOS/nixpkgs/pull/339702/commits, I think it is better for review and bisection than if I had squashed it down into fewer commits, so I’d be sad to add a guideline that could be used to justify nitpicking it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes I think you're on point with that. I pushed an update to address that. It notably becomes just
- Commits should be independent and understandable by themselves.
- Later commits that undo changes from earlier commits should be combined.
The note from this PR is arguably the most important addition then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(edit: I wrote and published this before seeing @infinisil's comment above. It may not make sense in context)
I think part of the confusion is this fragment:
A commit should only be split into more commits if it's not independent
It's hard for me to imagine how a commit that is not independent would become independent by splitting it up. Commits that are not independent generally are fixed by combining them with other commits so they can stand alone.
To try to make a suggestion:
- Commits should "stand alone". We should be able to merge the first N commits of your PR without breaking nixpkgs.
- That doesn't mean you should do all your work in 1 big commit. Strive to tell a comprehensible story with your commits. It's easier to review a refactor commit independently of a bugfix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Commits should "stand alone". We should be able to merge the first N commits of your PR without breaking nixpkgs.
While I think this is a very valuable principle in general, it’s difficult to consistently uphold in Nixpkgs. For instance, major updates to core libraries often require dozens to hundreds of downstream packages to be adapted for them. Sometimes those commits are in the same PR and sometimes they’re spread out over several in the course of a staging-next
cycle, but squashing them all into a single commit is rarely practical and usually makes review worse. (For this reason, you usually want to bisect Nixpkgs with --first-parent
when you can, and bisecting staging
is… interesting.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I think this is a very valuable principle in general, it’s difficult to consistently uphold in Nixpkgs.
Could we still strongly suggest this approach? Aka "commits should usually stand alone, except when doing big treewides" or something like that?
That's because in the majority of cases it is a really good approach to take.
thank you for this I think it was very much needed :) |
There is frequent turmoil among reviewers as to how to best organise the commit history. This improves on that by clarifying the guidelines.
160a36a
to
90a5b88
Compare
|
||
- Check for unnecessary whitespace with `git diff --check` before committing. | ||
For example: | ||
- For example, when adding yourself as maintainer in the same pull request, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated "For example":
- For example, when adding yourself as maintainer in the same pull request, | |
- When adding yourself as maintainer in the same pull request, |
|
||
- If you have commits `pkg-name: oh, forgot to insert whitespace`: squash commits in this case. Use `git rebase -i`. | ||
See [Squashing Commits](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#_squashing) for additional information. | ||
- Later commits that undo changes from earlier commits should be combined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess one exception would be commits that are meant to be reverted later (example). Not sure if it makes sense to mention that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Later commits that undo changes from earlier commits should be combined. | |
- Later commits that undo changes from earlier commits should usually be combined. |
I think adding "usually" implies that this is a rule-of-thumb rather than an actual rule. That should be enough for those kinda scenarios?
Maybe this is closer to the actual intent though:
- Later commits that undo changes from earlier commits should be combined. | |
- Later commits that fix issues from earlier commits in the same PR should be combined. |
?
|
||
> [!Note] | ||
> Reviewers should _not_ ask commit authors to make changes to the commit history unless it _clearly_ violates a written guideline. | ||
> Notably there is a wide range of acceptable options for the commit history, ranging from many small independent commits to larger commits that are still independent and understandable by themselves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree with "wide range of acceptable options". Of course, it's not 100% clear-cut, small variation on how to split things up is perfectly acceptable. But "wide range" just implies it's OK to throw together many possibly unrelated changes into a bigger commit. The problem is: Those changes are always "understandable" for the contributor - because they literally just wrote them.
IMHO, we have a reviewer bottleneck, not a contribution bottleneck. So we absolutely must focus on making things easier to review. And smaller commits, building on top of each other, are much easier to review. So even if a contributor has more work by creating 10 commits instead of 1 commit, this will lead to a faster merge, because it's so much easier to review.
The example given by @emilazy in the other thread (https://github.com/NixOS/nixpkgs/pull/339702/commits) is perfect. I would certainly expect a contributor to split things up like this, because: (1) The PR is almost impossible to review by looking at the whole diff at once, (2) the PR is dead simple to review by going commit-by-commit.
This note as-is will probably lead to fewer comments along the lines of "please split things up" from non-committer reviewers. This is actively bad, because once a committer looks at an already approved PR it should ideally be in a state that is really simple to go through and just confirm quickly that all is good.
Instead of limiting what reviewers are allowed to do, we should instead focus on writing down a guideline for contributors on "How to make my PR reviewable?". And part of that should be "smallest possible independent/standalone/understandable commits". It's much better to err on the side of small commits, than the other way around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would certainly expect a contributor to split things up like this
Well, not to focus on my own PR too much :), but I think that you could easily merge all of these without much of a detriment to reviewability:
- jujutsu: remove
with lib
- jujutsu: remove
rec
- jujutsu: use
env
- jujutsu: remove unused
fetchpatch2
argument - jujutsu: reorder arguments
I prefer the granularity I went with, but I’d struggle to complain about a PR containing a “jujutsu: clean up” commit that did all of these at once. I also think it’s basically okay to do minor fixes or clean‐ups in a version bump commit, for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I think that you could easily merge all of these without much of a detriment to reviewability:
Yes, that's exactly the "narrow range" / variation that I have in mind, too. Maybe with the exception of reorder arguments, because that's annoying to check by hand together with a removal of an unused argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think it’s basically okay to do minor fixes or clean‐ups in a version bump commit, for instance.
That's something that I really dislike. I'd really like to understand whether a version bump forces us to make any other changes to keep the package building - and anything else in there, just blurries the picture. This could easily be two commits, cleanup + bump.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, we have a reviewer bottleneck, not a contribution bottleneck.
Contributors should ideally split up commits to make reviewers' lives easier, yes. However, reviewers should not feel like they have to spend their time being picky about commit squashing, which seems like the point of this paragraph to me – not just empowering contributors to push back if they're needlessly told to combine their commits differently, but also empowering reviewers to not care about it.
This note as-is will probably lead to fewer comments along the lines of "please split things up" from non-committer reviewers. This is actively bad,
You are saying that reviewers have no time, but them complaining about reviewers being told to spend less time telling contributors to resquash their commits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I have concerns about a decline of review quality in Nixpkgs, I don’t think administrative nitpicks like formatting, inconsequential idioms, and borderline commit‐splitting cases are something we have too little of right now. I think there are many cases where it is sensible to ask people to address these things but a culture of reviews that consist entirely of them, especially in edge cases or where the suggestions actively make things worse in some ways, diminishes the value of our code review, and I think it impacts the contributor‐to‐reviewer‐to‐committer pipeline for people capable of doing substantive deep work. The best way to make the most of reviewer time would be to automate more of these nitpicks. For commit splitting that’s hard, admittedly, but I have seen too many good contributions get dropped because of requests that would truly have made no difference in the long run.
I think we should encourage and document good commit discipline but cases like #398764 – where it was in fact requested to squash unrelated changes into a bump – show that we have a problem, both of driving away contributors and of leaving the kinds of unhelpful reviews that new reviewers take as an example to follow. Written bounds on what’s reasonable – and therefore clearly okay to bring up in reviews outside of those bounds – are good, but I do not think we should do things that encourage reviewers to take issue with things that are clearly within those bounds, and if we want to raise the baseline of what we consider good commits then we need better tooling and practices around it and to ensure we are cultivating new contributors with discernment and expertise so that we can incrementally improve the state of things in a way that scales. Having reviewers spend their time on making sure that a simple rearrangement never shares a commit with a version bump is not something we can afford at present, and is an unpleasant experience to be on the receiving end of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This note as-is will probably lead to fewer comments along the lines of "please split things up" from non-committer reviewers. This is actively bad,
You are saying that reviewers have no time, but them complaining about reviewers being told to spend less time telling contributors to resquash their commits?
No. I am concerned about fewer "please split" comments, not about fewer "please squash" comments.
Yes, we have not enough reviewers. And we have even fewer committers!
What's my goal as a non-committer-reviewer? To help the contributor bring the PR into a shape that is so easy and quick to understand, that a committer can merge it with ease. As such, comments about commit structure should be OK in general. If we have a shared goal of making the job for the next person to look at the PR easier, then this shouldn't be a problem.
The problem in #398764 was not, that "commit structure" was discussed. The problem was a lack of guidelines to which reviewers can point, thus leading to conflicting messages - that was what lead to the frustration.
Right now, we have no clear guidelines, so some reviewers says "please split" and some say "please squash".
With the proposal as currently written, we will have fewer of both - which is not really helping the case. Instead of focusing on what reviewers are allowed to comment on or not, we should focus on what the PR should look like in the end. If there's a clear picture of that, we will not have conflicting messages and thus no problem.
Having reviewers spend their time on making sure that a simple rearrangement never shares a commit with a version bump is not something we can afford at present, and is an unpleasant experience to be on the receiving end of.
"cleanup" + "version bump" are just entirely unrelated. In fact the only thing they have in common is the point of time they happen at. But "doing things at the same time" is what a Pull Request of multiple commits is for. A commit is never the right "unit" to do "multiple things together". If we make this a clear rule, then it's easy to follow and discussions are not needed, eventually.
Again, this is about increasing the chances of getting something committed. I'm sure a contributor will understand consistent feedback about this, with the reasoning to make the job of a committer easier and thus leading to a faster merge. If the experience to be on the receiving end of that is an unpleasant experience, then the communication fell apart somewhere else, i.e. in the "how" this was communicated.
TBH, commit discipline is terrible on the bigger scale of nixpkgs. There's so much information that could go into commit messages, that just never makes it in there. But if we don't start splitting commits up properly, we don't even have the right place to put this information. By splitting things up, you have to come up with a commit title at least, which often already gives you some information. But it also gives you a base to write better commit messages, especially when you suddenly need to start thinking about that commit as a unit more. Compare that to the commit with version bump and cleanup together - chances are much lower that somebody is going to write a proper commit message for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of focusing on what reviewers are allowed to comment on or not, we should focus on what the PR should look like in the end. If there's a clear picture of that, we will not have conflicting messages and thus no problem.
I totally agree with this.
commit discipline is terrible on the bigger scale of nixpkgs. There's so much information that could go into commit messages, that just never makes it in there
This could also be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing that bigger commits are bad for is backport-ability. Random example: #397226. This has a few refactors and a fix for builds on darwin in the same single commit. The refactors are blocked on backporting, because of missing finalAttrs
support on stable. And thus, the darwin fixes are as well.
If the (probably tiny) darwin fixes were separated, backporting only that commit would be dead simple.
And that's potentially the same for every version bump with unrelated changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2 cents. I strongly endorse this PR, and I am too in favor of most of the claims raised in the "wide range of acceptable options" review thread, though I haven't read it thoroughly.
commit with the message `maintainers: add <handle>`. | ||
Add the commit before those making changes to the package or module. | ||
See [Nixpkgs Maintainers](./maintainers/README.md) for details. | ||
- Check for unnecessary whitespace with `git diff --check` before committing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it true that nixfmt
CI job enforces this? If so, maybe it'd be worth mentioning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think whitespace as such is now also checked by editorconfig-checker, which runs via treefmt in both CI and locally.
We should probably replace this with a more generic "run treefmt" (if that's not already documented somewhere, but I bet it is).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think whitespace as such is now also checked by editorconfig-checker, which runs via treefmt in both CI and locally.
We should probably replace this with a more generic "run treefmt" (if that's not already documented somewhere, but I bet it is).
I slightly object to squashing all of the editorconfig settings etc in Nixpkgs to such a shallow 'use treefmt' instruction. I think that at least the following must be mentioned in this sentence: (1) The CI job using nixfmt
, (2) the CI job using editorconfig (do these CI jobs overlap?) and (3) the treefmt settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) The CI job using nixfmt, (2) the CI job using editorconfig (do these CI jobs overlap?)
It's one CI job that does both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(1) The CI job using nixfmt, (2) the CI job using editorconfig (do these CI jobs overlap?)
It's one CI job that does both.
Thanks for the info. If so, I'd enumerate these as follows: (1) The editorconfig settings, (2) CI job that runs nixfmt and editorconfig, and (3) the treefmt settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why those are relevant. We changed it, so that treefmt
is the only entrypoint to all of the formatting needs, for both CI and locally. So users should really not need to know about the implementation details of that, aka that nixfmt or editorconfig-checker are run. After all, there is also keep-sorted and actionlint in that, too. And possibly more in the future.
Note that there is a whole formatting section in the same file, which does already talk about nixfmt and editorconfig separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why those are relevant. We changed it, so that
treefmt
is the only entrypoint to all of the formatting needs, for both CI and locally. So users should really not need to know about the implementation details of that, aka that nixfmt or editorconfig-checker are run. After all, there is also keep-sorted and actionlint in that, too. And possibly more in the future.Note that there is a whole formatting section in the same file, which does already talk about nixfmt and editorconfig separately.
Hmm it's just that I personally use only editorconfig and I run nixfmt manually without treefmt, so I thought it'd be correct to delve into the details of it a bit in this sentence. However indeed if the details of this are explained thoroughly somewhere else, it might be wiser to reference to that section instead of mentioning only git diff --check
which is for sure not enough to adhere to the complete list of formatting rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we just drop the reference to any formatting here entirely. It's something that CI will fail on anyway, it's described elsewhere - so ultimately it just distracts from the point this section tries to make (aka commit conventions). Seems like this only ended up in here, because it's a git
command.
Another reason to split commits: if a file is renamed and changes are made, sometimes it no longer gets considered a rename. So typically, if I'm refactoring a package and moving it to by-name, I have one commit that moves the package to by-name and only minor supporting changes if needed, and then the rest of the refactor in separate commits. Small commits also help in the case that you make a minor change (like enabling an option) that turns out to have been disabled for a good reason. Then, you can just For an example of small commits coming in handy due to both cases, see #406106 |
Also: this should link to #347586 |
I am new to nixpkgs, but I do think I have something to say that has not been said here (I read everything I think How about explicitly giving (more) examples (like is done in this thread)? I.e.: When you, for example, are intending to submit a PR to a certain package that:
- Bumps the version
- Fixes something with the Darwin build
- Fixes something with the Linux build
- Refactors something
- Adds you as a maintainer
Each of these points should count as separate changes, so _you should create_ **five** _commits here_. Or even: "As a general rule of thumb, commit as often as possible without breaking things." I was intrigued by @emilazy's remark (#339702, mentioned earlier here as well) that:
Just a thought: could we include a recommendation for the And finally, TBH, I have been squashing a lot here in PR's, I thought people found it cleaner that way. My bad. I'll try to commit more often :) |
Nitpick with your suggested example. You should mention that it fixes something with the linux build specifically or the Darwin build specifically because it's entirely plausible that there's one change that fixes both. I think the rule should be that each logical unit should be its own commit. For example, a PR that adds a package and fixes its own broken build should absolutely be squashed. Likewise, if it adds a package and then formats it, it should be squashed. However, if there is a PR open that formats the package and fixes its build, each of those two commits should be separate. |
Yes, but I think the problem has been that "create a commit for each logical unit" has been to vague. All contributors should've read that phrase, yet there has never been an (albeit unwritten up to now) consensus about what that might actually mean in practice. Hence all of the contradicting "can you split" vs "can you squash" reviews. I for one read "logical unit" as "a single package, module, test etc.", hence I started squashing all changes that related to the same package into one. But one might also read it as "well I added a single entry to That's why I think there should be more examples. Not that |
Bump; there are merge conflicts. |
There is frequent turmoil among reviewers as to how to best organise the commit history, most recently #398764.
This improves on that by clarifying the guidelines.
Relates to #387072
The new guidelines
Note
Reviewers should not ask commit authors to make changes to the commit history unless it clearly violates a written guideline.
Notably there is a wide range of acceptable options for the commit history, ranging from many small independent commits to larger commits that are still independent and understandable by themselves.
This work is funded by Antithesis and Tweag ✨
Add a 👍 reaction to pull requests you find important.