guix: Use pigz as a faster gzip replacement #21478
In Guix, there are two flags for controlling parallelism:

Note: When I say "derivation," think "package"

--cores=n
  - controls the number of CPU cores used to build each derivation. This is the value passed to `make`'s `--jobs=` flag.
  - defaults to 0: as many cores as are available

--max-jobs=n
  - controls how many derivations can be built in parallel
  - defaults to 1

Therefore, if we set --max-jobs=$MAX_JOBS and don't set --cores, Guix could theoretically spin up $MAX_JOBS * $(nproc) threads, and that's no good.

So we could either default to

--cores=1, --max-jobs=$MAX_JOBS
  - Pro: --cores=1 means that `make` will be invoked with `-j1`, avoiding problems with packages whose build systems and test suites break when running multi-threaded.
  - Con: There will be times when only 1 or 2 derivations can be built at a time, because the rest of the dependency graph depends on those 1 or 2 derivations. During these times, the machine will be severely under-utilized.

or

--cores=$MAX_JOBS, --max-jobs=1
  - Pro: We don't encounter the prolonged periods of severe under-utilization mentioned above.
  - Con: Many packages' build systems and test suites break when running multi-threaded.

or

--cores=1, --max-jobs=1 and let the user override with $ADDITIONAL_GUIX_COMMON_FLAGS
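The arithmetic behind the three options can be sketched as follows. This is only an illustrative model of the worst case described above, not anything Guix itself runs: the function names and the `nproc` parameter are hypothetical, and the sketch assumes each derivation uses exactly as many threads as `--cores` allows.

```python
import os
from typing import Optional


def effective_cores(cores: int, nproc: Optional[int] = None) -> int:
    """Resolve a --cores value; 0 means "as many cores as are available"."""
    if cores < 0:
        raise ValueError("cores must be >= 0")
    if cores > 0:
        return cores
    # nproc is a hypothetical override so the model can be checked on any machine.
    return nproc if nproc is not None else (os.cpu_count() or 1)


def worst_case_threads(max_jobs: int, cores: int, nproc: Optional[int] = None) -> int:
    """Upper bound on build threads: parallel derivations times cores per derivation."""
    if max_jobs < 0:
        raise ValueError("max_jobs must be >= 0")
    return max_jobs * effective_cores(cores, nproc)


if __name__ == "__main__":
    # Assumed example machine with 8 cores and MAX_JOBS=4.
    print(worst_case_threads(max_jobs=4, cores=0, nproc=8))  # --cores unset
    print(worst_case_threads(max_jobs=4, cores=1, nproc=8))  # --cores=1
    print(worst_case_threads(max_jobs=1, cores=4, nproc=8))  # --cores=$MAX_JOBS
```

Under this model, leaving `--cores` unset multiplies the thread count by the number of cores, while the first two options both cap it at `$MAX_JOBS`.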
Otherwise, it prints a rather disturbing message to stderr: fatal: no tag exactly matches '<hash>'
./windeploy is a "working directory", and therefore belongs inside distsrc-*. Many people have noticed their Guix builds failing after hours simply because they did not remove windeploy (but did remove the distsrc-* directories).
Considering the huge performance impact of using Guix at all, this doesn't make sense to me. |
Sorry, I'm not fully understanding, are you saying: "Running Guix builds already strains my machine quite a bit, it wouldn't make sense to strain it more by using |
No, I'm just saying gzip's performance hit is trivial compared to migrating from gitian to guix. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Are there any concerns about pigz possibly introducing non-determinism, if it compresses in parallel? (looking at the documentation, my guess is no, but it's worth testing). Otherwise, that's a pretty nice and simple speedup. In fact, I'm kind of surprised we spend that much time on gzipping stuff that this even matters... |
I came here to say this, basically. That said, seeing how much time is spent on compilation, I'm also a little bit concerned about adding a non-standard dependency. Not enough to NACK this though. What about lowering the compression level for intermediate gz's like caches? Depending on I/O speed versus CPU speed this can help, at the expense of somewhat more disk usage. [So to be clear I'm not suggesting less compression for the distribution.] A more high-level question would be: Are we at the point that we need to micro-optimize Guix build speed, or is this something we can consider later? 🙂 |
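A rough way to get a feel for the compression-level trade-off mentioned above is to time DEFLATE at a few levels on sample data. The sketch below uses Python's `zlib` module as a stand-in for gzip (both use DEFLATE); the function name, the chosen levels, and the sample payload are assumptions for illustration, and real build caches may behave quite differently.

```python
import time
import zlib


def compare_levels(data: bytes, levels=(1, 6, 9)) -> dict:
    """Return {level: (compressed_size, seconds)} for each DEFLATE level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        # Sanity check: every level must round-trip to the original bytes.
        if zlib.decompress(compressed) != data:
            raise RuntimeError(f"round trip failed at level {level}")
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    # Hypothetical stand-in payload; substitute a real intermediate tarball.
    sample = b"intermediate build cache line\n" * 50_000
    for level, (size, seconds) in compare_levels(sample).items():
        print(f"level {level}: {size} bytes in {seconds:.4f}s")
```

Lower levels generally trade a larger output for less CPU time, which is the effect the comment suggests exploiting for intermediate artifacts only.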
I've asked this question here: madler/pigz#90, and tested locally. It seems to be deterministic!
Probably worthwhile to experiment with that too!
Oh I think we can 100% consider this later, I've just done so many builds where I thought something was stuck for 3+ mins only to find out it's |
More fine-tuning / microoptimization: for purposes of just internally compressing things, zstd is a lot faster than gzip (at the same compression level). |
Is the release binary bit-identical with gzip vs pigz? |
No, only between same-version same-settings pigz runs (with possibly different number of threads). |
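The property described here (identical output across runs and thread counts for the same pigz version and settings) could be checked locally along these lines. This is a sketch, not the test that was actually run: the helper names are hypothetical, it assumes `pigz` is on `PATH` and that its `-n`, `-c`, and `-p` options behave as documented, and it falls back to Python's `gzip` with a fixed `mtime` as a stand-in when pigz is missing.

```python
import gzip
import hashlib
import shutil
import subprocess


def digest(blob: bytes) -> str:
    """SHA-256 hex digest of a compressed blob."""
    return hashlib.sha256(blob).hexdigest()


def outputs_match(compressors, data: bytes) -> bool:
    """True if every compressor callable yields byte-identical output for data."""
    return len({digest(compress(data)) for compress in compressors}) == 1


def pigz_with_threads(threads: int):
    """Build a compressor that pipes data through pigz using `threads` processes."""
    def compress(data: bytes) -> bytes:
        # -n: omit name/timestamp from the header, -c: write to stdout,
        # -p: number of compression threads.
        cmd = ["pigz", "-n", "-c", "-p", str(threads)]
        return subprocess.run(cmd, input=data, stdout=subprocess.PIPE, check=True).stdout
    return compress


if __name__ == "__main__":
    payload = b"release tarball contents\n" * 100_000
    if shutil.which("pigz"):
        candidates = [pigz_with_threads(n) for n in (1, 2, 8)]
    else:
        # Stand-in when pigz is not installed: gzip with a fixed timestamp.
        candidates = [lambda d: gzip.compress(d, mtime=0)] * 3
    print("identical output:", outputs_match(candidates, payload))
```

Comparing pigz output against gzip output with the same helper would be expected to report a mismatch, consistent with the answer above.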
What about the same version/settings on different platforms? (Right now, we only support amd64, but maybe in the future...) |
Let's reconsider this in the future. Going to close for now. |
Based on: #21375
When running guix builds, I've found that a large portion of the time was being allocated to `gzip`. Therefore I experimented with replacing `gzip` with `pigz`, which makes use of multiple cores. Here are the wall-clock durations for building all architectures on my workstation (AMD Ryzen Threadripper 2970WX 24-Core Processor):

However, I do understand if people don't think the gains are worth increasing our manifest size. Please let me know what you think!