
dongcarl (Contributor) commented Mar 19, 2021

Based on: #21375

When running Guix builds, I found that a large portion of the time was being spent in gzip. So I experimented with replacing gzip with pigz, which makes use of multiple cores. Here are the wall-clock durations for building all architectures on my workstation (AMD Ryzen Threadripper 2970WX 24-Core Processor):

|        | No depends cache   | Depends cache      |
|--------|--------------------|--------------------|
| gzip   | 93.19 mins         | 44.75 mins         |
| pigz   | 75.76 mins         | 27.37 mins         |
| Change | -17.43 mins (-19%) | -17.38 mins (-39%) |

However, I do understand if people don't think the gains are worth increasing our manifest size. Please let me know what you think!

$ find guix-build-$(git rev-parse --short=12 HEAD)/output/ -type f -print0 | env LC_ALL=C sort -z | xargs -r0 sha256sum
c5c26a1f9048ffe786d956cc4fc7ec3347a6084d0f91cc83bb40a775c57ad0dc  guix-build-ed8cfdd531de/output/aarch64-linux-gnu/bitcoin-ed8cfdd531de-aarch64-linux-gnu-debug.tar.gz
8cbc109723f8877e98b7d453ffc3b41d7a443105a1e3325c59cb3d3ef24ae504  guix-build-ed8cfdd531de/output/aarch64-linux-gnu/bitcoin-ed8cfdd531de-aarch64-linux-gnu.tar.gz
54d49f2aabb365aade70f7097d903fc3879405170d729c26de385ec1efde3d26  guix-build-ed8cfdd531de/output/arm-linux-gnueabihf/bitcoin-ed8cfdd531de-arm-linux-gnueabihf-debug.tar.gz
2613ecc288989f2abe0037f2158b321b9991689d4cce0576aebc2eb4ad2eb0aa  guix-build-ed8cfdd531de/output/arm-linux-gnueabihf/bitcoin-ed8cfdd531de-arm-linux-gnueabihf.tar.gz
5711f8bfbcc8c60b956bd9a74891a84972ef9568467fb04021c8315d5cb7372a  guix-build-ed8cfdd531de/output/dist-archive/bitcoin-ed8cfdd531de.tar.gz
ac4c96ec8b17e08e021628d7311966be4df198360e6564a3441fb22d1590b0fb  guix-build-ed8cfdd531de/output/powerpc64-linux-gnu/bitcoin-ed8cfdd531de-powerpc64-linux-gnu-debug.tar.gz
a84a9fe20e3f2c1ddff94794db2f5981965f48c37e5d0288ef5a6c904b4ef991  guix-build-ed8cfdd531de/output/powerpc64-linux-gnu/bitcoin-ed8cfdd531de-powerpc64-linux-gnu.tar.gz
12af621c5c081ffbe0722f24a4616e22cca0623632990231f0d4016e5821a365  guix-build-ed8cfdd531de/output/powerpc64le-linux-gnu/bitcoin-ed8cfdd531de-powerpc64le-linux-gnu-debug.tar.gz
623f9a6055cffae19b1b352e967fb94b6fd0d9d1e4bb3cac555997308256cdf1  guix-build-ed8cfdd531de/output/powerpc64le-linux-gnu/bitcoin-ed8cfdd531de-powerpc64le-linux-gnu.tar.gz
c47ebaa781fad44ff249b61a4d7b05214b838580546b3c11de971c639835114b  guix-build-ed8cfdd531de/output/riscv64-linux-gnu/bitcoin-ed8cfdd531de-riscv64-linux-gnu-debug.tar.gz
79717355815f39e291e5fe3ca5ba1390240344a4b9b2d12325fba9cdab81a735  guix-build-ed8cfdd531de/output/riscv64-linux-gnu/bitcoin-ed8cfdd531de-riscv64-linux-gnu.tar.gz
6db76e37fcab591c13ebd7fe2527c62415f5ca16c91a8d395c49d36d84f22211  guix-build-ed8cfdd531de/output/x86_64-apple-darwin18/bitcoin-ed8cfdd531de-osx-unsigned.dmg
fd79d768e2478e5ab6a0534fad3c8ff033efe34dd26d98040fa8ace7c642073c  guix-build-ed8cfdd531de/output/x86_64-apple-darwin18/bitcoin-ed8cfdd531de-osx-unsigned.tar.gz
96239efb584677390a9d95c08b1ec5b29d01c40619f252eb7d2fb7ca3f6e2571  guix-build-ed8cfdd531de/output/x86_64-apple-darwin18/bitcoin-ed8cfdd531de-osx64.tar.gz
d5c5017202f69a74e0c172295de2e08beb007c32acd7f0e6f6213a6876bf5960  guix-build-ed8cfdd531de/output/x86_64-linux-gnu/bitcoin-ed8cfdd531de-x86_64-linux-gnu-debug.tar.gz
ff5de879eec3f3a6cdb25e50f77a105b82224bded2212411f204b414d87571f5  guix-build-ed8cfdd531de/output/x86_64-linux-gnu/bitcoin-ed8cfdd531de-x86_64-linux-gnu.tar.gz
7ab80db27c3234c09804c39c72476b6a3ea92973d783b42031ac28911c914ac2  guix-build-ed8cfdd531de/output/x86_64-w64-mingw32/bitcoin-ed8cfdd531de-win-unsigned.tar.gz
d351fc91e7b30ab73a6debf3f6fb46a1087e4e18c48b504ff56a01f37e74151f  guix-build-ed8cfdd531de/output/x86_64-w64-mingw32/bitcoin-ed8cfdd531de-win64-debug.zip
b5bdd32eb5391e6cbb25b9ef485f386263a41651bbe7b3d48910d586e8078910  guix-build-ed8cfdd531de/output/x86_64-w64-mingw32/bitcoin-ed8cfdd531de-win64-setup-unsigned.exe
23ea865a9916795bb3925bb9f29848e6d3b3e769d42acb99bd33880c8a0e93dc  guix-build-ed8cfdd531de/output/x86_64-w64-mingw32/bitcoin-ed8cfdd531de-win64.zip

dongcarl added 15 commits March 17, 2021 15:02
In Guix, there are two flags for controlling parallelism:

Note: When I say "derivation," think "package"

--cores=n
  - controls the number of CPU cores to build each derivation. This is
    the value passed to `make`'s `--jobs=` flag.
  - defaults to 0: as many cores as are available

--max-jobs=n
  - controls how many derivations can be built in parallel
  - defaults to 1

Therefore, if we set --max-jobs=$MAX_JOBS and don't set --cores, Guix could
theoretically spin up $MAX_JOBS * $(nproc) concurrent make jobs, and that's
no good.
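
The worst case above is a simple multiplication. Below is a minimal sketch of that arithmetic as a hypothetical shell helper. `worst_case_jobs` is not part of the Guix build scripts. Following the defaults listed above, it treats `--cores=0` as "all available cores" and substitutes `nproc` for it:

```shell
# Hypothetical helper, not part of the Guix build scripts.
# Usage: worst_case_jobs MAX_JOBS CORES
# Prints the upper bound on concurrent make jobs when Guix builds MAX_JOBS
# derivations in parallel and passes --jobs=CORES to make for each one.
worst_case_jobs() {
  max_jobs=$1
  cores=$2
  # --cores=0 (the Guix default) means "as many cores as are available".
  if [ "$cores" -eq 0 ]; then
    cores=$(nproc)
  fi
  echo $((max_jobs * cores))
}
```

For example, on a machine where `nproc` reports 48, `worst_case_jobs 4 0` prints 192.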

So we could either default to --cores=1, --max-jobs=$MAX_JOBS

  - Pro: --cores=1 means that `make` will be invoked with `-j1`,
         avoiding problems with packages whose build systems and test
         suites break when running multi-threaded.

  - Con: There will be times when only 1 or 2 derivations can be built
         at a time, because the rest of the dependency graph all depend
         on those 1 or 2 derivations. During these times, the machine
         will be severely under-utilized.

or --cores=$MAX_JOBS, --max-jobs=1

  - Pro: We don't encounter prolonged periods of
         severe under-utilization mentioned above.

  - Con: Many packages' build systems and test suites break when running
         multi-threaded.

or --cores=1, --max-jobs=1 and let the user override with
$ADDITIONAL_GUIX_COMMON_FLAGS
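
Under the third option, a user who wants more parallelism opts in explicitly. This is a minimal sketch. It assumes the build script passes `$ADDITIONAL_GUIX_COMMON_FLAGS` through to its `guix` invocations as described above. `--max-jobs` and `--cores` are `guix build` options, and the values are arbitrary examples, not recommendations:

```shell
# Opt in to parallelism explicitly instead of relying on the defaults.
# Example values only: at most 4 derivations at once, each built with make -j2,
# so no more than 4 * 2 = 8 make jobs run concurrently.
export ADDITIONAL_GUIX_COMMON_FLAGS="--max-jobs=4 --cores=2"
```

Then run the Guix build script as usual in the same shell.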

Otherwise, it prints a rather disturbing message to stderr:

    fatal: no tag exactly matches '<hash>'
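
That message is what `git describe --exact-match` prints when HEAD is not tagged. Below is a minimal sketch of a quiet variant, under these assumptions:
- The script wants the tag name when HEAD is tagged, and otherwise the 12-character commit hash used in the output paths above.
- `head_version` is a hypothetical name.
- Stripping a leading `v` from the tag is an assumption about tag naming.

Note that `git describe` only considers annotated tags unless `--tags` is given.

```shell
# Hypothetical helper: print a version string for the current checkout.
# If HEAD carries an annotated tag, print it without its leading "v";
# otherwise fall back to the 12-character abbreviated commit hash.
# git describe's stderr is discarded, so the "fatal: no tag exactly matches"
# message never reaches the user.
head_version() {
  if tag=$(git describe --exact-match HEAD 2>/dev/null); then
    echo "${tag#v}"
  else
    git rev-parse --short=12 HEAD
  fi
}
```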

./windeploy is a "working directory", and therefore belongs inside
distsrc-*. Many people have noticed their Guix builds failing after
hours simply because they did not remove windeploy (but did remove the
distsrc-* directories).
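
On trees built before this change, a stale `./windeploy` has to be removed by hand alongside the `distsrc-*` directories. Below is a minimal sketch of that cleanup. It assumes both directories sit at the top of the source tree, as the `./` prefix suggests. `clean_guix_workdirs` is a hypothetical helper, not part of the build scripts:

```shell
# Hypothetical cleanup helper. Removes the per-host distsrc-* working
# directories and the old top-level windeploy directory from TOPDIR
# (default: the current directory). If the glob matches nothing, the
# unexpanded pattern names a nonexistent path, which rm -f ignores.
clean_guix_workdirs() {
  topdir=${1:-.}
  rm -rf -- "$topdir"/distsrc-* "$topdir"/windeploy
}
```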
luke-jr (Member) commented Mar 19, 2021

Considering the huge performance impact of using Guix at all, this doesn't make sense to me.

dongcarl (Contributor Author) commented Mar 19, 2021

> Considering the huge performance impact of using Guix at all, this doesn't make sense to me.

Sorry, I don't fully understand. Are you saying: "Running Guix builds already strains my machine quite a bit, so it wouldn't make sense to strain it more by using pigz"?

luke-jr (Member) commented Mar 19, 2021

No, I'm just saying gzip's performance hit is trivial compared to migrating from Gitian to Guix.

DrahtBot (Contributor) commented Mar 19, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

sipa (Member) commented Mar 20, 2021

Are there any concerns about pigz possibly introducing non-determinism if it compresses in parallel? (Looking at the documentation, my guess is no, but it's worth testing.)

Otherwise, that's a pretty nice and simple speedup. In fact, I'm kind of surprised we spend that much time on gzipping stuff that this even matters...

laanwj (Member) commented Mar 20, 2021

> Otherwise, that's a pretty nice and simple speedup. In fact, I'm kind of surprised we spend that much time on gzipping stuff that this even matters...

I came here to say this, basically: I'm surprised that, given how much time is spent on compilation, zlib packing takes that much time. It's not like xz, which is comparatively much more CPU-heavy for compression.

I'm also a little bit concerned about adding a non-standard dependency. Not enough to NACK this though.

What about lowering the compression level for intermediate .gz files such as caches? Depending on I/O speed versus CPU speed, this can help, at the expense of somewhat more disk usage. [So to be clear, I'm not suggesting less compression for the distribution.]

A more high-level question would be: Are we at the point that we need to micro-optimize GUIX build speed, or is this something we can consider later? 🙂
(thanks for looking into it though! it's very useful to know where time is spent)

dongcarl (Contributor Author) commented

> Are there any concerns about pigz possibly introducing non-determinism if it compresses in parallel? (Looking at the documentation, my guess is no, but it's worth testing.)

I've asked this question here: madler/pigz#90, and tested locally. It seems to be deterministic!
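
A local check like this can be as simple as hashing the output of two runs with different thread counts. Below is a minimal sketch. `pigz_output_matches` is a hypothetical helper, `-p` sets the number of pigz compression threads, and `-n` keeps the file name out of the gzip header:

```shell
# Hypothetical helper: compress FILE with pigz at two thread counts and
# succeed only if the two outputs are byte-identical.
# Usage: pigz_output_matches FILE THREADS_A THREADS_B
# Both runs read the same file with the same options, so only the thread
# count differs; -n keeps the file name out of the gzip header.
pigz_output_matches() {
  input=$1
  a=$(pigz -n -p "$2" -c "$input" | sha256sum | cut -d ' ' -f 1)
  b=$(pigz -n -p "$3" -c "$input" | sha256sum | cut -d ' ' -f 1)
  [ "$a" = "$b" ]
}
```

For example, `pigz_output_matches bitcoin.tar 1 "$(nproc)" && echo identical`, where `bitcoin.tar` is a placeholder input. A match only covers one pigz version and one set of options on one machine. It says nothing about other pigz versions or about plain gzip.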

> What about lowering the compression level for intermediate .gz files such as caches? Depending on I/O speed versus CPU speed, this can help, at the expense of somewhat more disk usage. [So to be clear, I'm not suggesting less compression for the distribution.]

Probably worthwhile to experiment with that too!

> A more high-level question would be: Are we at the point that we need to micro-optimize GUIX build speed, or is this something we can consider later? 🙂
> (thanks for looking into it though! it's very useful to know where time is spent)

Oh, I think we can 100% consider this later. I've just done so many builds where I thought something was stuck for 3+ minutes, only to find out it was gzip being slow :-) There's no rush; that's why it's just a draft PR. I just wanted to write down my crude findings somewhere public!

sipa (Member) commented Mar 20, 2021

More fine-tuning / microoptimization: for purposes of just internally compressing things, zstd is a lot faster than gzip (at the same compression level).
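
For internal artifacts, swapping the compressor is a one-line change in a pipeline. Below is a minimal sketch that assumes an intermediate tarball is produced by piping `tar` into a compressor. `pack_cache_zstd` is hypothetical. In zstd, `-3` is the default level, and `-T0` lets zstd choose the number of worker threads based on the available cores:

```shell
# Hypothetical helper: pack SRC_DIR into an intermediate .tar.zst archive.
# Usage: pack_cache_zstd SRC_DIR OUT.tar.zst
# -3 is zstd's default level; -T0 lets zstd pick the number of worker
# threads from the available cores; -q suppresses progress output.
pack_cache_zstd() {
  tar -cf - -C "$1" . | zstd -q -3 -T0 -c > "$2"
}
```

Any speedup should be measured on the actual build machine rather than assumed.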

maflcko (Member) commented Mar 21, 2021

Is the release binary bit-identical with gzip vs pigz?

sipa (Member) commented Mar 21, 2021

> Is the release binary bit-identical with gzip vs pigz?

No, only between pigz runs of the same version with the same settings (possibly with a different number of threads).

luke-jr (Member) commented Mar 21, 2021

What about the same version/settings on different platforms? (Right now, we only support amd64, but maybe in the future...)

fanquake (Member) commented Sep 2, 2021

Let's reconsider this in the future. Going to close for now.

fanquake closed this Sep 2, 2021
bitcoin locked as resolved and limited conversation to collaborators Sep 2, 2022
7 participants