Skip to content

Conversation

theuni
Copy link
Member

@theuni theuni commented Oct 7, 2024

This cuts off roughly a third of my total build time, either with -j8 or -j24.

There are many different ways to go about adding pre-compiled header support with CMake. This is probably the simplest and most naive, but it has a substantial impact.

I used ClangBuildAnalyzer to measure the headers which took the most amount of time to parse while building the kernel library. I did this under the assumption that those are probably the most included files in general, and because they should never be affected by changing defines/flags in different compilation units.

CMake re-generates a new pch file for each target that includes it, so it's only worth using for libraries/binaries which compile a substantial list of cpp files. I eyeballed the libs/bins which I thought made sense.

I chose to add base_precompiled as PRIVATE everywhere so that it's clear exactly where they're being used.

Other approaches would involve keeping (or generating) lists of proper headers for each lib/bin, and potentially sharing them around when possible. The approach here (a monolithic interface lib) is unfortunately incompatible with sharing.

There's an additional (hackish) commit to support ccache + pch. Unfortunately this requires a rather new (4.8+) ccache in order to properly set the variables we need. With the required options turned on as documented by ccache, this combo works fine for me locally.

Overall, I think this is a reasonable approach because it pretty much just works with no fiddling. The main drawback is that over time the headers list will grow stale, so it'll require some attention now and then.


Results: (-DBUILD_UTIL_CHAINSTATE=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_BENCH=ON)

Before:
**** Time summary:
Compilation (1412 times):
Parsing (frontend): 2619.3 s
Codegen & opts (backend): 1553.3 s

Real make time: 6m20.894s

After:
**** Time summary:
Compilation (1422 times):
Parsing (frontend): 1375.4 s
Codegen & opts (backend): 1726.7 s

Real make time: 4m24.634s

pch + ccache 2nd run: 0m10.073s

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 7, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31053.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK TheCharlatan, hebasto, laanwj, edilmedeiros, BrandonOdiwuor

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30882 (wip: Split fuzz binary (take 2) by dergoegge)
  • #30861 (build: Improve ccache performance for different build directories by hebasto)
  • #30811 (build: Unify -logsourcelocations format by hebasto)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@theuni
Copy link
Member Author

theuni commented Oct 7, 2024

Ping @hebasto, @fanquake

Also @willcl-ark, you might be interested in this as you've been benchmarking build times.

@TheCharlatan
Copy link
Contributor

Concept ACK

@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 7, 2024

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/31200880897

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@fanquake
Copy link
Member

fanquake commented Oct 8, 2024

MSAN failure was spurious, but not the tidy (https://cirrus-ci.com/task/4930868089716736):

[21:49:26.130] clang-tidy-18 -p=/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu -quiet -load=/tidy-build/libbitcoin-tidy.so /ci_container_base/src/validationinterface.cpp
[21:49:26.130] /ci_container_base/src/validationinterface.cpp:22:13: error: redundant 'RemovalReasonToString' declaration [readability-redundant-declaration,-warnings-as-errors]
[21:49:26.130]    22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept;
[21:49:26.130]       | ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[21:49:26.130] /ci_container_base/src/kernel/mempool_removal_reason.h:22:13: note: previously declared here
[21:49:26.130]    22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept;
[21:49:26.130]       |             ^
[21:49:26.130] ^^^ ⚠️ Failure generated from clang-tidy
[21:49:26.130] + echo '^^^ ⚠️ Failure generated from clang-tidy'
[21:49:26.130] + false

@theuni
Copy link
Member Author

theuni commented Oct 8, 2024

Huh, I guess that happens because pch shuffles around the include order.

The redundant decl indeed seems broken to me. Will PR a quick fix.

@theuni
Copy link
Member Author

theuni commented Oct 8, 2024

Combining with #30911 produces even more of a speedup (with Make, ninja is about the same).

Config: -DCMAKE_BUILD_TYPE=Debug -DWITH_CCACHE=OFF -DBUILD_UTIL_CHAINSTATE=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_BENCH=ON
Using: make -j24

Master: 2m7.311s
This PR: 1m34.495s
This PR + #30911: 1m19.046s

Result: Completes in 62% of the time compared to master.

@DrahtBot DrahtBot removed the CI failed label Oct 8, 2024
fanquake added a commit that referenced this pull request Oct 9, 2024
…-declaring RemovalReasonToString

ca2e4ba refactor: include the proper header rather than forward-declaring RemovalReasonToString (Cory Fields)

Pull request description:

  Trivial no-op fixup.

  This was pointed out by #31053, which causes the include order to be shuffled around:

  ```
  [21:49:26.130] /ci_container_base/src/validationinterface.cpp:22:13: error: redundant 'RemovalReasonToString' declaration [readability-redundant-declaration,-warnings-as-errors]
  [21:49:26.130]    22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept;
  [21:49:26.130]       | ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  [21:49:26.130] /ci_container_base/src/kernel/mempool_removal_reason.h:22:13: note: previously declared here
  [21:49:26.130]    22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept;
  [21:49:26.130]       |             ^
  ```

  I don't see any reason why the include shouldn't just be used.

ACKs for top commit:
  maflcko:
    lgtm ACK ca2e4ba
  hebasto:
    ACK ca2e4ba, IWYU seems [agree](https://cirrus-ci.com/task/6170839912022016):
  TheCharlatan:
    ACK ca2e4ba

Tree-SHA512: e3584cae4f50bf2bc6c824bfaddfe683ef6a17d16138d0cbcc544b98bd64d5d7353b0826b1e8cf16e12410e27b0fcedde27100d4241b7cc194cd4465c8175a5b
@hebasto
Copy link
Member

hebasto commented Oct 9, 2024

Combining with #30911 produces even more of a speedup (with Make, ninja is about the same).

Why are ninja builds not affected?

@theuni
Copy link
Member Author

theuni commented Oct 9, 2024

Combining with #30911 produces even more of a speedup (with Make, ninja is about the same).

Why are ninja builds not affected?

Ninja builds are sped up significantly by pch (this PR), it's the combo with #30911 doesn't seem to have much effect.

theuni added 5 commits October 9, 2024 16:25
These are all kernel headers which required the most amount of frontend
parsing as measured by clang's -ftime-trace option.
ccache supports setting key=value options as of v4.8. Prior to that, setting
sloppiness would've required substantial hacking.

For 4.8+:
  - Set the required sloppiness options
  - Set "-Xclang -fno-pch-timestamp" for clang

Both options are documented here: https://ccache.dev/manual/latest.html
PCH is incompabible with iwyu:
  error: include-what-you-use does not support PCH

It seems likely that it'd mess with clang-tidy as well.
@laanwj
Copy link
Member

laanwj commented Oct 17, 2024

Concept ACK. It's nice that CMAKE provides a built-in abstraction for this so there's no need for really ugly stuff.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.

Tested and benchmarked on Ubuntu 23.10 with CMake 3.27.4 using the following command:

$ rm -rf build && cmake -B build -G Ninja --preset dev-mode -DWITH_MULTIPROCESS=OFF -DWITH_CCACHE=OFF && time cmake --build build -j 16
real	6m46.350s
real	6m10.958s

Do you want to keep the base_precompiled target or use the PRECOMPILE_HEADERS property on per-target basis?

else()
set(pch_status ON)
endif()
message("Precompiled-headers ................... ${pch_status}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
message("Precompiled-headers ................... ${pch_status}")
message("Precompiled headers ................... ${pch_status}")

@bitcoin bitcoin deleted a comment from syaifulnizamiphone7 Oct 22, 2024
@edilmedeiros
Copy link
Contributor

edilmedeiros commented Oct 23, 2024

Concept ACK
Code LGTM.


rm -rf build && cmake -B build -DBoost_INCLUDE_DIR=/opt/local/libexec/boost/1.78/include -DCMAKE_BUILD_TYPE=Debug -DWITH_CCACHE=OFF -DBUILD_BENCH=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_UTIL_CHAINSTATE=ON && /usr/bin/time cmake --build build -j 11

Master at ffe4261
1m56,95s real

This PR at 3cec350
1m32,76s real


With ccache 4.10:

rm -rf build && cmake -B build -DBoost_INCLUDE_DIR=/opt/local/libexec/boost/1.78/include -DCMAKE_BUILD_TYPE=Debug -DWITH_CCACHE=ON -DBUILD_BENCH=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_UTIL_CHAINSTATE=ON && /usr/bin/time -h cmake --build build -j 11

Master at ffe4261
2m18,47s real

This PR at 3cec350
1m52,09s real


All running on a MacBook Pro M3 Pro 18GB with MacOS Sonoma 14.6.1.
cmake version 3.29.5.
Interesting that ccache made my build slower.

@laanwj
Copy link
Member

laanwj commented Oct 24, 2024

Complete build on RPi5 with NVME hat, 8GB memory:

$ cmake -B build  -G Ninja --preset=dev-mode  -DWITH_CCACHE=OFF --toolchain /data/tmp/bitcoin/depends/aarch64-unknown-linux-gnu/toolchain.cmake 
$ time ninja -C build

(PR with pch 3cec350)
real 33m33.235s
user 127m24.712s
sys 4m22.753s

(base commit 5fb9455)
real 37m39.927s
user 144m17.888s
sys 3m26.133s

About 10% faster (in real time), not as impressive as some results on faster machines, but still nice to have.

(did the same bechmark on SD card instead of NVME, the difference is not significant, CPU really dominates instead of I/O here)

Interesting that ccache made my build slower.

well-ccache speeds up repeated builds; it makes sense that if it cannot cache anything it's slightly slower, as it has to store all the input and outputs on disk

@@ -75,6 +75,76 @@ add_subdirectory(secp256k1)
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
string(APPEND CMAKE_C_COMPILE_OBJECT " ${APPEND_CPPFLAGS} ${APPEND_CFLAGS}")


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is a reasonable approach because it pretty much just works with no fiddling. The main drawback is that over time the headers list will grow stale, so it'll require some attention now and then.

Not sure if here, but it'd seem useful to document some (minified) version of the PR description in regards to maintaining this going forward. If anything to prevent if from being too arbtrary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. To be clear, I really don't like the arbitrary nature of this at all. I was hoping someone would have a better idea for an approach to selection.

I really meant this as an RFC because I agree that what's here is a pretty crappy solution. But it's clear that pch is a win, so I think we should come up with something.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 6, 2024

🐙 This pull request conflicts with the target branch and needs rebase.

Copy link
Contributor

@BrandonOdiwuor BrandonOdiwuor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK using pre-compiled headers to speed up compilation

Copy link
Member

@willcl-ark willcl-ark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a first pass over this. It seems to work for me locally, both with and without ccache.

I see both clang and gcc running faster with this change than merge-base @ 5fb9455 (with clang being much faster than gcc in both cases):

Summary
  env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) ran
    1.32 ± 0.02 times faster than env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 5fb94550638)
    1.43 ± 0.00 times faster than env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8)
    1.53 ± 0.03 times faster than env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 5fb94550638)
Command Mean [s] Min [s] Max [s] Relative
env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 5fb94550638) 178.672 ± 2.623 176.287 181.481 1.32 ± 0.02
env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) 134.888 ± 0.242 134.672 135.150 1.00
env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 5fb94550638) 206.191 ± 4.083 203.754 210.905 1.53 ± 0.03
env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) 193.261 ± 0.185 193.056 193.416 1.43 ± 0.00

I too can't think of a cleaner way of implementing this, or making it more automated than using a manual list.

Left one comment about adding flags for gcc

)
if(NOT CCACHE_TEST_RESULT)
list(APPEND CMAKE_CXX_COMPILER_LAUNCHER sloppiness=pch_defines,time_macros)
# According to the ccache docs clang requires this option.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They also give GCC flags. Perhaps here we can therefore use something like:

diff --git a/cmake/ccache.cmake b/cmake/ccache.cmake
index 08f078e627b..cb7059884fa 100644
--- a/cmake/ccache.cmake
+++ b/cmake/ccache.cmake
@@ -35,7 +35,12 @@ if(NOT MSVC)
       if(NOT CCACHE_TEST_RESULT)
         list(APPEND CMAKE_CXX_COMPILER_LAUNCHER sloppiness=pch_defines,time_macros)
         # According to the ccache docs clang requires this option.
-        try_append_cxx_flags("-Xclang -fno-pch-timestamp" TARGET core_interface SKIP_LINK)
+        if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
+          try_append_cxx_flags("-Xclang -fno-pch-timestamp" TARGET core_interface SKIP_LINK)
+        elseif(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
+          try_append_cxx_flags("-fpch-preprocess" TARGET core_interface SKIP_LINK)
+        endif()
       else()
         list(APPEND configure_warnings "For pre-compiled headers support with ccache, version 4.8+ is required. Disabling pch.")
         set(CMAKE_DISABLE_PRECOMPILE_HEADERS ON)

@DrahtBot
Copy link
Contributor

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@fanquake fanquake marked this pull request as draft February 25, 2025 14:54
@DrahtBot
Copy link
Contributor

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@fanquake
Copy link
Member

fanquake commented Jul 2, 2025

Closing for now. Could re-open if you want to circle back to looking at this.

@fanquake fanquake closed this Jul 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants