-
Notifications
You must be signed in to change notification settings - Fork 37.7k
RFC: build: support for pre-compiled headers. #31053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31053. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Also @willcl-ark, you might be interested in this as you've been benchmarking build times. |
Concept ACK |
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
MSAN failure was spurious, but not the tidy (https://cirrus-ci.com/task/4930868089716736): [21:49:26.130] clang-tidy-18 -p=/ci_container_base/ci/scratch/build-x86_64-pc-linux-gnu -quiet -load=/tidy-build/libbitcoin-tidy.so /ci_container_base/src/validationinterface.cpp
[21:49:26.130] /ci_container_base/src/validationinterface.cpp:22:13: error: redundant 'RemovalReasonToString' declaration [readability-redundant-declaration,-warnings-as-errors]
[21:49:26.130] 22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept;
[21:49:26.130] | ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[21:49:26.130] /ci_container_base/src/kernel/mempool_removal_reason.h:22:13: note: previously declared here
[21:49:26.130] 22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept;
[21:49:26.130] | ^
[21:49:26.130] ^^^ ⚠️ Failure generated from clang-tidy
[21:49:26.130] + echo '^^^ ⚠️ Failure generated from clang-tidy'
[21:49:26.130] + false |
Huh, I guess that happens because pch shuffles around the include order. The redundant decl indeed seems broken to me. Will PR a quick fix. |
Combining with #30911 produces even more of a speedup (with Make, ninja is about the same). Config: Master: 2m7.311s Result: Completes in 62% of the time compared to master. |
…-declaring RemovalReasonToString ca2e4ba refactor: include the proper header rather than forward-declaring RemovalReasonToString (Cory Fields) Pull request description: Trivial no-op fixup. This was pointed out by #31053, which causes the include order to be shuffled around: ``` [21:49:26.130] /ci_container_base/src/validationinterface.cpp:22:13: error: redundant 'RemovalReasonToString' declaration [readability-redundant-declaration,-warnings-as-errors] [21:49:26.130] 22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept; [21:49:26.130] | ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [21:49:26.130] /ci_container_base/src/kernel/mempool_removal_reason.h:22:13: note: previously declared here [21:49:26.130] 22 | std::string RemovalReasonToString(const MemPoolRemovalReason& r) noexcept; [21:49:26.130] | ^ ``` I don't see any reason why the include shouldn't just be used. ACKs for top commit: maflcko: lgtm ACK ca2e4ba hebasto: ACK ca2e4ba, IWYU seems [agree](https://cirrus-ci.com/task/6170839912022016): TheCharlatan: ACK ca2e4ba Tree-SHA512: e3584cae4f50bf2bc6c824bfaddfe683ef6a17d16138d0cbcc544b98bd64d5d7353b0826b1e8cf16e12410e27b0fcedde27100d4241b7cc194cd4465c8175a5b
Why are ninja builds not affected? |
These are all kernel headers which required the most amount of frontend parsing as measured by clang's -ftime-trace option.
ccache supports setting key=value options as of v4.8. Prior to that, setting sloppiness would've required substantial hacking. For 4.8+: - Set the required sloppiness options - Set "-Xclang -fno-pch-timestamp" for clang Both options are documented here: https://ccache.dev/manual/latest.html
PCH is incompabible with iwyu: error: include-what-you-use does not support PCH It seems likely that it'd mess with clang-tidy as well.
Concept ACK. It's nice that CMAKE provides a built-in abstraction for this so there's no need for really ugly stuff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
Tested and benchmarked on Ubuntu 23.10 with CMake 3.27.4 using the following command:
$ rm -rf build && cmake -B build -G Ninja --preset dev-mode -DWITH_MULTIPROCESS=OFF -DWITH_CCACHE=OFF && time cmake --build build -j 16
- The master branch @ 5fb9455:
real 6m46.350s
- this PR @ 3cec350:
real 6m10.958s
Do you want to keep the base_precompiled
target or use the PRECOMPILE_HEADERS
property on per-target basis?
else() | ||
set(pch_status ON) | ||
endif() | ||
message("Precompiled-headers ................... ${pch_status}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
message("Precompiled-headers ................... ${pch_status}") | |
message("Precompiled headers ................... ${pch_status}") |
Concept ACK
Master at ffe4261 This PR at 3cec350 With ccache 4.10:
Master at ffe4261 This PR at 3cec350 All running on a MacBook Pro M3 Pro 18GB with MacOS Sonoma 14.6.1. |
Complete build on RPi5 with NVME hat, 8GB memory:
(PR with pch 3cec350) (base commit 5fb9455) About 10% faster (in real time), not as impressive as some results on faster machines, but still nice to have. (did the same bechmark on SD card instead of NVME, the difference is not significant, CPU really dominates instead of I/O here)
well-ccache speeds up repeated builds; it makes sense that if it cannot cache anything it's slightly slower, as it has to store all the input and outputs on disk |
@@ -75,6 +75,76 @@ add_subdirectory(secp256k1) | |||
set(CMAKE_EXPORT_COMPILE_COMMANDS ON) | |||
string(APPEND CMAKE_C_COMPILE_OBJECT " ${APPEND_CPPFLAGS} ${APPEND_CFLAGS}") | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think this is a reasonable approach because it pretty much just works with no fiddling. The main drawback is that over time the headers list will grow stale, so it'll require some attention now and then.
Not sure if here, but it'd seem useful to document some (minified) version of the PR description in regards to maintaining this going forward. If anything to prevent if from being too arbtrary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. To be clear, I really don't like the arbitrary nature of this at all. I was hoping someone would have a better idea for an approach to selection.
I really meant this as an RFC because I agree that what's here is a pretty crappy solution. But it's clear that pch is a win, so I think we should come up with something.
🐙 This pull request conflicts with the target branch and needs rebase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK using pre-compiled headers to speed up compilation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a first pass over this. It seems to work for me locally, both with and without ccache
.
I see both clang and gcc running faster with this change than merge-base @ 5fb9455 (with clang being much faster than gcc in both cases):
Summary
env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) ran
1.32 ± 0.02 times faster than env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 5fb94550638)
1.43 ± 0.00 times faster than env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8)
1.53 ± 0.03 times faster than env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 5fb94550638)
Command | Mean [s] | Min [s] | Max [s] | Relative |
---|---|---|---|---|
env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 5fb94550638) |
178.672 ± 2.623 | 176.287 | 181.481 | 1.32 ± 0.02 |
env CCACHE_DISABLE=1 CC=clang CXX=clang++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) |
134.888 ± 0.242 | 134.672 | 135.150 | 1.00 |
env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 5fb94550638) |
206.191 ± 4.083 | 203.754 | 210.905 | 1.53 ± 0.03 |
env CCACHE_DISABLE=1 CC=gcc CXX=g++ just build (commit = 3cec3508a79c08db5b97e93bc4225ccdd7d32ad8) |
193.261 ± 0.185 | 193.056 | 193.416 | 1.43 ± 0.00 |
I too can't think of a cleaner way of implementing this, or making it more automated than using a manual list.
Left one comment about adding flags for gcc
) | ||
if(NOT CCACHE_TEST_RESULT) | ||
list(APPEND CMAKE_CXX_COMPILER_LAUNCHER sloppiness=pch_defines,time_macros) | ||
# According to the ccache docs clang requires this option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They also give GCC flags. Perhaps here we can therefore use something like:
diff --git a/cmake/ccache.cmake b/cmake/ccache.cmake
index 08f078e627b..cb7059884fa 100644
--- a/cmake/ccache.cmake
+++ b/cmake/ccache.cmake
@@ -35,7 +35,12 @@ if(NOT MSVC)
if(NOT CCACHE_TEST_RESULT)
list(APPEND CMAKE_CXX_COMPILER_LAUNCHER sloppiness=pch_defines,time_macros)
# According to the ccache docs clang requires this option.
- try_append_cxx_flags("-Xclang -fno-pch-timestamp" TARGET core_interface SKIP_LINK)
+ if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
+ try_append_cxx_flags("-Xclang -fno-pch-timestamp" TARGET core_interface SKIP_LINK)
+ elseif(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
+ try_append_cxx_flags("-fpch-preprocess" TARGET core_interface SKIP_LINK)
+ endif()
else()
list(APPEND configure_warnings "For pre-compiled headers support with ccache, version 4.8+ is required. Disabling pch.")
set(CMAKE_DISABLE_PRECOMPILE_HEADERS ON)
⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
Closing for now. Could re-open if you want to circle back to looking at this. |
This cuts off roughly a third of my total build time, either with
-j8
or-j24
.There are many different ways to go about adding pre-compiled header support with CMake. This is probably the simplest and most naive, but it has a substantial impact.
I used ClangBuildAnalyzer to measure the headers which took the most amount of time to parse while building the kernel library. I did this under the assumption that those are probably the most included files in general, and because they should never be affected by changing defines/flags in different compilation units.
CMake re-generates a new pch file for each target that includes it, so it's only worth using for libraries/binaries which compile a substantial list of cpp files. I eyeballed the libs/bins which I thought made sense.
I chose to add
base_precompiled
asPRIVATE
everywhere so that it's clear exactly where they're being used.Other approaches would involve keeping (or generating) lists of proper headers for each lib/bin, and potentially sharing them around when possible. The approach here (a monolithic interface lib) is unfortunately incompatible with sharing.
There's an additional (hackish) commit to support ccache + pch. Unfortunately this requires a rather new (4.8+) ccache in order to properly set the variables we need. With the required options turned on as documented by ccache, this combo works fine for me locally.
Overall, I think this is a reasonable approach because it pretty much just works with no fiddling. The main drawback is that over time the headers list will grow stale, so it'll require some attention now and then.
Results: (
-DBUILD_UTIL_CHAINSTATE=ON -DBUILD_FUZZ_BINARY=ON -DBUILD_BENCH=ON
)Before:
**** Time summary:
Compilation (1412 times):
Parsing (frontend): 2619.3 s
Codegen & opts (backend): 1553.3 s
Real make time: 6m20.894s
After:
**** Time summary:
Compilation (1422 times):
Parsing (frontend): 1375.4 s
Codegen & opts (backend): 1726.7 s
Real make time: 4m24.634s
pch + ccache 2nd run: 0m10.073s