-
Notifications
You must be signed in to change notification settings - Fork 37.7k
index: initial sync speedup, parallelize process #26966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
index: initial sync speedup, parallelize process #26966
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/26966. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
drahtbot_id_4_m |
deb2e6e
to
d8b0f5e
Compare
Cool, will take it for a spin... |
d8b0f5e
to
65dc850
Compare
Cool @Sjors, just pushed a small update. Found a little bug. Going to add an important note to the PR description (because otherwise testing results will vary a lot): As the access to the block data on disk is protected by |
65dc850
to
09cc56a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably much easier suggested than done, but did you attempt to implement parallelization in a more general way so that other indices could benefit from it as well? On first glance, txindex
, and the indices suggested in PRs (#24539, #26951) seem to be parallelizable as well (not coinstatsindex
though).
Yeah, that is part of the plan. I started with the block filter index because it requires an special treatment that My idea was to start reviewing this one, so the process gets as clean as possible, and then move forward with the generalization step. It's usually more natural to abstract processes when the specific cases are well-defined. |
09cc56a
to
43e6237
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Perhaps it could have separate parallelization and index logic. So it could be reused in other indexes.
b2b62f5
to
4e1fba1
Compare
Building the index on (AMD Ryzen 7950X, blocks stored on SSD):
I made sure to not load any wallets and disabled other indexes. I didn't test if the index was correct. I wonder if, for users without this index, it would be faster to generate the index, rescan the wallet and then delete it again. Combined with #26951 you would only have to generate filters up to the age of the wallet (IIUC, cc @pstratem). Note to self for future benchmarks: |
Great results @Sjors!. Could also give it a run rebased on top #27006.
The PR contain a test verifying it. Side note: |
1da78b2
to
f264ed5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code review ACK 068537a
Built and tested on macos/arm64 as well as Debian/x86. Left several questions and suggestions. I really like ThreadPool as a util and look forward to using it for http workers as well, to share the code.
I'm currently rebuilding block filter and tx indexes with this branch to compare against master, which after 48 hours was only about 70% complete...
Show Signature
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
ACK 068537ae61309a61e3bc350dc747d75cf2207517
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmiD4qAACgkQ5+KYS2KJ
yToG8g//d72j0Pq/TmDjOA0nbCrqrBIKLIvHiTD8DIBCvCb2Ul+NnZDfvFAbuR9s
nkArmxEUPRPBSkKBYMqJHUt+JF4zsLyttkryJW1vFCHwLi49an82B2MKSNNyOfs+
PUufS9ro5FNDNax66jVdjD1/CrNRYAt/AQ/K3FSo7FNG5dbpO2n09ZBXWAHqwZfU
3bf7p3Ug0JBlEe7/JMz1Wbu7wDV9E0lINarr/n5dnVQZTBLHaCvabSYtrEix/BRG
ku2MjexrbZdR5PY1xKQvJYkOkndZDkLQVJMC9BT9GeCYBdGRaGXadZQ3AwNlzobz
JVSMQO2Ngv/Ow8IQWrAs705Moqzu680PjdodxTSrj0QU/D4cNJAW390QzwT7evrw
pAVipwM4oomc4ZMfWX4pq8AwlZ+GywEIMa34UbO0pbpGnlIEUPRUBb7l7ODzcCcb
/cp9l9xpXwWH8+1yY9uRUnh9AHRlVsNdTJeCq3kTU8W1CYcmCOaImwjMJU3tM9Aj
vgFzVQam4KYJTD+WjhKkFAb7M0uTF6LqaM5ChfSaTR1zXuVQ7OX0sYM8R0B2QLJ8
uxNzNF/KnTJZYaOc0i/b4JHP3MZWgoO3GcT64woj1GqHGjLj2YOAKYSfpNZfWaL+
kdwRnzC4qgMKunoqtv4sX9Ht0Hk7qyDfPnEog/AT4wnSbiP/HZs=
=VG2A
-----END PGP SIGNATURE-----
pinheadmz's public key is on openpgp.org
counter.fetch_add(1); | ||
}); | ||
} | ||
std::this_thread::sleep_for(std::chrono::milliseconds{100}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this sleep to give tasks a chance to execute if the blocking breaks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this sleep to give tasks a chance to execute if the blocking breaks?
Good eye.
IIRC, I added it to wait until the workers actually get blocked. Otherwise, the thread pool queue size would be greater than the expected value (because it did not consumed the blocking tasks).
Still, we could remove this by adding some ready_promises
, as did in test 2. A bit more code but less fragile.
/// async result batches in a synchronous fashion (if required). | ||
[[nodiscard]] virtual std::any CustomProcessBlock(const interfaces::BlockInfo& block_info) { | ||
// If parallel sync is enabled, the child class must implement this method. | ||
if (AllowParallelSync()) return std::any(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this condition should be a bit more attention-getting, since it should never execute right? Either log something or Assume()
for the benefit of future index developers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this condition should be a bit more attention-getting, since it should never execute right? Either log something or
Assume()
for the benefit of future index developers?
Hmm, or we could remove this line and call CustomAppend
by default. Then it would be up to the child index to decide whether it needs sequential ordering during parallel sync or not. This way, the tx and BIP352 indexes would require fewer lines of code.
068537a
to
9bba5ef
Compare
Did a rough benchmark test. Froze a full node at height
Running this PR (restarting with empty
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 9bba5ef
Changes since last review are minimal responses to my own review suggestions. Built on macos/arm64 and debian/x86. Ran functional and unit tests, ran with -indexworkers=16
on mainnet fullnode with >900000 blocks
Show Signature
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
ACK 9bba5ef0cc5b6890eba2be3a6ed429c4fea5a28f
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmiI394ACgkQ5+KYS2KJ
yTruCxAAwQ1Q4syFsBPG0fS1KaKSV7LKLl5oJNzr5RPP7w/NliMB3KDXI8lhQnhr
ej6NNs7ovm05O99QrjFomGyg5oYO+rcECPoXp5lrwsOQBWJk10ZNgDIKdh2ChWr0
sn2o8DCsGpHfkXZ9qg8ynStt5ScCv/1bopb2jvaFjWHdy/5LREJ62XuJZoah5O74
kSSWyj1Nxi9oVNKctXifFp0WwqOqofft2kgDWghRPw67SERuZGqcyzb89zKLPLxr
iyMIWd8LY36isVj7XZBNVz1+jQnr77ldR15Uar/CKlrEoNC/tKs/NBE0eXAHrKhs
4iuKdtX470yazbKGAm6Ei2wQbYa4w5319QDa6pVYlqo30ZMlW/AvVMC//Uc1/v3a
fCW2ootvbbzz0lBPXg5yIB8wvaCuDkYYjMGL6tnYg1T4ChlS7P2y7IbfWryAjOKt
UN1kJNkMoXzSrHu2nZaDmIv3hFpQBaDUH0BlT+unUg8hUioE1VqpIw/+rmcCj7eA
IhKJ0NnJZnh92y5Y+9b+lf+ix6TiF0ZqGT/h+5Yv7Vj8YVXP006HpKvpx2Y84ii+
5IZag6OviHoKt056Gn2iixQFSbw6hVMOw8JnF5FC3iZR28roEkJlY2WtOUF6h/Se
UJTC1VMQuxRm+AIYLwC8w1RCRMoZDRKlOKBjOY5beFCIOARef3w=
=Zxbl
-----END PGP SIGNATURE-----
pinheadmz's public key is on openpgp.org
|
||
// Create workers | ||
for (int i = 0; i < num_workers; i++) { | ||
m_workers.emplace_back(&util::TraceThread, m_name + "_threadpool_worker_" + util::ToString(i), [this] { WorkerThread(); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bike shedding now so ok to ignore -- but when we set thread names on Linux they are truncated to 15 characters. So I dunno I guess "thread" and "worker" are redundant in the name of a thread?
bitcoin/src/util/threadnames.cpp
Lines 22 to 27 in 2f410ad
//! internal name. | |
static void SetThreadName(const char* name) | |
{ | |
#if defined(PR_SET_NAME) | |
// Only the first 15 characters are used (16 - NUL terminator) | |
::prctl(PR_SET_NAME, name, 0, 0, 0); |
…ations by caching last header 99afb9d refactor: init, simplify index shutdown code (furszy) 0faafb5 index: decrease ThreadSync cs_main contention (furszy) f1469eb index: cache last block filter header (furszy) a6756ec index: blockfilter, decouple header lookup into its own function (furszy) 331f044 index: blockfilter, decouple Write into its own function (furszy) bcbd7eb bench: basic block filter index initial sync (furszy) Pull request description: Work decoupled from bitcoin#26966 per request. The aim is to remove an unnecessary disk read operation that currently takes place with every new arriving block (or scanned block during background sync). Instead of reading the last filter header from disk merely to access its hash for constructing the next filter, this work caches it, occupying just 32 more bytes in memory. Also, reduces `cs_main` lock contention during the index initial sync process. And, simplifies the indexes initialization and shutdown procedure. Testing Note: To compare the changes, added a pretty basic benchmark in the second commit. Alternatively, could also test the changes by timing the block filter sync from scratch on any network; start the node with `-blockfilterindex` and monitor the logs until the syncing process finish. Local Benchmark Results: *Master (c252a0f): | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 132,042,516.60 | 7.57 | 0.3% | 7.79 | `BlockFilterIndexSync` *PR (43a212c): | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 126,915,841.60 | 7.88 | 0.6% | 7.51 | `BlockFilterIndexSync` ACKs for top commit: Sjors: re-ACK 99afb9d achow101: ACK 99afb9d TheCharlatan: Re-ACK 99afb9d andrewtoth: ACK 99afb9d Tree-SHA512: 927daadd68f4ee1ca781a89519539b895f5185a76ebaf525fbc246ea8dcf40d44a82def00ac34b188640802844b312270067f1b33e65a2479e06be9169c616de # Conflicts: # src/index/base.cpp # src/index/base.h # src/index/blockfilterindex.cpp
9bba5ef
to
b349fba
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
647f6f0
to
f6a52c6
Compare
…ations by caching last header 99afb9d refactor: init, simplify index shutdown code (furszy) 0faafb5 index: decrease ThreadSync cs_main contention (furszy) f1469eb index: cache last block filter header (furszy) a6756ec index: blockfilter, decouple header lookup into its own function (furszy) 331f044 index: blockfilter, decouple Write into its own function (furszy) bcbd7eb bench: basic block filter index initial sync (furszy) Pull request description: Work decoupled from bitcoin#26966 per request. The aim is to remove an unnecessary disk read operation that currently takes place with every new arriving block (or scanned block during background sync). Instead of reading the last filter header from disk merely to access its hash for constructing the next filter, this work caches it, occupying just 32 more bytes in memory. Also, reduces `cs_main` lock contention during the index initial sync process. And, simplifies the indexes initialization and shutdown procedure. Testing Note: To compare the changes, added a pretty basic benchmark in the second commit. Alternatively, could also test the changes by timing the block filter sync from scratch on any network; start the node with `-blockfilterindex` and monitor the logs until the syncing process finish. Local Benchmark Results: *Master (c252a0f): | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 132,042,516.60 | 7.57 | 0.3% | 7.79 | `BlockFilterIndexSync` *PR (43a212c): | ns/op | op/s | err% | total | benchmark |--------------------:|--------------------:|--------:|----------:|:---------- | 126,915,841.60 | 7.88 | 0.6% | 7.51 | `BlockFilterIndexSync` ACKs for top commit: Sjors: re-ACK 99afb9d achow101: ACK 99afb9d TheCharlatan: Re-ACK 99afb9d andrewtoth: ACK 99afb9d Tree-SHA512: 927daadd68f4ee1ca781a89519539b895f5185a76ebaf525fbc246ea8dcf40d44a82def00ac34b188640802844b312270067f1b33e65a2479e06be9169c616de
f6a52c6
to
49efddf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I was not able to later reproduce the crash it seems to be happen intermittently
Thanks for the report! Check it now if you can. There was a very subtle bug on which the worker threads might have accessed an index Sync()
local variable post-destruction.
And add option to customize thread pool workers count
It also adds coverage for initial sync from a particular block. Mimicking a node restart.
49efddf
to
69792b6
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
The current procedure for building the block filter index involves processing filters one at a time;
Reading blocks, undo data, and previous headers from disk sequentially.
This PR introduces a new mechanism to perform the work concurrently. Dividing the filters
generation workload among a pool of workers that can be configured by the user,
significantly increasing the speed of the index construction process.
The same concurrent processing model has been applied to the transactions index as well.
The newly introduced init flag
-indexworkers=<n>
enables the concurrent syncbehavior.
Where "n" is the number of worker threads that will be spawned at startup to create ranges
of block filters during the initial sync process. Destroying the workers pool once the
initial sync completes.
Note: by default, the parallelized sync process is not enabled.
Now the juicy part:
In my computer, with the node in debug mode and on IBD, with
-indexworkers=4
, theblock filter index generation took less than an hour. While, in master, the sync took more than 7 hours.
Important Note:
As the access to the block data on disk is protected by
cs_main
, this new feature runs substantiallyfaster when the node is not in IBD.
Testing Notes:
Sync your node without any index.
Restart the node with one of the indexes (
-blockfilterindex
or-txindex
) and-connect=0
(to sync only the index, without running the net/validation threads. Since threads won't be competing forcs_main
, this will give you a more accurate result).You’ll see a "[index name] is enabled at height [height]" log entry once it finishes. Then it’s just a matter of subtracting the index startup time from the "index synced" log time.
Keep in mind that threads are shared among all indexes you start. So if you run both indexes at the same time, your benchmark results cannot be compared against single-index runs.