Skip to content

Conversation

furszy
Copy link
Member

@furszy furszy commented Jan 25, 2023

The current procedure for building the block filter index involves processing filters one at a time;
Reading blocks, undo data, and previous headers from disk sequentially.

This PR introduces a new mechanism to perform the work concurrently. Dividing the filters
generation workload among a pool of workers that can be configured by the user,
significantly increasing the speed of the index construction process.

The same concurrent processing model has been applied to the transactions index as well.

The newly introduced init flag -indexworkers=<n> enables the concurrent sync
behavior.
Where "n" is the number of worker threads that will be spawned at startup to create ranges
of block filters during the initial sync process. Destroying the workers pool once the
initial sync completes.
Note: by default, the parallelized sync process is not enabled.

Now the juicy part:
In my computer, with the node in debug mode and on IBD, with -indexworkers=4, the
block filter index generation took less than an hour. While, in master, the sync took more than 7 hours.

Important Note:
As the access to the block data on disk is protected by cs_main, this new feature runs substantially
faster when the node is not in IBD.

Testing Notes:

  1. Sync your node without any index.

  2. Restart the node with one of the indexes (-blockfilterindex or -txindex) and -connect=0 (to sync only the index, without running the net/validation threads. Since threads won't be competing for cs_main, this will give you a more accurate result).

    You’ll see a "[index name] is enabled at height [height]" log entry once it finishes. Then it’s just a matter of subtracting the index startup time from the "index synced" log time.

Keep in mind that threads are shared among all indexes you start. So if you run both indexes at the same time, your benchmark results cannot be compared against single-index runs.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 25, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/26966.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK w0xlt, TheCharlatan, ismaelsadeeq
Approach ACK ryanofsky
Stale ACK pinheadmz

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #32997 (index: Deduplicate HashKey / HeightKey handling by mzumsande)
  • #32541 (index: store per-block transaction locations for efficient lookups by romanz)
  • #31308 (ci, iwyu: Treat warnings as errors for specific directories by hebasto)
  • #29770 (index: Check all necessary block data is available before starting to sync by fjahr)
  • #17783 (common: Disallow calling IsArgSet() on ALLOW_LIST options by ryanofsky)
  • #17581 (refactor: Remove settings merge reverse precedence code by ryanofsky)
  • #17580 (refactor: Add ALLOW_LIST flags and enforce usage in CheckArgFlags by ryanofsky)
  • #17493 (util: Forbid ambiguous multiple assignments in config file by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

LLM Linter (✨ experimental)

Possible typos and grammar issues:

  • “the error will be already be” → “the error will already be” [duplicate “be” makes the sentence invalid]

drahtbot_id_4_m

@Sjors
Copy link
Member

Sjors commented Jan 26, 2023

Cool, will take it for a spin...

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from d8b0f5e to 65dc850 Compare January 28, 2023 15:04
@furszy
Copy link
Member Author

furszy commented Jan 28, 2023

Cool @Sjors, just pushed a small update. Found a little bug.

Going to add an important note to the PR description (because otherwise testing results will vary a lot):

As the access to the block data on disk is protected by cs_main, this new feature runs substantially
faster when the node is not in IBD.
(where "substantially" here means full index sync, with 5 workers, in less than 20 minutes in my computer).

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 65dc850 to 09cc56a Compare January 28, 2023 15:40
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably much easier suggested than done, but did you attempt to implement parallelization in a more general way so that other indices could benefit from it as well? On first glance, txindex, and the indices suggested in PRs (#24539, #26951) seem to be parallelizable as well (not coinstatsindex though).

@furszy
Copy link
Member Author

furszy commented Jan 30, 2023

It's probably much easier suggested than done, but did you attempt to implement parallelization in a more general way so that other indices could benefit from it as well?

Yeah, that is part of the plan. I started with the block filter index because it requires an special treatment that txindex parallelization does not require (block/undo data reading and block filters creation can be parallelized but writing must be done sequentially due the need to link filter headers to their predecessors to create the filters-chain on disk).

My idea was to start reviewing this one, so the process gets as clean as possible, and then move forward with the generalization step. It's usually more natural to abstract processes when the specific cases are well-defined.

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 09cc56a to 43e6237 Compare January 30, 2023 16:15
Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Perhaps it could have separate parallelization and index logic. So it could be reused in other indexes.

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch 2 times, most recently from b2b62f5 to 4e1fba1 Compare January 30, 2023 20:55
@Sjors
Copy link
Member

Sjors commented Feb 7, 2023

Building the index on (AMD Ryzen 7950X, blocks stored on SSD):

  • master @ fe86616: 35'15" (mostly 1 alternating CPU thread)
  • this PR (rebased)
    • n=8: 5'20" (uses about 8 of 32 CPU threads as expected)
    • n=32: 4'26" (pleasantly close to 100% CPU usage with a dip every 10 seconds, but it drops to only 1 CPU in the last minute or two)

I made sure to not load any wallets and disabled other indexes.

I didn't test if the index was correct.

I wonder if, for users without this index, it would be faster to generate the index, rescan the wallet and then delete it again. Combined with #26951 you would only have to generate filters up to the age of the wallet (IIUC, cc @pstratem).

Note to self for future benchmarks: -txindex takes 1:07'14", -coinstatsindex takes 3.5 hours.

@furszy
Copy link
Member Author

furszy commented Feb 7, 2023

Great results @Sjors!.

Could also give it a run rebased on top #27006.
On master, the index initial sync is slower when the node is in IBD because the index thread has to compete for access to block data on disk through cs_main acquisition.

I didn't test if the index was correct.

The PR contain a test verifying it.


Side note:
I'm working on generalizing the parallelization flow so other indexes, like the txindex and #26951 can make use of it too.

Copy link
Member

@pinheadmz pinheadmz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code review ACK 068537a

Built and tested on macos/arm64 as well as Debian/x86. Left several questions and suggestions. I really like ThreadPool as a util and look forward to using it for http workers as well, to share the code.

I'm currently rebuilding block filter and tx indexes with this branch to compare against master, which after 48 hours was only about 70% complete...

Show Signature
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

ACK 068537ae61309a61e3bc350dc747d75cf2207517
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmiD4qAACgkQ5+KYS2KJ
yToG8g//d72j0Pq/TmDjOA0nbCrqrBIKLIvHiTD8DIBCvCb2Ul+NnZDfvFAbuR9s
nkArmxEUPRPBSkKBYMqJHUt+JF4zsLyttkryJW1vFCHwLi49an82B2MKSNNyOfs+
PUufS9ro5FNDNax66jVdjD1/CrNRYAt/AQ/K3FSo7FNG5dbpO2n09ZBXWAHqwZfU
3bf7p3Ug0JBlEe7/JMz1Wbu7wDV9E0lINarr/n5dnVQZTBLHaCvabSYtrEix/BRG
ku2MjexrbZdR5PY1xKQvJYkOkndZDkLQVJMC9BT9GeCYBdGRaGXadZQ3AwNlzobz
JVSMQO2Ngv/Ow8IQWrAs705Moqzu680PjdodxTSrj0QU/D4cNJAW390QzwT7evrw
pAVipwM4oomc4ZMfWX4pq8AwlZ+GywEIMa34UbO0pbpGnlIEUPRUBb7l7ODzcCcb
/cp9l9xpXwWH8+1yY9uRUnh9AHRlVsNdTJeCq3kTU8W1CYcmCOaImwjMJU3tM9Aj
vgFzVQam4KYJTD+WjhKkFAb7M0uTF6LqaM5ChfSaTR1zXuVQ7OX0sYM8R0B2QLJ8
uxNzNF/KnTJZYaOc0i/b4JHP3MZWgoO3GcT64woj1GqHGjLj2YOAKYSfpNZfWaL+
kdwRnzC4qgMKunoqtv4sX9Ht0Hk7qyDfPnEog/AT4wnSbiP/HZs=
=VG2A
-----END PGP SIGNATURE-----

pinheadmz's public key is on openpgp.org

counter.fetch_add(1);
});
}
std::this_thread::sleep_for(std::chrono::milliseconds{100});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d2138be

is this sleep to give tasks a chance to execute if the blocking breaks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d2138be

is this sleep to give tasks a chance to execute if the blocking breaks?

Good eye.
IIRC, I added it to wait until the workers actually get blocked. Otherwise, the thread pool queue size would be greater than the expected value (because it did not consumed the blocking tasks).
Still, we could remove this by adding some ready_promises, as did in test 2. A bit more code but less fragile.

/// async result batches in a synchronous fashion (if required).
[[nodiscard]] virtual std::any CustomProcessBlock(const interfaces::BlockInfo& block_info) {
// If parallel sync is enabled, the child class must implement this method.
if (AllowParallelSync()) return std::any();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

76aa1c2

I wonder if this condition should be a bit more attention-getting, since it should never execute right? Either log something or Assume() for the benefit of future index developers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

76aa1c2

I wonder if this condition should be a bit more attention-getting, since it should never execute right? Either log something or Assume() for the benefit of future index developers?

Hmm, or we could remove this line and call CustomAppend by default. Then it would be up to the child index to decide whether it needs sequential ordering during parallel sync or not. This way, the tx and BIP352 indexes would require fewer lines of code.

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 068537a to 9bba5ef Compare July 25, 2025 21:35
@pinheadmz
Copy link
Member

Did a rough benchmark test. Froze a full node at height 906551 by restarting with -noconnect and also -txindex -blockfilterindex. First test was from master, after 48 hours I aborted the process after reaching only:

$ bitcoin-cli getindexinfo
{
  "txindex": {
    "synced": false,
    "best_block_height": 770289
  },
  "basic block filter index": {
    "synced": true,
    "best_block_height": 906551
  }
}

Running this PR (restarting with empty indexes/), both indexing operations were complete after about 16 hours:

2025-07-25T18:12:46Z txindex thread start
2025-07-25T18:12:46Z basic block filter index thread start
...
2025-07-25T19:48:16Z basic block filter index thread exit
2025-07-26T10:14:21Z txindex thread exit

{
  "txindex": {
    "synced": true,
    "best_block_height": 906551
  },
  "basic block filter index": {
    "synced": true,
    "best_block_height": 906551
  }
}

Copy link
Member

@pinheadmz pinheadmz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 9bba5ef

Changes since last review are minimal responses to my own review suggestions. Built on macos/arm64 and debian/x86. Ran functional and unit tests, ran with -indexworkers=16 on mainnet fullnode with >900000 blocks

Show Signature
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

ACK 9bba5ef0cc5b6890eba2be3a6ed429c4fea5a28f
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmiI394ACgkQ5+KYS2KJ
yTruCxAAwQ1Q4syFsBPG0fS1KaKSV7LKLl5oJNzr5RPP7w/NliMB3KDXI8lhQnhr
ej6NNs7ovm05O99QrjFomGyg5oYO+rcECPoXp5lrwsOQBWJk10ZNgDIKdh2ChWr0
sn2o8DCsGpHfkXZ9qg8ynStt5ScCv/1bopb2jvaFjWHdy/5LREJ62XuJZoah5O74
kSSWyj1Nxi9oVNKctXifFp0WwqOqofft2kgDWghRPw67SERuZGqcyzb89zKLPLxr
iyMIWd8LY36isVj7XZBNVz1+jQnr77ldR15Uar/CKlrEoNC/tKs/NBE0eXAHrKhs
4iuKdtX470yazbKGAm6Ei2wQbYa4w5319QDa6pVYlqo30ZMlW/AvVMC//Uc1/v3a
fCW2ootvbbzz0lBPXg5yIB8wvaCuDkYYjMGL6tnYg1T4ChlS7P2y7IbfWryAjOKt
UN1kJNkMoXzSrHu2nZaDmIv3hFpQBaDUH0BlT+unUg8hUioE1VqpIw/+rmcCj7eA
IhKJ0NnJZnh92y5Y+9b+lf+ix6TiF0ZqGT/h+5Yv7Vj8YVXP006HpKvpx2Y84ii+
5IZag6OviHoKt056Gn2iixQFSbw6hVMOw8JnF5FC3iZR28roEkJlY2WtOUF6h/Se
UJTC1VMQuxRm+AIYLwC8w1RCRMoZDRKlOKBjOY5beFCIOARef3w=
=Zxbl
-----END PGP SIGNATURE-----

pinheadmz's public key is on openpgp.org


// Create workers
for (int i = 0; i < num_workers; i++) {
m_workers.emplace_back(&util::TraceThread, m_name + "_threadpool_worker_" + util::ToString(i), [this] { WorkerThread(); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bike shedding now so ok to ignore -- but when we set thread names on Linux they are truncated to 15 characters. So I dunno I guess "thread" and "worker" are redundant in the name of a thread?

//! internal name.
static void SetThreadName(const char* name)
{
#if defined(PR_SET_NAME)
// Only the first 15 characters are used (16 - NUL terminator)
::prctl(PR_SET_NAME, name, 0, 0, 0);

image

DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Jul 31, 2025
…ations by caching last header

99afb9d refactor: init, simplify index shutdown code (furszy)
0faafb5 index: decrease ThreadSync cs_main contention (furszy)
f1469eb index: cache last block filter header (furszy)
a6756ec index: blockfilter, decouple header lookup into its own function (furszy)
331f044 index: blockfilter, decouple Write into its own function (furszy)
bcbd7eb bench: basic block filter index initial sync (furszy)

Pull request description:

  Work decoupled from bitcoin#26966 per request.

  The aim is to remove an unnecessary disk read operation that currently takes place with every new arriving block (or scanned block during background sync). Instead of reading the last filter header from disk merely to access its hash for constructing the next filter, this work caches it, occupying just 32 more bytes in memory.

  Also, reduces `cs_main` lock contention during the index initial sync process. And, simplifies the indexes initialization and shutdown procedure.

  Testing Note:
  To compare the changes, added a pretty basic benchmark in the second commit. Alternatively, could also test the changes by timing the block filter sync from scratch on any network; start the node with `-blockfilterindex` and monitor the logs until the syncing process finish.

  Local Benchmark Results:

  *Master (c252a0f):
  |               ns/op |                op/s |    err% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  |      132,042,516.60 |                7.57 |    0.3% |      7.79 | `BlockFilterIndexSync`

  *PR (43a212c):
  |               ns/op |                op/s |    err% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  |      126,915,841.60 |                7.88 |    0.6% |      7.51 | `BlockFilterIndexSync`

ACKs for top commit:
  Sjors:
    re-ACK 99afb9d
  achow101:
    ACK 99afb9d
  TheCharlatan:
    Re-ACK 99afb9d
  andrewtoth:
    ACK 99afb9d

Tree-SHA512: 927daadd68f4ee1ca781a89519539b895f5185a76ebaf525fbc246ea8dcf40d44a82def00ac34b188640802844b312270067f1b33e65a2479e06be9169c616de

# Conflicts:
#	src/index/base.cpp
#	src/index/base.h
#	src/index/blockfilterindex.cpp
@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 9bba5ef to b349fba Compare August 7, 2025 22:43
@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 8, 2025

🚧 At least one of the CI tasks failed.
Task TSan, depends, no gui: https://github.com/bitcoin/bitcoin/runs/47637510992
LLM reason (✨ experimental): The CI failure is due to the threadpool_tests timing out.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch 2 times, most recently from 647f6f0 to f6a52c6 Compare August 8, 2025 02:33
@DrahtBot DrahtBot removed the CI failed label Aug 8, 2025
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Aug 9, 2025
…ations by caching last header

99afb9d refactor: init, simplify index shutdown code (furszy)
0faafb5 index: decrease ThreadSync cs_main contention (furszy)
f1469eb index: cache last block filter header (furszy)
a6756ec index: blockfilter, decouple header lookup into its own function (furszy)
331f044 index: blockfilter, decouple Write into its own function (furszy)
bcbd7eb bench: basic block filter index initial sync (furszy)

Pull request description:

  Work decoupled from bitcoin#26966 per request.

  The aim is to remove an unnecessary disk read operation that currently takes place with every new arriving block (or scanned block during background sync). Instead of reading the last filter header from disk merely to access its hash for constructing the next filter, this work caches it, occupying just 32 more bytes in memory.

  Also, reduces `cs_main` lock contention during the index initial sync process. And, simplifies the indexes initialization and shutdown procedure.

  Testing Note:
  To compare the changes, added a pretty basic benchmark in the second commit. Alternatively, could also test the changes by timing the block filter sync from scratch on any network; start the node with `-blockfilterindex` and monitor the logs until the syncing process finish.

  Local Benchmark Results:

  *Master (c252a0f):
  |               ns/op |                op/s |    err% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  |      132,042,516.60 |                7.57 |    0.3% |      7.79 | `BlockFilterIndexSync`

  *PR (43a212c):
  |               ns/op |                op/s |    err% |     total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  |      126,915,841.60 |                7.88 |    0.6% |      7.51 | `BlockFilterIndexSync`

ACKs for top commit:
  Sjors:
    re-ACK 99afb9d
  achow101:
    ACK 99afb9d
  TheCharlatan:
    Re-ACK 99afb9d
  andrewtoth:
    ACK 99afb9d

Tree-SHA512: 927daadd68f4ee1ca781a89519539b895f5185a76ebaf525fbc246ea8dcf40d44a82def00ac34b188640802844b312270067f1b33e65a2479e06be9169c616de
@ismaelsadeeq
Copy link
Member

Fwiw I encountered a segfault previously while testing on master on previous PR HEAD 9bba5ef

I ran the node on mainnet with -noconnect -txindex -blockfilterindex -indexworkers=4
Screenshot 2025-08-05 at 09 42 43

Although I was not able to later reproduce the crash it seems to be happen intermittently cc @furszy

@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from f6a52c6 to 49efddf Compare August 12, 2025 20:25
Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I was not able to later reproduce the crash it seems to be happen intermittently

Thanks for the report! Check it now if you can. There was a very subtle bug on which the worker threads might have accessed an index Sync() local variable post-destruction.

furszy added 5 commits August 20, 2025 05:17
And add option to customize thread pool workers count
It also adds coverage for initial sync from a particular block.
Mimicking a node restart.
@furszy furszy force-pushed the 2022_parallelize_blockfilter_index_2 branch from 49efddf to 69792b6 Compare August 20, 2025 09:26
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Task MSan, depends: https://github.com/bitcoin/bitcoin/runs/48475050585
LLM reason (✨ experimental): The CI failure is caused by a test timeout due to the index not reaching the expected height within the allowed time.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.