wallet: improve rescan performance 640% #26951
Reviews: see the guideline for information on the review process.

Conflicts: reviewers, this pull request conflicts with the following ones. If you consider this pull request important, please also help to review the conflicting pull requests, ideally starting with the one that should be merged first.
Force-pushed 2e70ef6 to e543d03 (compare)
While a 640% performance improvement is nice, a PR of this size could include more detail in the OP and the commit messages explaining what you've changed and why. This makes it easier for reviewers to approach and review your PR.
Could you elaborate on when it would be advisable to use this new feature? As far as I understand, the cost of this optimization is that we'd need to build and store a new index for rescanning, which will also take time and consume permanent disk space (how much?). Am I right to think that activating this would mostly make sense in specific situations where you know that you will need to rescan frequently in the future?
On mainnet it currently takes 8.4 GB.
Agree with previous reviewers that a more detailed PR description would be very helpful. For providing more systematic benchmarking results, you could use the approach from #25957 (see the table in the PR description and the linked functional test benchmark) as a starting point. IIRC you mentioned on IRC that the speedup only shows at a quite high element set size, i.e. a large number of scriptPubKeys in the wallet's keypool -- it would be very interesting to also benchmark this to get a feeling for what type of wallet users (w.r.t. tx frequency) would benefit the most from this optimization.
> Am I right to think that activating this would mostly make sense in specific situations where you know that you will need to rescan frequently in the future?

Probably another potential (non-wallet) use case for this index would be to speed up the scanblocks RPC.
Force-pushed e543d03 to dd20df1 (compare)
@0xB10C I've reorganized the commits to be easier to understand and written a small explanation.
Force-pushed dd20df1 to 523b709 (compare)
@theStack More useful than another performance benchmark: I'm looking into improving the filter decoder, since that's about 50% of the CPU time.
@pstratem: The main idea of the benchmark would be to determine the dependence of the performance on the wallet's keypool size. It's rather unlikely that a freshly created wallet (creating 8000 keys with default settings) shows the same speed-up as one that is heavily used daily and has hundreds of thousands of keys in the pool, e.g. an exchange. At the very least, it would be interesting to know the keypool size of the wallet you used for the results shown in the PR description.
@theStack The wallet I used had 10k keys. With more keys it will be slower, since it still has to do some processing for each key for each block. Edit: with 0 keys it's 5.4 minutes.
Force-pushed 3b34de0 to ec165ea (compare)
Is it possible to see this performance improvement in any functional test?
I agree with others above that this pull request needs more explanation. The name "wallet" is also a bit confusing. Can this be implemented with a new type of block filter, e.g. under a different filter name?
@Sjors No, it cannot. Edit: to be clear, that was the way I tried to do this initially a long time ago; it was a huge headache changing the disk format and storing the siphash keys.
@Sjors I'm not sure what else needs explaining, but I'm happy to explain things if I know what's confusing.
Let me clarify some: basically, I don't think the BlockFilter/BlockFilterIndex interface can be reasonably changed to be fast. There are a few issues (at least one of which is my fault).

The BlockFilter and BlockFilterIndex interfaces assume that the GCSFilter::Params can be generated just from the filter type and the block_hash. I previously tried changing these interfaces so that they didn't make this assumption, but I found that to involve a lot of changes that were frankly just confusing.

The BlockFilter serialization matches the BIP157/158 serialization, but isn't the serialization that I want for the wallet rescan index. Specifically, the index ends up writing the block_hash twice and the filter_type to the flat file, both of which are implied when looking up the filter in the first place. (They waste disk space and decode time.)

Lastly, the BlockFilterIndex appends a sha256 hash to the end to check for disk corruption, which would be a significant cost (indeed the siphash that I'm using now is 10% of the total runtime; I'm going to replace it with a crc32c).
`getindexinfo` needs to be updated to include the wallet filter index.
I'm not entirely convinced that the extra effort to implement and maintain an index of filters for wallet use only is actually worth the benefit. While this does significantly improve the rescan time, it's not clear how often people rescan to make this worthwhile. Rescanning shouldn't be done all that frequently, so this might not end up being all that useful? It'd certainly be less of an issue if the index could be deduplicated with the block filter index.
src/blockfilter.cpp (outdated)

```cpp
for (const uint64_t& hashed_element : m_hashed_elements) {
    ranged_elements.push_back(RangeHashedElement(hashed_element, F));
}
return ranged_elements;
```
In 9035eb9 "Introduce GCSFilter::QuerySet a cached hash query set for GCSFilter::MatchAny":

Should `m_hashed_elements` be sorted first, or `ranged_elements` be sorted afterwards? `m_hashed_elements` may not be sorted depending on which variation of `insert` was called earlier.
`m_hashed_elements` needs to be sorted before it's used in the comparison. Whether it's sorted before or after being ranged doesn't change the result, but it does change the runtime (the range function doesn't change the sort order of the list). By doing the sort before ranging we only have to do it once per unique set of elements, which is a performance win.

It looks like I did mess up the initial commit; it's fixed in the later commit that makes sorting explicit. I can squash most of that into the initial commit.
I've broken out this change into its own commit with an explanation.
```cpp
// value for M explained at https://github.com/sipa/writeups/tree/main/minimizing-golomb-filters
constexpr uint8_t WALLET_FILTER_P = 19;
constexpr uint32_t WALLET_FILTER_M = 784931; // 1.497137 * pow(2, WALLET_FILTER_P)
```
In d7a50ae "Implement WalletFilterIndex.": These already exist in src/blockfilter.h; I don't think they need to be redefined.
It might make sense to change these values from the BASIC filter values: changing to P = 32 improves the false positive rate from 1 in 784,931 to 1 in 6,430,154,452 for a trade-off of 68% larger filters. It also might make sense to make it a configuration option when the filters are generated; the value could easily be stored in the index leveldb instead of being a constant.
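For intuition on where those numbers come from: with Golomb-coded sets the false-positive rate is roughly 1 in M, and sipa's write-up derives M ≈ 1.497137 · 2^P as the size-optimal choice for a given P. A minimal sketch checking this numerically (`OptimalM` is a hypothetical helper for illustration, not code from this PR):

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: derive the size-optimal Golomb-Rice modulus M for a
// given P, using the constant from sipa's minimizing-golomb-filters write-up
// (M ~= 1.497137 * 2^P). The false-positive rate is then roughly 1 in M.
inline uint64_t OptimalM(uint8_t P)
{
    return static_cast<uint64_t>(std::llround(1.497137 * std::pow(2.0, static_cast<double>(P))));
}
```

With P = 19 this reproduces the 784,931 constant in the snippet above; with P = 32 it lands near the 1-in-6.4-billion rate quoted.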
```cpp
static constexpr bool DEFAULT_WALLETFILTERINDEX{false};
```

```cpp
class WalletFilterIndex final : public BaseIndex
```
In d7a50ae "Implement WalletFilterIndex.": Several of these functions are identical between `WalletFilterIndex` and `BlockFilterIndex`. Could these be refactored and deduplicated?
WalletFilterIndex::CustomCommit, WalletFilterElements, and DBHashKey are all candidates.
WalletFilterElements and DBHashKey I can certainly refactor to consolidate the logic.
The CustomCommit method though seems like it would end up adding a bunch of complexity to reuse with very little benefit.
```python
NUM_BLOCKS = 6  # number of blocks to mine

class WalletFasterRescanTest(BitcoinTestFramework):
```
In 523b709 "test: add test for faster rescan using block filters (top-up detection)":

This test is literally a copy of wallet_fast_rescan.py. I think it would be better to just extend wallet_fast_rescan.py to run its tests twice, once with -blockfilterindex and a second time with -walletfilterindex. If we want parallelism in the test runner, we could also add a CLI option to the test to run it with either of those indexes.
I think I can do that, but I haven't written many of the functional tests so that might take some trial and error.
```diff
-} else if (element_hashes[hashes_index] == value) {
+} else if (RangeHashedElement(element_hashes[hashes_index]) == value) {
     return true;
-} else if (element_hashes[hashes_index] > value) {
+} else if (RangeHashedElement(element_hashes[hashes_index]) > value) {
```
In ec165ea "Truncate the hashed elements into range inside MatchInternal"
Is it possible for the sort order after the truncation to be different from the order before?
No, the truncation doesn't change the sort order; the fast range function is just (x*n) >> 64, and its order is always the same as the order of x. Doing the Range inside MatchInternal means we don't have to make a copy of the array.
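The order-preservation argument can be checked with a small sketch (an illustration only, assuming the compiler supports the `__uint128_t` extension, as GCC and Clang do):

```cpp
#include <cstdint>

// "Fast range" reduction: maps a 64-bit hash x uniformly into [0, n) as
// (x * n) >> 64, using 128-bit intermediate arithmetic. Since this is
// multiplication by a constant followed by a shift, it is monotonic in x:
// if a <= b then FastRange64(a, n) <= FastRange64(b, n), so ranging a
// sorted list of hashes leaves it sorted.
inline uint64_t FastRange64(uint64_t x, uint64_t n)
{
    return static_cast<uint64_t>((static_cast<__uint128_t>(x) * n) >> 64);
}
```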
```diff
@@ -463,6 +463,7 @@ endif
 libbitcoin_wallet_a_CPPFLAGS = $(AM_CPPFLAGS) $(BITCOIN_INCLUDES) $(BOOST_CPPFLAGS) $(BDB_CPPFLAGS) $(SQLITE_CFLAGS)
 libbitcoin_wallet_a_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS)
 libbitcoin_wallet_a_SOURCES = \
+  blockfilter.cpp \
```
Yes, this felt kind of wrong, but I didn't see another way to achieve the result. Suggestions definitely wanted.
@achow101 These filters can be used for everything we use the other block filters for. I've only implemented the wallet query because that's the biggest win I see, but there's no reason they couldn't be used everywhere else (where there might be similar speed-ups available). Maybe I should change the name of the index, e.g. FixedBlockFilterIndex?
…MatchAny

Caching the hashed and sorted elements allows for a substantially faster GCSFilter::MatchAny. We will use this later for faster wallet rescans.
Force-pushed c3326ee to 275d883 (compare)
This index is very similar to the BIP158 compact block filter indexes with the key difference that the siphash key is random per node rather than per block. The static key makes it possible to build a pre-hashed, pre-sorted query set for very fast scanning of block filters.
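To illustrate why a static key helps, here is a minimal sketch of the pre-hashed, pre-sorted query-set idea (illustrative only: this `QuerySet` is a simplification of the PR's GCSFilter::QuerySet, and the hasher parameter stands in for SipHash keyed with the per-index key):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch: hash and sort the wallet's watched elements once, then reuse the
// sorted set against every block's decoded filter values. With per-block
// keys this work would have to be redone for every single block.
struct QuerySet {
    std::vector<uint64_t> m_hashed; // hashed once, kept sorted

    template <typename Hasher>
    QuerySet(const Hasher& hash, const std::vector<std::vector<unsigned char>>& elements)
    {
        m_hashed.reserve(elements.size());
        for (const auto& e : elements) m_hashed.push_back(hash(e));
        std::sort(m_hashed.begin(), m_hashed.end());
    }

    // Match one block's decoded filter values against the cached sorted set.
    bool MatchAny(const std::vector<uint64_t>& decoded_values) const
    {
        for (uint64_t v : decoded_values) {
            if (std::binary_search(m_hashed.begin(), m_hashed.end(), v)) return true;
        }
        return false;
    }
};
```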
Force-pushed 397925a to 7503df2 (compare)
Force-pushed 7503df2 to 47e1353 (compare)
Previously we hashed, ranged, and then sorted. Now we're hashing, sorting, and then ranging. This is ok because the order of the hashed elements will always match the order of the ranged hashed elements. This avoids a copy of the hashed elements being made.
Force-pushed 47e1353 to 5db1244 (compare)
🐙 This pull request conflicts with the target branch and needs rebase.
Concept -0. While this may be very useful for certain wallet usage patterns, this doesn't seem generally useful for all wallets, which makes it hard to justify the review/complexity/maintenance burden.

Wallet rescans and serving BIP158 filters are the only use cases for the block filter index that I'm aware of. If you have other examples, it would be great to enumerate them.

Rather than making a new index, if there are ways to improve the existing one, that seems preferable. Overall, while there is clearly an improvement over using the existing index, it's hard to justify a second, largely duplicate index for it.
There hasn't been much activity lately and the patch still needs rebase. What is the status here?

This PR does not seem to have conceptual support. Please leave a comment if you would like this to be reopened.
The BIP158 compact block filters make a compromise between performance and security by re-keying the filter for each block. That means every block requires re-hashing and re-sorting of every scriptPubKey before matching against the filter.
To improve on the performance we can simply generate filters using per index random keys (rather than per filter keys). We don't share these filters with any attacker so there is no security risk.
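A back-of-the-envelope cost model makes the difference concrete (the function and its numbers are illustrative assumptions, not measurements from this PR):

```cpp
#include <cstdint>

// Hypothetical cost model: number of element-hash (e.g. SipHash) evaluations
// a rescan needs when scanning `blocks` filters against `elements` watched
// scriptPubKeys.
uint64_t HashesPerRescan(uint64_t blocks, uint64_t elements, bool per_block_key)
{
    // Per-block keys (BIP158): every element is re-hashed for every block.
    // A per-index key: each element is hashed once and reused for all blocks.
    return per_block_key ? blocks * elements : elements;
}
```

For, say, 100,000 blocks and a 10k-key wallet, that's a billion hash calls under per-block keying versus ten thousand with a static per-index key.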
I don't use synthetic benchmarks for this as we have only a single data set we're ever processing.
On an arbitrary server:
No filters: 132 minutes
Block filters: 45 minutes
Wallet filters: 7 minutes