-
Notifications
You must be signed in to change notification settings - Fork 37.7k
wallet: reduce loading time by using unordered maps #16910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e963dd5
to
4ed9dd6
Compare
Shuffled some things around to get rid of the circular dependency. Also fixed a typo |
Concept ACK, especially improvements targeting big wallets (either lots of keys or lots of transactions). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, will review and run valgrind+massif visualizer and flamegraphs to compare. Anywhere we can get a big wallet file? EDIT: Saw the script you posted in the PR body above... running.
To benchmark this, I applied this commit: achow101@16ebe13. It adds logging of the load time and also stops bitcoind after the wallet has been loaded in order to get more consistent results. Here are a couple graphs that may be interesting to people. Flame graph (svg hosted on my website, couldn't inline in this comment) of master: https://achow101.com/bitcoin-master-flame.svg Flame graph of this PR: https://achow101.com/bitcoin-reduce-wallet-load-flame.svg |
src/wallet/wallet.cpp
Outdated
@@ -749,7 +749,7 @@ std::set<uint256> CWallet::GetConflicts(const uint256& txid) const | |||
bool CWallet::HasWalletSpend(const uint256& txid) const | |||
{ | |||
AssertLockHeld(cs_wallet); | |||
auto iter = mapTxSpends.lower_bound(COutPoint(txid, 0)); | |||
auto iter = mapTxSpends.find(COutPoint(txid, 0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a bug here. Previously it would return true if any output was spent, now it will only return true if the first output was spent. Could fix pretty easily by passing const CTransaction& instead of txid, and looping over the outputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a bug here. Previously it would return true if any output was spent, now it will only return true if the first output was spent. Could fix pretty easily by passing const CTransaction& instead of txid, and looping over the outputs.
A suggestion for a followup: it could be useful to add a test for it to prevent regressions in the future.
f97e249
to
45a3ec5
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
I did 1000 runs of my wallet loading benchmark. The mean loading time on master was 18408.14423 ms and the mean loading time on with this PR was 17498.71984. This is ~1 second faster, which is a ~5% performance increase with this PR. I also did a 2 sample t-test to check that these results are statistically significant, and it appears that they are. |
45a3ec5
to
73bb122
Compare
73bb122
to
de0f388
Compare
3324a42
to
c26b49f
Compare
c26b49f
to
fe0b951
Compare
f5428d4
to
05afffb
Compare
Concept ACK on using hashed containers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not using auto
for iterators type? It could improve readability a bit :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* This *must* return size_t. With Boost 1.46 on 32-bit systems the | ||
* unordered_map will behave unpredictably if the custom hasher returns a | ||
* uint64_t, resulting in failures when syncing the chain (#4634). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment could be dropped as all available reference docs refer to the size_t
return type for a hasher operator()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is just moved code, I'm going to leave as is.
05afffb
to
f9304b6
Compare
Done |
f9304b6
to
4b3439f
Compare
Had to rebase onto master to pick up the uint160 changes to CKeyID and CScriptID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that it matters since we never call |
SaltedKeyIDHasher uses CSipHasher for hashing CKeyIDs SaltedScriptIDHasher uses CSipHasher for hashing CScriptIDs SaltedPubkeyasher uses CSipHasher for hashing CPubKeys SaltedScriptHasher uses CSipHasher for hashing CScripts
…e wallet Change mapKeys, mapScripts, mapCryptedKeys, mapKeyMetadata, m_script_metadata, mapWatchKeys to use std::unordered_map. Change setWatchOnly to use std::unordered_set
4b3439f
to
8e3d1e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been open for a while without much review. Closing for now, I'll try to integrate some of these changes the next time I touch the relevant places (e.g. |
For large wallets with many transactions and keys, inserting and fetching from a
std::map
can have a performance impact. Since we largely do not rely on the sorting properties ofstd::map
, we can usestd::unordered_map
instead which is a hash table and insertion and retrieval are constant time. We also usestd::unordered_multimap
for some things that werestd::multimap
.The changed maps are:
mapWallet
: Map of all transactionsmapTxSpends
: Map of outpoints to the txids of the txs that spent them*
mapKeyMetadata
: Map of key metadata*
mapScriptMetadata
: Map of script metadata*
mapKeys
: Map of private keys*
mapCryptedKeys
: Map of encrypted private keys*
mapWatchKeys
: Set of watch only keys.*
setWatchOnly
: Set of watch only scripts (std::unordered_set, not a map)Change
mapWallet
andmapTxSpends
did require a few other changes and thus they are in their own commits. Additionally, the GUI relied on getting transactions from the wallet in a sorted order which unordered_map does not provide, so the commit "Change getWalletTxs to return a set instead of a vector" is needed in order to put all of the txs into a std::set (which is ordered) instead of a vector in order to retain the same behavior.mapTxSpends
also relied on the sorted order to have some quick lookups, but these were changed to just do normal lookups over the same thing. It should be just as fast, or even faster, since std::unordered_map is a hash table.The hash function used for these unordered maps and sets is SipHash, using the SipHash module that we already have.
SaltedTxidHasher
andSaltedOutPointHasher
were moved from their original locations in utxo set and validation code in order to also be used from the wallet. AdditionallySaltedIDHasher
was added to hash uint160s (i.e.CKeyID
andCScriptID
) andSaltedScriptHasher
was added to hashCScript
s.I did some becnhmarks with a large wallet I created on regtest using this script. This wallet is 169 MB in size with 117035 transactions and 103264 keys. It took ~20 secs to load on master, and ~18 secs with this change. So this change reduces wallet loading time by ~10%.