Skip to content

Conversation

achow101
Copy link
Member

@achow101 achow101 commented Sep 18, 2019

For large wallets with many transactions and keys, inserting and fetching from a std::map can have a performance impact. Since we largely do not rely on the sorting properties of std::map, we can use std::unordered_map instead which is a hash table and insertion and retrieval are constant time. We also use std::unordered_multimap for some things that were std::multimap.

The changed maps are:

  • mapWallet: Map of all transactions
  • mapTxSpends: Map of outpoints to the txids of the txs that spent them
    *mapKeyMetadata: Map of key metadata
    *mapScriptMetadata: Map of script metadata
    *mapKeys: Map of private keys
    *mapCryptedKeys: Map of encrypted private keys
    *mapWatchKeys: Set of watch only keys.
    *setWatchOnly: Set of watch only scripts (std::unordered_set, not a map)

Change mapWallet and mapTxSpends did require a few other changes and thus they are in their own commits. Additionally, the GUI relied on getting transactions from the wallet in a sorted order which unordered_map does not provide, so the commit "Change getWalletTxs to return a set instead of a vector" is needed in order to put all of the txs into a std::set (which is ordered) instead of a vector in order to retain the same behavior.

mapTxSpends also relied on the sorted order to have some quick lookups, but these were changed to just do normal lookups over the same thing. It should be just as fast, or even faster, since std::unordered_map is a hash table.

The hash function used for these unordered maps and sets is SipHash, using the SipHash module that we already have. SaltedTxidHasher and SaltedOutPointHasher were moved from their original locations in utxo set and validation code in order to also be used from the wallet. Additionally SaltedIDHasher was added to hash uint160s (i.e. CKeyID and CScriptID) and SaltedScriptHasher was added to hash CScripts.

I did some becnhmarks with a large wallet I created on regtest using this script. This wallet is 169 MB in size with 117035 transactions and 103264 keys. It took ~20 secs to load on master, and ~18 secs with this change. So this change reduces wallet loading time by ~10%.

@achow101
Copy link
Member Author

Shuffled some things around to get rid of the circular dependency. Also fixed a typo

@promag
Copy link
Contributor

promag commented Sep 18, 2019

Concept ACK, especially improvements targeting big wallets (either lots of keys or lots of transactions).

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, will review and run valgrind+massif visualizer and flamegraphs to compare. Anywhere we can get a big wallet file? EDIT: Saw the script you posted in the PR body above... running.

@achow101
Copy link
Member Author

To benchmark this, I applied this commit: achow101@16ebe13. It adds logging of the load time and also stops bitcoind after the wallet has been loaded in order to get more consistent results.

Here are a couple graphs that may be interesting to people.

Flame graph (svg hosted on my website, couldn't inline in this comment) of master: https://achow101.com/bitcoin-master-flame.svg

Flame graph of this PR: https://achow101.com/bitcoin-reduce-wallet-load-flame.svg

Massif (heap profiler) memory usage over time on master:
image

Massif memory usage over time on this PR:
image

@@ -749,7 +749,7 @@ std::set<uint256> CWallet::GetConflicts(const uint256& txid) const
bool CWallet::HasWalletSpend(const uint256& txid) const
{
AssertLockHeld(cs_wallet);
auto iter = mapTxSpends.lower_bound(COutPoint(txid, 0));
auto iter = mapTxSpends.find(COutPoint(txid, 0));
Copy link
Contributor

@ryanofsky ryanofsky Sep 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a bug here. Previously it would return true if any output was spent, now it will only return true if the first output was spent. Could fix pretty easily by passing const CTransaction& instead of txid, and looping over the outputs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a bug here. Previously it would return true if any output was spent, now it will only return true if the first output was spent. Could fix pretty easily by passing const CTransaction& instead of txid, and looping over the outputs.

A suggestion for a followup: it could be useful to add a test for it to prevent regressions in the future.

@achow101 achow101 force-pushed the reduce-wallet-load branch 4 times, most recently from f97e249 to 45a3ec5 Compare September 19, 2019 00:38
@fanquake fanquake removed the Tests label Sep 19, 2019
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 19, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@achow101
Copy link
Member Author

I did 1000 runs of my wallet loading benchmark. The mean loading time on master was 18408.14423 ms and the mean loading time on with this PR was 17498.71984. This is ~1 second faster, which is a ~5% performance increase with this PR. I also did a 2 sample t-test to check that these results are statistically significant, and it appears that they are.

@hebasto
Copy link
Member

hebasto commented Jun 27, 2020

Concept ACK on using hashed containers.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not using auto for iterators type? It could improve readability a bit :)

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

05afffb changes look correct.

In commit 05afffb message s/mapScriptMetadata/m_script_metadata/

Comment on lines +36 to +40
* This *must* return size_t. With Boost 1.46 on 32-bit systems the
* unordered_map will behave unpredictably if the custom hasher returns a
* uint64_t, resulting in failures when syncing the chain (#4634).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

42bc524

This comment could be dropped as all available reference docs refer to the size_t return type for a hasher operator().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is just moved code, I'm going to leave as is.

@achow101 achow101 force-pushed the reduce-wallet-load branch from 05afffb to f9304b6 Compare June 29, 2020 18:14
@achow101
Copy link
Member Author

Why not using auto for iterators type? It could improve readability a bit :)

Done

@achow101 achow101 force-pushed the reduce-wallet-load branch from f9304b6 to 4b3439f Compare June 29, 2020 19:03
@achow101
Copy link
Member Author

Had to rebase onto master to pick up the uint160 changes to CKeyID and CScriptID

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 4b3439f, tested on Linux Mint 20 (x86_64).

Did not make benchmarks, though.


e89c19a
Why are operator() definitions not placed in the header, that could imply inlining?

@achow101
Copy link
Member Author

achow101 commented Jul 2, 2020

Why are operator() definitions not placed in the header, that could imply inlining?

I'm not sure that it matters since we never call operator() directly ourselves.

achow101 added 6 commits July 2, 2020 10:58
SaltedKeyIDHasher uses CSipHasher for hashing CKeyIDs
SaltedScriptIDHasher uses CSipHasher for hashing CScriptIDs
SaltedPubkeyasher uses CSipHasher for hashing CPubKeys
SaltedScriptHasher uses CSipHasher for hashing CScripts
…e wallet

Change mapKeys, mapScripts, mapCryptedKeys, mapKeyMetadata, m_script_metadata, mapWatchKeys
to use std::unordered_map.

Change setWatchOnly to use std::unordered_set
@achow101 achow101 force-pushed the reduce-wallet-load branch from 4b3439f to 8e3d1e7 Compare July 2, 2020 15:10
Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 8e3d1e7, only suggested changes since the previous review.

@achow101
Copy link
Member Author

This has been open for a while without much review. Closing for now, I'll try to integrate some of these changes the next time I touch the relevant places (e.g. mapWallet when I get around to changing how transactions are stored).

@achow101 achow101 closed this Aug 10, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants