Don't empty dbcache on prune flushes: >30% faster IBD #28280
Conversation
Concept ACK! I'll try to repro the bench results in the next week or so.
@andrewtoth Not sure if you have seen #20827, which is a different approach but has a similar effect.

@0xB10C indeed I have seen that one. I believe they complement each other. #20827 reduces IBD time for larger …
At the time I had a spare and otherwise idle machine set up I could use. I don't have the bash script at hand, but IIRC I had a merge-base (MB) and a pull-request (PR) binary I'd run with different … I've uploaded the Jupyter notebooks here: https://gist.github.com/0xB10C/89c2903290cfb1840792d41dcd079646. The first was used for a partial IBD from 500k to 600k and the second one for a full IBD, both times syncing from a local node on the same host. The notebooks expect the debug log files with the name …
Bench running now, should have some results in the next day or so.

```shell
( bitcoinperf --no-teardown bench-pr --num-blocks 30_000 --run-id no-flush-on-prune \
    --bitcoind-args='-dbcache=4000 -prune=550' --run-count 3 28280 && \
  pushover 'Bench for andrewtoth SUCCEEDED!' ) || \
  pushover 'Bench for andrewtoth failed'
```
Local IBD of 30_000 blocks with …
re-ACK 589db87
🎉 Thanks everyone!
Ported to the CMake-based build system in hebasto#318.
See related discussions:

* bitcoin#28280 (comment)
* bitcoin#28280 (comment)
* bitcoin#17487 (comment)

And a reproducer that demonstrates the behavior of all 3 cases:

* https://gist.github.com/paplorinc/66f13938d867d82893d7ab7a2ebc5717
e976bd3 validation: add randomness to periodic write interval (Andrew Toth)
2e2f410 refactor: replace m_last_write with m_next_write (Andrew Toth)
b557fa7 refactor: rename fDoFullFlush to should_write (Andrew Toth)
d73bd9f validation: write chainstate to disk every hour (Andrew Toth)
0ad7d7a test: chainstate write test for periodic chainstate flush (Andrew Toth)

Pull request description:

Since #28233, periodically writing the chainstate to disk every 24 hours does not clear the dbcache. Since #28280, periodically writing the chainstate to disk is proportional only to the amount of dirty entries in the cache. Due to these changes, it is no longer beneficial to only write the chainstate to disk every 24 hours. The periodic flush interval was necessary because every write of the chainstate would clear the dbcache. Now, we can get rid of the periodic flush interval and simply write the chainstate along with blocks and block index at least every hour.

Three benefits of doing this:

1. For IBD or reindex-chainstate with a combination of a large dbcache setting, slow CPU, slow internet speed, or unreliable peers, it could be up to 24 hours until the chainstate is persisted to disk. A power outage or crash could potentially lose up to 24 hours of progress. If there is a very large amount of dirty cache entries, writing to disk when a flush finally does occur will take a very long time. Crashing during this window of writing can cause #11600. By syncing every hour in unison with the block index we avoid this problem: only a maximum of one hour of progress can be lost, and the window for crashing during writing is much smaller. For IBD with lower dbcache settings, a faster CPU, or better internet speed and reliable peers, chainstate writes are already triggered more often than every hour, so this change will have no effect on IBD.
2. Based on discussion in #28280, writing only once every 24 hours during long-running operation of a node causes IO spikes. Writing smaller chainstate changes every hour, like we do with blocks and block index, will reduce IO spikes.
3. Faster shutdown speeds. All dirty chainstate entries must be persisted to disk on shutdown. If we have a lot of dirty entries, such as when close to 24 hours have passed or after syncing with a large dbcache, it can take a long time to shut down. By keeping the chainstate clean we avoid this problem.

Inspired by [this comment](#28280 (comment)).

Resolves #11600

ACKs for top commit:
  achow101: ACK e976bd3
  davidgumberg: utACK e976bd3
  sipa: utACK e976bd3
  l0rinc: ACK e976bd3

Tree-SHA512: 5bccd8f1dea47f9820a3fd32fe3bb6841c0167b3d6870cc8f3f7e2368f124af1a914bca6acb06889cd7183638a8dbdbace54d3237c3683f2b567eb7355e015ee
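The "write every hour, plus randomness" scheduling above can be sketched as follows. This is an illustrative sketch only, not Bitcoin Core's actual code: the constant `DATABASE_WRITE_INTERVAL` and the helper `NextWriteTime` are hypothetical names, and the real jitter logic may differ in detail.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Illustrative base interval: write the chainstate roughly every hour.
constexpr std::chrono::minutes DATABASE_WRITE_INTERVAL{60};

// Hypothetical helper: schedule the next chainstate write one interval from
// now, plus uniform jitter of up to one extra interval, so that many nodes
// (re)started at the same time do not all hit the disk at the same moment.
std::chrono::steady_clock::time_point NextWriteTime(
    std::chrono::steady_clock::time_point now, std::mt19937_64& rng)
{
    const auto interval_s =
        std::chrono::duration_cast<std::chrono::seconds>(DATABASE_WRITE_INTERVAL);
    std::uniform_int_distribution<long long> jitter{0, interval_s.count() - 1};
    return now + interval_s + std::chrono::seconds{jitter(rng)};
}
```

With this shape, successive writes land between 1x and 2x the base interval apart, which spreads IO across a population of nodes while still bounding how much dirty state can accumulate.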
Since #17487 we no longer need to clear the coins cache when syncing to disk. A warm coins cache significantly speeds up block connection, and only needs to be fully flushed when nearing the `dbcache` limit. For frequent pruning flushes there's no need to empty the cache and kill connect block speed. However, simply using `Sync` in place of `Flush` actually slows down a pruned full IBD with a high `dbcache` value. This is because as the cache grows, sync takes longer, since every coin in the cache is scanned to check if it's dirty. For frequent prune flushes and a large cache this constant scanning starts to really slow IBD down, and just emptying the cache on every prune becomes faster.

To fix this, we can add two pointers to each cache entry and construct a doubly linked list of dirty entries. We can then iterate through only the dirty entries on each `Sync`, and simply clear the pointers after.

With this approach a full IBD with `dbcache=16384` and `prune=550` was 32% faster than master. For the default `dbcache=450` the speedup was ~9%. All benchmarks were run with `stopatheight=800000`.

While the 2 pointers add memory to each cache entry, it did not slow down IBD. For non-pruned IBD, results were similar for this branch and master. When I performed the initial IBD, the full UTXO set could be held in memory when using the max `dbcache` value. Non-pruned IBD with max `dbcache` to tip ended up using 12% more memory, but it was also 2.7% faster somehow. For smaller `dbcache` values the `dbcache` limit is respected, so it does not consume more memory, and the potentially more frequent flushes were not significant enough to cause any slowdown.

For reviewers, the commits in order do the following:
The first 4 commits encapsulate all accesses to `flags` on cache entries, and then the 5th makes `flags` private.

Commits `refactor: add CoinsCachePair alias` to `coins: call ClearFlags in CCoinsCacheEntry destructor` create the linked list head nodes and cache entry self-references and pass them into `AddFlags`.

Commit `coins: track flagged cache entries in linked list` actually adds the entries into a linked list when they are flagged DIRTY or FRESH, and removes them from the linked list when they are destroyed or the flags are cleared manually. However, the linked list is not yet used anywhere.

Commit `test: add cache entry linked list tests` adds unit tests for the linked list.

Commit `coins: pass linked list of flagged entries to BatchWrite` uses the linked list to iterate through DIRTY entries instead of iterating over the entire coins cache.

Commit `validation: don't erase coins cache on prune flushes` uses `Sync` instead of `Flush` for pruning flushes, so the cache is no longer cleared.

Inspired by this comment.
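The core idea of the change, an intrusive doubly linked list threaded through the cache entries so that a sync touches only flagged entries, can be sketched as below. This is a simplified stand-in, not Bitcoin Core's actual `CCoinsCacheEntry`/`CCoinsViewCache` code; the `Entry` and `DirtyList` types are illustrative only, and the real implementation links `CoinsCachePair`s and handles FRESH entries as well.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a coins cache entry: a value, a dirty flag, and
// two intrusive pointers linking it into the list of flagged entries.
struct Entry {
    int value{0};
    bool dirty{false};
    Entry* prev{nullptr};
    Entry* next{nullptr};
};

// Sentinel-headed doubly linked list of dirty entries. Insertion and removal
// are O(1), and iteration visits only dirty entries, no matter how large the
// overall cache has grown.
struct DirtyList {
    Entry head; // sentinel; head.next is the first dirty entry
    DirtyList() { head.prev = &head; head.next = &head; }

    void MarkDirty(Entry& e)
    {
        if (e.dirty) return;  // already linked
        e.dirty = true;
        e.prev = head.prev;   // link at the tail of the list
        e.next = &head;
        head.prev->next = &e;
        head.prev = &e;
    }

    // Sync: batch-write every dirty entry to `disk`, clear its flag and
    // unlink it, but leave the entry itself in the cache (the cache stays
    // warm for subsequent block connection).
    void Sync(std::vector<int>& disk)
    {
        Entry* p = head.next;
        while (p != &head) {
            Entry* nxt = p->next;
            disk.push_back(p->value); // "write" the dirty coin
            p->dirty = false;
            p->prev = p->next = nullptr;
            p = nxt;
        }
        head.prev = &head;
        head.next = &head;
    }
};
```

A `Flush`, by contrast, would do the same batch write and then also erase the cache entries themselves, so every block connected afterwards starts from a cold cache; that is exactly the cost the prune-flush change avoids.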
Fixes #11315.