andrewtoth (Contributor) commented Aug 16, 2023

Since #17487 we no longer need to clear the coins cache when syncing to disk. A warm coins cache significantly speeds up block connection, and only needs to be fully flushed when nearing the dbcache limit.

For frequent pruning flushes there's no need to empty the cache and kill connect-block speed. However, simply using Sync in place of Flush actually slows down a pruned full IBD with a high dbcache value. This is because, as the cache grows, Sync takes longer: every coin in the cache is scanned to check whether it's dirty. With frequent prune flushes and a large cache, this constant scanning slows IBD down so much that just emptying the cache on every prune becomes faster.

To fix this, we can add two pointers to each cache entry and construct a doubly linked list of dirty entries. Each Sync then only needs to iterate through the dirty entries, and simply clears the pointers afterwards.
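To make the mechanics concrete, here is a minimal, self-contained sketch of the idea. The names (`CacheEntry`, `FlaggedList`) are illustrative only; the real types are `CCoinsCacheEntry` and `CoinsCachePair` in `src/coins.h`, and the details differ:

```cpp
// Hedged sketch: a sentinel-headed, circular doubly linked list threading
// through flagged cache entries, giving O(1) insert/remove and letting
// Sync visit only the flagged entries instead of the whole cache map.
struct CacheEntry {
    bool flagged{false};       // stands in for the DIRTY/FRESH flags
    CacheEntry* prev{nullptr};
    CacheEntry* next{nullptr};
};

struct FlaggedList {
    CacheEntry head; // sentinel: head.next is the first flagged entry
    FlaggedList() { head.prev = &head; head.next = &head; }

    // Called when an entry is first flagged (cf. AddFlags in the PR).
    void Add(CacheEntry& e) {
        if (e.flagged) return;   // already linked
        e.flagged = true;
        e.prev = head.prev;      // link in at the tail...
        e.next = &head;
        head.prev->next = &e;    // ...in constant time
        head.prev = &e;
    }

    // Called when flags are cleared or the entry is destroyed.
    void Remove(CacheEntry& e) {
        if (!e.flagged) return;
        e.prev->next = e.next;   // unlink in O(1), no lookup needed
        e.next->prev = e.prev;
        e.prev = e.next = nullptr;
        e.flagged = false;
    }
};
```

Because each entry stores its own links, removal needs no cache lookup, and the sentinel head keeps insertion and removal branch-free at the ends.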

With this approach a full IBD with dbcache=16384 and prune=550 was 32% faster than master. For default dbcache=450 speedup was ~9%. All benchmarks were run with stopatheight=800000.

| | prune | dbcache | time | max RSS | speedup |
|---|---|---|---|---|---|
| master | 550 | 16384 | 8:52:57 | 2,417,464k | - |
| branch | 550 | 16384 | 6:01:00 | 16,216,736k | 32% |
| branch | 550 | 450 | 8:05:08 | 2,818,072k | 8.8% |
| master | 10000 | 5000 | 8:19:59 | 2,962,752k | - |
| branch | 10000 | 5000 | 5:56:39 | 6,179,764k | 28.8% |
| master | 0 | 16384 | 4:51:53 | 14,726,408k | - |
| branch | 0 | 16384 | 4:43:11 | 16,526,348k | 2.7% |
| master | 0 | 450 | 7:08:07 | 3,005,892k | - |
| branch | 0 | 450 | 6:57:24 | 3,013,556k | 2.6% |

While the two pointers add memory to each cache entry, this did not slow down IBD; for non-pruned IBD, results were similar between this branch and master. When I performed the initial IBD with the max dbcache value, the full UTXO set could be held in memory. Non-pruned IBD with max dbcache to tip ended up using 12% more memory, but it was also somehow 2.7% faster. For smaller dbcache values the dbcache limit is respected, so memory use does not grow, and the potentially more frequent flushes were not significant enough to cause any slowdown.
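For scale, a back-of-envelope figure (my assumption, not a measurement from the PR): on a 64-bit build the two list pointers cost 16 bytes per cache entry.

```cpp
#include <cstdio>

// Back-of-envelope only: the real CCoinsCacheEntry layout and the
// cachedCoinsUsage accounting differ; this just shows where the
// per-entry constant comes from.
int main() {
    const size_t per_entry = 2 * sizeof(void*); // two list pointers
    std::printf("extra bytes per cache entry: %zu\n", per_entry); // 16 on 64-bit
}
```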

For reviewers, the commits in order do the following:

- The first 4 commits encapsulate all accesses to flags on cache entries, and the 5th makes flags private.
- Commits `refactor: add CoinsCachePair alias` through `coins: call ClearFlags in CCoinsCacheEntry destructor` create the linked list head nodes and the cache entry self-references, and pass them into `AddFlags`.
- Commit `coins: track flagged cache entries in linked list` actually adds entries to the linked list when they are flagged DIRTY or FRESH, and removes them when they are destroyed or their flags are cleared manually. At this point the linked list is not yet used anywhere.
- Commit `test: add cache entry linked list tests` adds unit tests for the linked list.
- Commit `coins: pass linked list of flagged entries to BatchWrite` uses the linked list to iterate through DIRTY entries instead of the entire coins cache (see the sketch below).
- Commit `validation: don't erase coins cache on prune flushes` uses Sync instead of Flush for pruning flushes, so the cache is no longer cleared.
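Putting the last two commits together, the write path conceptually changes from scanning the whole map to walking the list. A rough sketch, reusing the illustrative types above (the real logic lives in `CCoinsViewCache::Sync` and `BatchWrite` and is more involved):

```cpp
// Hedged sketch: Sync visits only the flagged entries and unlinks them
// afterwards, leaving the (now clean) entries in the cache.
void Sync(FlaggedList& list) {
    for (CacheEntry* e = list.head.next; e != &list.head;) {
        CacheEntry* next = e->next; // grab before unlinking
        // ... write *e to the base view here (BatchWrite) ...
        list.Remove(*e);            // entry stays cached, no longer dirty
        e = next;
    }
}
```

The cost of a Sync is now proportional to the number of dirty entries rather than to the total cache size, which is what makes frequent prune flushes cheap.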

Inspired by this comment.

Fixes #11315.

DrahtBot (Contributor) commented Aug 16, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

| Type | Reviewers |
|---|---|
| ACK | paplorinc, sipa, achow101, mzumsande |
| Concept ACK | jamesob, hernanmarino, vostrnad |

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30326 (optimization: Reduce cache lookups in CCoinsViewCache::FetchCoin by paplorinc)
  • #28216 (fuzz: a new target for the coins database by darosior)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

jamesob (Contributor) commented Aug 24, 2023

Concept ACK! I'll try to repro the bench results in the next week or so.

0xB10C (Contributor) commented Aug 30, 2023

@andrewtoth Not sure if you have seen #20827 which is a different approach but has a similar effect.

andrewtoth (Contributor, Author) commented Aug 30, 2023

@0xB10C indeed I have seen that one. I believe they complement each other. #20827 reduces IBD time for larger prune values and this PR reduces it even more for smaller prune values. I should benchmark this on top of that one.
Also, I'm curious how you are performing your benchmarks and generating all those nice graphs. Do you mind sharing what tools you use?

0xB10C (Contributor) commented Aug 31, 2023

> Also, I'm curious how you are performing your benchmarks and generating all those nice graphs. Do you mind sharing what tools you use?

At the time I had a spare, otherwise idle machine set up that I could use. I don't have the bash script at hand, but IIRC I had a merge-base (MB) and a pull-request (PR) binary that I'd run with different prune and dbcache configurations. After each run, the script would rename and move the debug log (with extra -debug=coindb -debug=prune) to a separate folder. After all runs were done, I loaded the debug logs into the Jupyter notebooks linked below to parse them and generate the graphs. For anyone revisiting this, the relevant comments for my setup and results are #20827 (comment), #20827 (comment), and #20827 (comment).

I've uploaded the Jupyter notebooks here: https://gist.github.com/0xB10C/89c2903290cfb1840792d41dcd079646. The first was used for a partial IBD from 500k to 600k, and the second for a full IBD, both times syncing from a local node on the same host. The notebooks expect the debug log files to be named debug_{binary}_{cset}_{run}.log, where {binary} is either MB or PR, {cset} is an integer specifying the configuration set (pruning and dbcache parameters), and {run} is an integer numbering the run of this binary-cset combination if I did multiple. Feel free to remix and share in any way you want. Keep in mind that parsing logs is quite fragile, as the log message format can change. Feel free to reach out if there are questions.

jamesob (Contributor) commented Sep 20, 2023

Bench running now, should have some results in the next day or so.

```sh
$ ( bitcoinperf --no-teardown bench-pr --num-blocks 30_000 --run-id no-flush-on-prune \
    --bitcoind-args='-dbcache=4000 -prune=550' --run-count 3 28280 && \
    pushover 'Bench for andrewtoth SUCCEEDED!' ) || \
  pushover 'Bench for andrewtoth failed'
```

jamesob (Contributor) commented Sep 27, 2023

Local IBD of 30_000 blocks with -prune=550 from a Taproot-enabled height didn't show any difference. I didn't have -log=prune enabled, so my bench debug.log was basically useless. Rerunning...

mzumsande (Contributor) left a comment

re-ACK 589db87

achow101 merged commit 27a770b into bitcoin:master on Aug 8, 2024 (16 checks passed).
andrewtoth deleted the sync-dirty branch on Aug 8, 2024.
andrewtoth (Contributor, Author) commented

🎉 Thanks everyone!

hebasto (Member) commented Aug 10, 2024

Ported to the CMake-based build system in hebasto#318.

l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Aug 14, 2024
l0rinc added a commit to l0rinc/bitcoin that referenced this pull request Aug 14, 2024
achow101 added a commit that referenced this pull request May 1, 2025
e976bd3 validation: add randomness to periodic write interval (Andrew Toth)
2e2f410 refactor: replace m_last_write with m_next_write (Andrew Toth)
b557fa7 refactor: rename fDoFullFlush to should_write (Andrew Toth)
d73bd9f validation: write chainstate to disk every hour (Andrew Toth)
0ad7d7a test: chainstate write test for periodic chainstate flush (Andrew Toth)

Pull request description:

  Since #28233, periodically writing the chainstate to disk every 24 hours does not clear the dbcache. Since #28280, the cost of periodically writing the chainstate to disk is proportional only to the number of dirty entries in the cache. Due to these changes, it is no longer beneficial to write the chainstate to disk only every 24 hours. The periodic flush interval was necessary because every write of the chainstate would clear the dbcache. Now, we can get rid of the periodic flush interval and simply write the chainstate along with blocks and block index at least every hour (a rough sketch of the scheduling follows the list below).

  Three benefits of doing this:
  1. For IBD or reindex-chainstate with a combination of large dbcache setting, slow CPU, slow internet speed/unreliable peers, it could be up to 24 hours until the chainstate is persisted to disk. A power outage or crash could potentially lose up to 24 hours of progress. If there is a very large amount of dirty cache entries, writing to disk when a flush finally does occur will take a very long time. Crashing during this window of writing can cause #11600. By syncing every hour in unison with the block index we avoid this problem. Only a maximum of one hour of progress can be lost, and the window for crashing during writing is much smaller. For IBD with lower dbcache settings, faster CPU, or better internet speed/reliable peers, chainstate writes are already triggered more often than every hour so this change will have no effect on IBD.
  2. Based on discussion in #28280, writing only once every 24 hours during long running operation of a node causes IO spikes. Writing smaller chainstate changes every hour like we do with blocks and block index will reduce IO spikes.
  3. Faster shutdown speeds. All dirty chainstate entries must be persisted to disk on shutdown. If we have a lot of dirty entries, such as when close to 24 hours or if we sync with a large dbcache, it can take a long time to shutdown. By keeping the chainstate clean we avoid this problem.
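  As a rough illustration of the scheduling change (assumed names and constants; the actual implementation uses `m_next_write` in `validation.cpp` and may pick its interval and jitter differently):

```cpp
#include <chrono>
#include <random>

using namespace std::chrono;

// Hedged sketch: instead of remembering the last write, remember when the
// next write is due, and add randomness so many nodes don't hit the disk
// in lockstep.
constexpr auto DATABASE_WRITE_INTERVAL = 60min; // illustrative, not the real constant

steady_clock::time_point NextWriteTime(steady_clock::time_point now, std::mt19937_64& rng)
{
    // Pick a point uniformly in [interval/2, interval] from now.
    std::uniform_int_distribution<int64_t> jitter{
        duration_cast<seconds>(DATABASE_WRITE_INTERVAL / 2).count(),
        duration_cast<seconds>(DATABASE_WRITE_INTERVAL).count()};
    return now + seconds{jitter(rng)};
}

// On each block connect, conceptually:
//   if (now >= m_next_write) { write chainstate; m_next_write = NextWriteTime(now, rng); }
```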

  Inspired by [this comment](#28280 (comment)).

  Resolves #11600

ACKs for top commit:
  achow101:
    ACK e976bd3
  davidgumberg:
    utACK e976bd3
  sipa:
    utACK e976bd3
  l0rinc:
    ACK e976bd3

Tree-SHA512: 5bccd8f1dea47f9820a3fd32fe3bb6841c0167b3d6870cc8f3f7e2368f124af1a914bca6acb06889cd7183638a8dbdbace54d3237c3683f2b567eb7355e015ee

Successfully merging this pull request may close these issues:

- Prune undermines the dbcache. (#11315)