validation: don't clear cache on periodic flush: >2x block connection speed #28233

andrewtoth · 2023-08-07T14:44:40Z

Since #17487 we no longer need to clear the coins cache when syncing to disk. A warm coins cache significantly speeds up block connection, and only needs to be fully flushed when nearing the dbcache limit.

Periodic flushes occur every 24 hours, which empties the cache and causes block connection to slow down. By keeping the cache through periodic flushes a node can run for several days with an increasingly hotter cache and connect blocks much more quickly. Now not only can setting a higher dbcache value be beneficial for IBD, it can also be beneficial for connecting blocks faster.

To benchmark in real world usage, I spun up 6 identical t2.small AWS EC2 instances, all running in the same region in the same VPC. I configured 2 instances to run master, 2 instances to run the change in this PR, and 2 instances to run the change in this PR but with dbcache=1000. All instances had prune=5000 and a 20 GB gp2 EBS volume. A 7th EC2 instance in the same VPC ran master and connected only to some trusted nodes in the outside network. Each of the 6 nodes under test only connected directly to this 7th instance. I manually pruned as much as possible and uploaded the same blocks, chainstate and mempool.dat to all instances. I started all 6 peers simultaneously at block height 835245 and ran them for over a week until block 836534.

The results were much faster block connection times for this branch compared to master, and much faster for this branch with dbcache=1000 compared to default dbcache.

branch	speed
master 1	1995.49ms/blk
master 2	2129.78ms/blk
branch default dbcache 1	1189.65ms/blk
branch default dbcache 2	1037.74ms/blk
branch dbcache=1000 1	393.69ms/blk
branch dbcache=1000 2	427.77ms/blk

The log files of all 6 instances are here.
There is a lot of noise with the exact times of blocks being connected, so I plotted the rolling 20 block connect time averages. The large dots are the times where the cache is emptied. For the red master nodes, this happens every 24 hours. The blue branch nodes with default dbcache only filled up and emptied the caches once, which is seen in the middle. The green branch nodes with 1000 dbcache never emptied the cache. It is very clear from the chart that whenever the cache is emptied, connect block speed degrades significantly.

Also note that this still clears the cache for pruning flushes. Having frequent pruning flushes with a large cache that doesn't clear is less performant than the status quo #15265 (comment). See #28280.

DrahtBot · 2023-08-07T14:44:43Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type	Reviewers
ACK	sipa, brunoerg, achow101
Concept ACK	hernanmarino

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#29700 (kernel, refactor: return error status on all fatal errors by ryanofsky)
#28280 (Don't empty dbcache on prune flushes: >30% faster IBD by andrewtoth)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

DrahtBot · 2024-03-03T00:28:28Z

🤔 There hasn't been much activity lately and the CI seems to be failing.

If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn't been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

hernanmarino

Concept ACK. I also agree with the criteria in the code for setting the empty_cache variable to true

sipa · 2024-04-13T12:13:51Z

utACK 4a6d1d1

brunoerg

crACK 4a6d1d1

achow101 · 2024-05-13T20:04:59Z

ACK 4a6d1d1

Github-Pull: bitcoin#28233 Rebased-From: 4a6d1d1

e976bd3 validation: add randomness to periodic write interval (Andrew Toth) 2e2f410 refactor: replace m_last_write with m_next_write (Andrew Toth) b557fa7 refactor: rename fDoFullFlush to should_write (Andrew Toth) d73bd9f validation: write chainstate to disk every hour (Andrew Toth) 0ad7d7a test: chainstate write test for periodic chainstate flush (Andrew Toth) Pull request description: Since #28233, periodically writing the chainstate to disk every 24 hours does not clear the dbcache. Since #28280, periodically writing the chainstate to disk is proportional only to the amount of dirty entries in the cache. Due to these changes, it is no longer beneficial to only write the chainstate to disk every 24 hours. The periodic flush interval was necessary because every write of the chainstate would clear the dbcache. Now, we can get rid of the periodic flush interval and simply write the chainstate along with blocks and block index at least every hour. Three benefits of doing this: 1. For IBD or reindex-chainstate with a combination of large dbcache setting, slow CPU, slow internet speed/unreliable peers, it could be up to 24 hours until the chainstate is persisted to disk. A power outage or crash could potentially lose up to 24 hours of progress. If there is a very large amount of dirty cache entries, writing to disk when a flush finally does occur will take a very long time. Crashing during this window of writing can cause #11600. By syncing every hour in unison with the block index we avoid this problem. Only a maximum of one hour of progress can be lost, and the window for crashing during writing is much smaller. For IBD with lower dbcache settings, faster CPU, or better internet speed/reliable peers, chainstate writes are already triggered more often than every hour so this change will have no effect on IBD. 2. Based on discussion in #28280, writing only once every 24 hours during long running operation of a node causes IO spikes. Writing smaller chainstate changes every hour like we do with blocks and block index will reduce IO spikes. 3. Faster shutdown speeds. All dirty chainstate entries must be persisted to disk on shutdown. If we have a lot of dirty entries, such as when close to 24 hours or if we sync with a large dbcache, it can take a long time to shutdown. By keeping the chainstate clean we avoid this problem. Inspired by [this comment](#28280 (comment)). Resolves #11600 ACKs for top commit: achow101: ACK e976bd3 davidgumberg: utACK e976bd3 sipa: utACK e976bd3 l0rinc: ACK e976bd3 Tree-SHA512: 5bccd8f1dea47f9820a3fd32fe3bb6841c0167b3d6870cc8f3f7e2368f124af1a914bca6acb06889cd7183638a8dbdbace54d3237c3683f2b567eb7355e015ee

DrahtBot added Validation CI failed labels Aug 7, 2023

DrahtBot mentioned this pull request Aug 17, 2023

Don't empty dbcache on prune flushes: >30% faster IBD #28280

Merged

andrewtoth force-pushed the sync-on-periodic branch from 84ffcdf to 94f5598 Compare August 17, 2023 20:17

DrahtBot removed the CI failed label Aug 18, 2023

DrahtBot added CI failed and removed CI failed labels Sep 4, 2023

DrahtBot added CI failed and removed CI failed labels Sep 15, 2023

DrahtBot added the CI failed label Oct 25, 2023

andrewtoth marked this pull request as draft December 4, 2023 19:57

This was referenced Mar 13, 2024

kernel: Handle fatal errors through return values #29642

Draft

validation: Make translations of fatal errors consistent #29672

Merged

DrahtBot mentioned this pull request Mar 22, 2024

kernel, refactor: return error status on all fatal errors #29700

Draft

DrahtBot added the Needs rebase label Mar 22, 2024

validation: don't clear cache on periodic flush

4a6d1d1

andrewtoth force-pushed the sync-on-periodic branch from 94f5598 to 4a6d1d1 Compare March 27, 2024 18:06

andrewtoth marked this pull request as ready for review March 27, 2024 18:33

andrewtoth changed the title ~~validation: don't clear cache on periodic flush~~ validation: don't clear cache on periodic flush - 2x block connection speed Mar 27, 2024

andrewtoth changed the title ~~validation: don't clear cache on periodic flush - 2x block connection speed~~ validation: don't clear cache on periodic flush - >2x block connection speed Mar 27, 2024

DrahtBot removed the Needs rebase label Mar 27, 2024

andrewtoth changed the title ~~validation: don't clear cache on periodic flush - >2x block connection speed~~ validation: don't clear cache on periodic flush: >2x block connection speed Mar 27, 2024

DrahtBot removed the CI failed label Mar 28, 2024

hernanmarino approved these changes Apr 12, 2024

View reviewed changes

DrahtBot requested a review from hernanmarino April 13, 2024 12:13

brunoerg approved these changes Apr 29, 2024

View reviewed changes

achow101 merged commit d6069cb into bitcoin:master May 13, 2024

andrewtoth deleted the sync-on-periodic branch May 14, 2024 01:24

andrewtoth mentioned this pull request May 14, 2024

Drop -dbcache limit #28358

Merged

luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this pull request Aug 1, 2024

validation: don't clear cache on periodic flush

2cafbe3

Github-Pull: bitcoin#28233 Rebased-From: 4a6d1d1

andrewtoth mentioned this pull request Aug 8, 2024

validation: write chainstate to disk every hour #30611

Merged

sipa mentioned this pull request Jan 21, 2025

Use number of dirty cache entries in flush warnings/logs #31703

Open

bitcoin locked and limited conversation to collaborators May 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

validation: don't clear cache on periodic flush: >2x block connection speed #28233

validation: don't clear cache on periodic flush: >2x block connection speed #28233

Uh oh!

andrewtoth commented Aug 7, 2023 •

edited

Loading

Uh oh!

DrahtBot commented Aug 7, 2023 •

edited

Loading

Uh oh!

DrahtBot commented Mar 3, 2024

Uh oh!

hernanmarino left a comment

Uh oh!

sipa commented Apr 13, 2024

Uh oh!

brunoerg left a comment

Uh oh!

achow101 commented May 13, 2024

Uh oh!

Uh oh!

validation: don't clear cache on periodic flush: >2x block connection speed #28233

validation: don't clear cache on periodic flush: >2x block connection speed #28233

Uh oh!

Conversation

andrewtoth commented Aug 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Aug 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage

Reviews

Conflicts

Uh oh!

DrahtBot commented Mar 3, 2024

Uh oh!

hernanmarino left a comment

Choose a reason for hiding this comment

Uh oh!

sipa commented Apr 13, 2024

Uh oh!

brunoerg left a comment

Choose a reason for hiding this comment

Uh oh!

achow101 commented May 13, 2024

Uh oh!

Uh oh!

andrewtoth commented Aug 7, 2023 •

edited

Loading

DrahtBot commented Aug 7, 2023 •

edited

Loading