Add a pruning 'high water mark' to reduce the frequency of pruning events #11359
Conversation
Benchmarks, syncing against a localhost node. Sending node on HDD, syncing node on SSD. Clock starts at UpdateTip height=1. prune=550 dbcache=3000.
At height 350000, this PR results in a 529MiB dbcache vs. a 2646MiB dbcache unpruned. The final result is that the node can sync to height 350000 27% faster than without the PR, by giving the prune target ~3GiB of leeway. I didn't want to spend the time to sync all the way to the tip, but I suspect results would be similar or better. As in the above post, this is only a partial fix because the dbcache is still empirically limited to far less than the configured value.

edit: Ah yes, space requirements. In this test the chainstate folder's final size is ~1GiB and the prune is allowed to overshoot by ~3GiB, so it raises the maximum disk space requirement by ~2GiB in this example.
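The effect on flush frequency can be illustrated with some back-of-the-envelope arithmetic (a hypothetical sketch; the ~160MiB interval matches the description below, but the 150GiB total and the function name are illustrative assumptions, not measured values or code from this PR):

```python
def flush_count(total_block_bytes, flush_interval_bytes):
    """Rough number of forced chainstate flushes during IBD, assuming a
    prune event (and the flush it forces) fires every flush_interval_bytes
    of new block data past the prune target."""
    return total_block_bytes // flush_interval_bytes

MiB = 1024 ** 2
GiB = 1024 ** 3

# Default behaviour: a prune event roughly every ~160 MiB of new blocks.
baseline = flush_count(150 * GiB, 160 * MiB)
# With ~3 GiB of high-water-mark leeway, flushes become far rarer.
with_hwm = flush_count(150 * GiB, 3 * GiB)
```

Under these illustrative numbers the leeway cuts the flush count by more than an order of magnitude, which is where the IBD speedup comes from.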
An additional performance gain could be obtained by tying this HWM to a percentage of the prune target. For example, with prune=100000 you could let the data grow to 100G x 1.10 before pruning, or cap it at 100G and prune down to 100G x 0.90 (similar effect on the dbcache in both cases). Looking at the documentation in -help:
so probably the 'remain below' option makes more sense, but that retains the far slower IBD mechanic at low prune levels.
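The two options above could be sketched as follows (hypothetical illustration only; the function names and the 10% slack are assumptions for this comment, not code from the PR):

```python
def should_prune_above_target(disk_usage, prune_target, slack=0.10):
    """'Exceed' policy: let blk/rev data grow to target * (1 + slack)
    before a prune event (and its chainstate flush) is triggered."""
    return disk_usage >= prune_target * (1 + slack)


def prune_floor(prune_target, slack=0.10):
    """'Remain below' policy: never exceed the target, but prune down to
    target * (1 - slack) so the next prune event is deferred."""
    return prune_target * (1 - slack)
```

Both variants leave roughly `target * slack` of headroom between prune events, which is why the effect on the dbcache is similar; the 'remain below' variant just keeps the documented target as a hard cap on disk usage.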
No strong feelings from me, but when we worked on the pruning implementation our goal was to have the target be something that should be achievable. So if we were to decide that it's worth exceeding it intentionally (e.g. for performance reasons during IBD), we should remember that we need to clearly communicate that to users. But now that we in theory support non-atomic flushes, perhaps we can use that to flush less often during IBD even while we prune.
Indeed, users expect that if they set
See also #12404. |
…d pruning again soon after

ac51a26 During IBD, when doing pruning, prune 10% extra to avoid pruning again soon after (Luke Dashjr)

Pull request description:

  Pruning forces a chainstate flush, which can defeat the dbcache and harm performance significantly.

  Alternative to #11359

Tree-SHA512: 631e4e8f94f5699e98a2eff07204aa2b3b2325b2d92e8236b8c8d6a6730737a346e0ad86024e705f5a665b25e873ab0970ce7396740328a437c060f99e9ba4d9
Needs rebase
This is superseded by #11658 which was just merged. |
Closing for now as per @Sjors
Partial fix for issue #11315.
Every prune event flushes the dbcache to disk.
By default this happens roughly every ~160MiB, so high dbcache values are negated and IBD takes far longer than with pruning disabled.
This change allows a 'high water mark' for pruning such that the actual size of blk/rev on disk can increase a reasonable amount before flushing.
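As a rough sketch of the mechanism (hypothetical names and a toy condition; not the actual C++ patch):

```python
def needs_prune_flush(blk_rev_bytes, prune_target_bytes, hwm_bytes):
    """With a high water mark, a prune event - and the chainstate flush it
    forces - only fires once on-disk blk*/rev* data exceeds the prune
    target plus the HWM leeway, rather than on every small overshoot of
    the target."""
    return blk_rev_bytes > prune_target_bytes + hwm_bytes
```

For example, with prune=550 and ~3GiB of leeway, block data could grow to roughly 550MiB + 3GiB before a flush is forced, instead of flushing each time the target is exceeded by a small amount.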
On a machine with prune=550 and dbcache=3000:
I haven't changed the 'diff' column in the debug log (it could perhaps be hwm - actual rather than target - actual).
Not sure if this could increase disk space requirements in some cases - it may need documentation. With a very high dbcache value, if, say, 10GiB of blocks come in that only produce 2GiB of chainstate, then you'd overshoot quite a bit, I think. It's a tradeoff - more frequent flushing means slower IBD.
Thanks to sipa and gmaxwell for helping out on IRC.