Don't wipe coins cache when full and instead evict LRU clean entries #31102
Conversation
That way it becomes usable in other code like allocators/pool
That way all containers that use the pool have accurate memory tracking. Add test to show memory is accurately tracked, even when nodes can't be allocated by the pool. The more accurate memory estimation shows that our estimation for Windows is off, as the real allocated memory is much higher. This adapts the memusage_test test so it still works with the more correct estimation.
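As a rough sketch of the kind of pool-side accounting these commit messages describe (class and method names here are illustrative, not the PR's actual code):

```cpp
#include <cstddef>

// Illustrative sketch only: a pool that tracks its free bytes so that
// containers backed by it can report how much chunk memory is in use.
class PoolAccountingSketch
{
    std::size_t m_chunk_bytes{0}; // total bytes obtained from the OS in chunks
    std::size_t m_free_bytes{0};  // bytes currently unused inside those chunks

public:
    void OnChunkAllocated(std::size_t bytes) { m_chunk_bytes += bytes; m_free_bytes += bytes; }
    void OnNodeAllocated(std::size_t bytes) { m_free_bytes -= bytes; }
    void OnNodeFreed(std::size_t bytes) { m_free_bytes += bytes; }

    // Memory consumed by live nodes. Nodes that can't be allocated by the
    // pool fall back to the global allocator and are counted by the container.
    std::size_t UsedBytes() const { return m_chunk_bytes - m_free_bytes; }
};
```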
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage: For detailed information about the code coverage, see the test coverage report.

Reviews: See the guideline for information on the review process.

Conflicts: Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Force-pushed from e0446b3 to 0391844
🚧 At least one of the CI tasks failed.

Hints: Try to run the tests locally, according to the documentation. However, a CI failure may still happen due to a number of reasons.

Leave a comment here, if you need help tracking down a confusing failure.
Since this depends on other PRs, would it make sense to draft it until those are merged and CI fixed?
Edit: dependent PR was closed, I'll review it here instead
```diff
@@ -42,6 +42,10 @@ size_t CCoinsViewCache::DynamicMemoryUsage() const {
     return memusage::DynamicUsage(cacheCoins) + cachedCoinsUsage;
 }

+size_t CCoinsViewCache::DynamicMemoryAvailableSpace() const {
```
nit: can we unify the formatting here?

edit: also, std::size_t vs size_t, 64-bit, and signed/unsigned conversions... can we unify these in a separate commit or PR to avoid distractions and compiler failures?
```diff
@@ -484,7 +484,9 @@ class CoinsViews {
 enum class CoinsCacheSizeState
 {
     //! The coins cache is in immediate need of a flush.
-    CRITICAL = 2,
+    CRITICAL = 3,
+    //! The coins cache is at >= 99% capacity.
```
it seemed to me that often we're only checking the memory after we're finished with certain batches.
99% seems too close to 100%, and I'm concerned the second state may be triggered while we're still handling the first one somehow.
Would it be safer to lower this to 95%?
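For reference, the reshuffled enum might look roughly like this (the name of the new >= 99% state is a guess, since the diff above is cut off):

```cpp
// Sketch: CRITICAL moves from 2 to 3 to make room for a new state.
enum class CoinsCacheSizeState {
    OK = 0,
    LARGE = 1,    // exceeds the large-size threshold; clean entries get evicted
    FULL = 2,     // guessed name: the cache is at >= 99% capacity
    CRITICAL = 3, // in immediate need of a flush
};
```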
src/coins.h
Outdated
```cpp
inline void RemoveFromLinkedList() noexcept
{
    if (m_next == nullptr) return;
    m_next->second.m_prev = m_prev;
```
the formatting is really confusing here (the next statement looks like it's part of the condition's body, but there's a semicolon after the early return)
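One way to make the early return unambiguous, as the comment suggests (the remainder of the body is a guess at the usual doubly-linked-list unlink, since the diff is cut off):

```cpp
inline void RemoveFromLinkedList() noexcept
{
    if (m_next == nullptr) {
        return; // not linked into any list, nothing to unlink
    }
    m_next->second.m_prev = m_prev;
    m_prev->second.m_next = m_next; // guessed counterpart, elided in the diff
}
```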
While the build apparently hasn't decided how to handle casts, the benchmarks are looking quite promising already.

Benchmark until 400k:

```bash
hyperfine \
  --runs 1 \
  --export-json /mnt/my_storage/ibd_full-prune.json \
  --parameter-list COMMIT 48cf3da636089873ba7280e0d5b22eb81811d194,0391844fa123a47ddf479e865f9eec63f28a4c6c \
  --prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
  'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=400000 -dbcache=1000 -printtoconsole=0'
```

i.e. until 400k blocks we may be 12% faster. Will run more thorough benches later.

edit: benchmark until 500k:

```bash
hyperfine --runs 1 --export-json /mnt/my_storage/ibd_full-prune.json --parameter-list COMMIT 48cf3da636089873ba7280e0d5b22eb81811d194,590ffa2508460d7dc6c528b69889c4d696d8147d --prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' 'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0'
```
```
Benchmark 1: COMMIT=48cf3da636089873ba7280e0d5b22eb81811d194 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0
  Time (abs ≡): 6948.336 s [User: 6962.943 s, System: 720.546 s]

Benchmark 2: COMMIT=590ffa2508460d7dc6c528b69889c4d696d8147d ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0
  Time (abs ≡): 7706.105 s [User: 7380.817 s, System: 696.713 s]

Summary
  'COMMIT=48cf3da636089873ba7280e0d5b22eb81811d194 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0' ran
    1.11 times faster than 'COMMIT=590ffa2508460d7dc6c528b69889c4d696d8147d ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0'
```

i.e. 11% slower for 500k and 2000 dbcache :/. We will have to find out when it's faster and when it's slower.
Force-pushed from 0391844 to 590ffa2
Not sure this approach is worth it anymore. For a full IBD I got about 2% faster. It seems that keeping non-dirty entries in the cache so they can be read from if spent is not that much of a benefit, since the likelihood of one of those entries being spent before being evicted is not high enough. I think #31132 is a better approach, because cache misses are fetched in parallel much faster, so the misses are not as important. I still think #28939 should be revived though. It is important to also track the memory used outside of the chunks. Right now that memory is not accounted for, and for a large bucket size the memory used is significant.
I see the two changes as orthogonal - the main goal of this change, as far as I understand, is to avoid wiping the cache when it fills up. But as you've suggested, let's start with reviving #28939 - I'll separate it to a new PR.
@andrewtoth, can we save the first few …
Depends on #28939.
This only wipes the coins cache if memory usage is greater than the total allowed cache size. Otherwise, it does a non-wiping sync to disk and keeps all unspent coins in the cache. These are tracked in a linked list of clean entries; when spent, they are removed from the clean linked list and appended to the dirty linked list. This results in the head of the clean linked list containing the oldest clean entry.

When the cache grows above the large-size threshold, clean entries beginning from the head of the linked list are evicted until the cache is below the threshold. This lets the cache maintain a size at or below the large threshold as long as clean entries exist in the cache. When there are no more clean entries, the non-wiping sync removes all spent entries and marks all non-spent entries clean. This reduces the size of the cache and allows the previously dirty unspent entries to be evicted as needed.
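A minimal standalone sketch of that eviction policy (simplified types and names, not the PR's actual data structures):

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

struct Entry {
    std::size_t mem_usage;
    bool dirty; // spent/modified entries must not be evicted before a flush
};

// Evict clean entries starting from the head (oldest) of the clean list
// until usage drops below the large-size threshold or no clean entries remain.
void EvictCleanLRU(std::unordered_map<int, Entry>& cache,
                   std::list<int>& clean_lru, // head = oldest clean entry
                   std::size_t& usage, std::size_t threshold)
{
    while (usage > threshold && !clean_lru.empty()) {
        const int key{clean_lru.front()};
        clean_lru.pop_front();
        const auto it{cache.find(key)};
        if (it == cache.end() || it->second.dirty) continue; // moved to dirty list
        usage -= it->second.mem_usage;
        cache.erase(it); // safe: clean entries are already persisted on disk
    }
}
```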
However, if the in-memory footprint of the cache grows larger than allowed, the cache will still be wiped and reallocated. This will happen if the mempool fills up and the cache was using the unused mempool space. The cache will also be reallocated when resizing caches during assumeutxo syncing.

The pool allocator is modified to add an accounting field that tracks the amount of free bytes.
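A sketch of how such a free-byte field might feed a container's usage estimate (illustrative names, not the PR's code):

```cpp
#include <cstddef>

// Illustrative: with a free-byte counter on the pool, a container's usage
// can be estimated as chunk memory minus the pool's unused free bytes.
struct PoolStats {
    std::size_t chunk_bytes; // total bytes the pool obtained from the OS
    std::size_t free_bytes;  // the new accounting field described above
};

std::size_t EstimateDynamicUsage(const PoolStats& pool, std::size_t fallback_bytes)
{
    // fallback_bytes: nodes too large for the pool, allocated directly
    // from the global allocator and counted separately.
    return pool.chunk_bytes - pool.free_bytes + fallback_bytes;
}
```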