Skip to content

Conversation

andrewtoth
Copy link
Contributor

Depends on #28939.

This only wipes the coins cache if memory usage is greater than total allowed cache size. Instead, it does a non-wiping sync to disk and keeps all unspent coins in the cache. These are tracked in a linked list of clean entries, and when spent are removed from the clean linked list and appended to the dirty linked list. This results in the head of the clean linked list containing the oldest clean entry.

When the cache grows to above the large size threshold, clean entries beginning from the head of the linked list are evicted until the cache is below the threshold. This lets the cache maintain a size at or below the large threshold as long as clean entries exist in the cache. When there are no more clean entries, the non-wiping sync removes all spent entries and cleans all non-spent entries. This reduces the size of the cache and allows the previously dirty unspent entries to be evicted as needed.

The pool allocator is modified to add an accounting field to track the amount of free bytes. However, if the in-memory footprint of the cache grows larger than allowed it will wipe and reallocate the cache. This will happen if the mempool fills up and the cache was using the unused mempool space. It will also be reallocated when resizing caches during assumeutxo syncing.

That way it becomes usable in other code like allocators/pool
That way all containers that use the pool have accurate memory tracking.

Add test to show memory is accurately tracked, even when nodes can't be allocated by the pool.
The more accurate memory estimation interferes shows that our memory estimation for Windows
is off, as the real allocated memory is much higher. This adapts the memusage_test test so it still
works with the more correct estimation.
@DrahtBot
Copy link
Contributor

DrahtBot commented Oct 16, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30906 (refactor: prohibit direct flags access in CCoinsCacheEntry and remove invalid tests by l0rinc)
  • #30849 (refactor: migrate bool GetCoin to return optional<Coin> by l0rinc)
  • #30673 (coins: remove logic for spent-and-FRESH cache entries and writing non-DIRTY entries by andrewtoth)
  • #30611 (validation: write chainstate to disk every hour by andrewtoth)
  • #30610 (validation: do not wipe utxo cache for stats/scans/snapshots by sipa)
  • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
  • #28531 (improve MallocUsage() accuracy by LarryRuane)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot DrahtBot changed the title Don't wipe coins cache when full and instead evict LRU clean entries Don't wipe coins cache when full and instead evict LRU clean entries Oct 16, 2024
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/31623076099

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Copy link
Contributor

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this depends on other PRs, would it make sense to draft it until those are merged and CI fixed?
Edit: dependent PR was closed, I'll review it here instead

@@ -42,6 +42,10 @@ size_t CCoinsViewCache::DynamicMemoryUsage() const {
return memusage::DynamicUsage(cacheCoins) + cachedCoinsUsage;
}

size_t CCoinsViewCache::DynamicMemoryAvailableSpace() const {
Copy link
Contributor

@l0rinc l0rinc Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we unify the formatting here?
edit: also, std::size_t vs size_t vs 64 bit and signed and unsigned conversions ... can we unify these in a separate commit or pr to avoid distractioins and compiler failures

@@ -484,7 +484,9 @@ class CoinsViews {
enum class CoinsCacheSizeState
{
//! The coins cache is in immediate need of a flush.
CRITICAL = 2,
CRITICAL = 3,
//! The coins cache is at >= 99% capacity.
Copy link
Contributor

@l0rinc l0rinc Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seemed to me that often we're only only checking the memory after we're finished with certain batches.
99% seems to close to 100% and I'm concerned the second one may be triggered while we're handling the first one somehow.
Would it be safer to lower this to 95%?

src/coins.h Outdated
inline void RemoveFromLinkedList() noexcept
{
if (m_next == nullptr) return;
m_next->second.m_prev = m_prev;
Copy link
Contributor

@l0rinc l0rinc Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting is really confusing here (looks like it's part of the condition, but there's a semicolon)

@l0rinc
Copy link
Contributor

l0rinc commented Oct 16, 2024

While the build apparently hasn't decided how to handle casts, the benchmarks are looking quite promising already:

benchmark until 400k
hyperfine \
--runs 1 \
--export-json /mnt/my_storage/ibd_full-prune.json \
--parameter-list COMMIT 48cf3da636089873ba7280e0d5b22eb81811d194,0391844fa123a47ddf479e865f9eec63f28a4c6c \
--prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' \
'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=400000 -dbcache=1000 -printtoconsole=0'
Benchmark 1: COMMIT=48cf3da636089873ba7280e0d5b22eb81811d194 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=400000 -dbcache=1000 -printtoconsole=0
  Time (abs ≡):        3234.108 s               [User: 2907.369 s, System: 286.357 s]

Benchmark 2: COMMIT=0391844fa123a47ddf479e865f9eec63f28a4c6c ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=400000 -dbcache=1000 -printtoconsole=0
  Time (abs ≡):        2877.113 s               [User: 2945.860 s, System: 247.616 s]

Summary
  'COMMIT=0391844fa123a47ddf479e865f9eec63f28a4c6c ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=400000 -dbcache=1000 -printtoconsole=0' ran
    1.12 times faster than 'COMMIT=48cf3da636089873ba7280e0d5b22eb81811d194 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=400000 -dbcache=1000 -printtoconsole=0'

i.e. until 400k blocks we may be 12% faster. Will run more thorough benches later.


edit:

benchmark until 500k
hyperfine --runs 1 --export-json /mnt/my_storage/ibd_full-prune.json --parameter-list COMMIT 48cf3da636089873ba7280e0d5b22eb81811d194,590ffa2508460d7dc6c528b69889c4d696d8147d  --prepare 'rm -rf /mnt/my_storage/BitcoinData/* && git checkout {COMMIT} && git clean -fxd && git reset --hard && cmake -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_UTIL=OFF -DBUILD_TX=OFF -DBUILD_TESTS=OFF -DENABLE_WALLET=OFF -DINSTALL_MAN=OFF && cmake --build build -j$(nproc)' 'COMMIT={COMMIT} ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0'
Benchmark 1: COMMIT=48cf3da636089873ba7280e0d5b22eb81811d194 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0
  Time (abs ≡):        6948.336 s               [User: 6962.943 s, System: 720.546 s]

Benchmark 2: COMMIT=590ffa2508460d7dc6c528b69889c4d696d8147d ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0
  Time (abs ≡):        7706.105 s               [User: 7380.817 s, System: 696.713 s]

Summary
  'COMMIT=48cf3da636089873ba7280e0d5b22eb81811d194 ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0' ran
    1.11 times faster than 'COMMIT=590ffa2508460d7dc6c528b69889c4d696d8147d ./build/src/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=500000 -dbcache=2000 -printtoconsole=0'

i.e. 11% slower for 500k and 2000 dbcache :/. We will have to find out when it's faster and when it's slower.

@andrewtoth
Copy link
Contributor Author

Not sure this approach is worth it anymore. For a full IBD I got about 2% faster. It seems that keeping non-dirty entries in the cache to be read from if spent is not that much benefit, since the likelihood of one of those entries being spent before evicted is not high enough.

I think #31132 is a better approach, because cache misses are fetched in parallel much faster so the misses are not as important.

I still think #28939 should be revived though. It is important to also track the memory used outside of the chunks. Right now that memory is not accounted for and for a large bucket size the memory used is significant.

@andrewtoth andrewtoth closed this Oct 22, 2024
@l0rinc
Copy link
Contributor

l0rinc commented Jan 2, 2025

I think #31132 is a better approach, because cache misses are fetched in parallel much faster so the misses are not as important.

I see the two changes as orthogonal - the main goal of this change as far as I understand is to avoid ReallocateCache calls (deleting unlikely entries to avoid filling the cache), while the #31132 is mostly about cache warming (adding likely ones to the cache in parallel).

But as you've suggested, let's start with reviving #28939 - I'll separate it to a new PR.

@l0rinc
Copy link
Contributor

l0rinc commented Feb 25, 2025

@andrewtoth, can we save the first few memusage: related commits from here at least?
Do we need any change to them or can we just extract to a new PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants