-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Use number of dirty cache entries in flush warnings/logs #31703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31703. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Nice, Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
19f1f69
to
4d7c848
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
4d7c848
to
d99b0ec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
I have tried reproducing it via AssumeUTXO but the warning is never triggered.
That's my main concern (it's probably my mistake in the related PR), other than not applying it in other cases where the sizes are displayed.
src/validation.cpp
Outdated
// Typical Coin structures on disk are around 48 bytes in size. | ||
size_t approx_disk_usage = 48 * coins_dirty_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that the hard-coded value was here before, and was wondering if we could explain how we got to this magic number (and not mention the value in comments at all since code can express that better), maybe something like:
static constexpr size_t COIN_DISK_USAGE_BYTES{sizeof(Coin)}; // Approximate size on disk
Loading the UTXO set to memory:
2025-01-22T11:56:22Z [snapshot] loaded 176948713 (27440 MB) coins from snapshot 0000000000000000000320283a032748cef8227873ff4872689bf23f1cda83a5
and checking the resulting disk size:
du -sk /Users/lorinc/IdeaProjects/bitcoin/demo/chainstate_snapshot | cut -f1
11524472
Suggests we may want to add a ~40% overhead here:
((11524472*1024)/176948713)/48
i.e.
static constexpr size_t COIN_DISK_USAGE_BYTES{static_cast<size_t>(sizeof(Coin) * 1.4)}; // Approximate size on disk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Estimating sizes of the UTXO set is hard, because these are all very distinct numbers:
- The size on disk in a snapshot of a UTXO set
- The size on disk in the chainstate LevelDB database (the database has different per-entry overhead, prefix compression, bloom filters, ...).
- The amount of data that will be written to disk to flush a number of dirty entries (because e.g. writing a removal/spending of a UTXO temporarily takes up space in the log before it's compacted into the tables)
- The amount of memory a UTXO entry takes up in RAM (as approximated by
DynamicMemoryUsage()
).
If we actually care about disk sizes, we should actually try to estimate the thing we care about, and not use snapshot size as a proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that it's complex, but we seem to be off here by 40% - I don't mind if we tackle that in a separate PR/issue and would appreciate your opinion on how to handle this better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you're looking at snapshot size at all, that's not a good measure of data written. We may be off by a lot more, or a lot less. If we want to do better, we should look at how much is actually written, and then add to CCoinsViewCache a way to predict that number. None of that seems a good fit for this PR, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you're looking at snapshot size
not a good measure of data written
and not use snapshot size as a proxy
I understand that the dirty entries in the snapshot are not a perfect representation (except for the AssumeUTXO tests I was doing, which likely skewed my recommendations), but I though it's what we're already doing here when we're calculating "approx_disk_usage" from "coins_dirty_count".
None of that seems a good fit for this PR, though
I agree, if we want a better representation. But replacing the magic 48
with sizeof(Coin)
could be. But I'm also fine if not, it's just an alternative that I found.
src/coins.cpp
Outdated
@@ -235,6 +244,7 @@ bool CCoinsViewCache::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &ha | |||
itUs->second.coin = it->second.coin; | |||
} | |||
cachedCoinsUsage += itUs->second.coin.DynamicMemoryUsage(); | |||
m_num_dirty += !itUs->second.IsDirty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to understand this one - still not sure I follow completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworked it a bit. LMK if it's clearer now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you.
The only remaining nit is that in other cases we set it dirty first, before incrementing (edit: to obviate that the call doesn't rely on the value of m_dirty_count
).
src/coins.cpp
Outdated
@@ -78,6 +78,7 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi | |||
bool fresh = false; | |||
if (!inserted) { | |||
cachedCoinsUsage -= it->second.coin.DynamicMemoryUsage(); | |||
m_num_dirty -= it->second.IsDirty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're modifying m_num_dirty
multiple times in the AddCoin
method, possibly even cancelling out previous modifications. If it's not too involved, could we calculate the final diff and only update it a single time in this method for clarity? Same applies to SpendCoin
.
src/coins.cpp
Outdated
@@ -343,6 +359,7 @@ void CCoinsViewCache::SanityCheck() const | |||
} | |||
assert(count_linked == count_flagged); | |||
assert(recomputed_usage == cachedCoinsUsage); | |||
assert(num_dirty == m_num_dirty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we merge I'd like to run a full IBD with SanityCheck
enabled throughout the code - can you recommend other settings that I should enable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything that involves a reorg would perhaps be interesting to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran an IBD until 800k with -DCMAKE_BUILD_TYPE=Debug
for the current PR (it's really slow this way) - no issues found.
a reorg would perhaps be interesting to test
Do we have a way of triggering random reorgs during IBD (at least undo/redo style)?
If not, would it be helpful to create a PR that adds a hidden -simulatereorgonibd
argument (similarly to -dbcrashratio
) that randomly triggers small, temporary reorgs during IBD on mainnet
? Specifically, at a random height between every ~[10,000, 100,000]
blocks, we disconnect and reconnect ~[1, 100]
blocks (favoring smaller counts), performing a CoinsTip#SanityCheck()
before and after (and maybe validating that we have the UTXOs in cache that will be removed, asserting after undo that they're missing, and revalidating after reapplication that they're back in cache). This would test reorg logic in a semi-realistic scenario - though it wouldn't stress diverging chains, as it reconnects the same blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we could collect a few hundred stale blocks (found 219 raw blocks so far via https://github.com/bitcoin-data/stale-blocks and https://learnmeabitcoin.com) and create a stale-block-proxy node for end-to-end regression testing of reorgs with historical data:
- A freshly built test node would connect only to this proxy, which would simulate the original chain forks.
- It serves headers & blocks exactly as they were originally seen (without storing them, requesting non-stale ones from the network).
- The proxy pretends a stale block is the tip, letting the test node sync up to it.
- Once the test node reaches that stale block, the proxy sets the next stale block as the new tip, triggering a natural reorg.
Would that work and be a useful way to test reorg behavior during IBD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do this later to test reorg behavior in a functional test, you can resolve this comment.
src/validation.cpp
Outdated
// Typical Coin structures on disk are around 48 bytes in size. | ||
size_t approx_disk_usage = 48 * coins_dirty_count; | ||
if (approx_disk_usage >= WARN_FLUSH_COINS_SIZE) LogWarning("Flushing large (%.1f GiB) UTXO set to disk, it may take several minutes", double(approx_disk_usage) / (1 << 30)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely my mistake in #31534, but loading the UTXO set (as described in #31645) will call Flush
via FlushSnapshotToDisk
at the end (taking a really long time but never warning) https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L5878, reset m_num_dirty = 0
so when we get to FlushStateToDisk
(through ActivateSnapshot
) we're never warning since coins_dirty_count
is always 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the snapshot code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted the steps to reproduce it at #31645 - but we can move this warning to Flush
in a different PR to make sure it works for AssumeUTXO
as well.
Checked that the current PR displays the warning correctly.
2025-01-22T19:20:54Z UpdateTip: new best=0000000000000000000162ce2492545600d6e6780ee22bf464166a4faa28ba28 height=824665 version=0x32000000 log2_work=94.650129 tx=948674213 date='2024-01-06T21:32:54Z' progress=0.822859 cache=9687.2MiB(67719102txo)
...
2025-01-22T19:20:55Z [warning] Flushing large (67721262 entries) UTXO set to disk, it may take several minutes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar with the snapshot code.
It seems to me that it may make more sense to move the warning to CCoinsViewDB::BatchWrite
instead, which could already have m_dirty
available through the CoinsViewCacheCursor
, and since it's called from Flush
and Sync
, an AssumeUTXO load-and-dump would also trigger the warning (not just an IBD).
I also don't mind doing it myself in a follow-up, if you think it's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, currently AssumeUTXO doesn't display the warning:
2025-03-18T10:08:47Z [snapshot] 184000000 coins loaded (99.56%, 29332 MB)
2025-03-18T10:08:48Z [snapshot] loaded 184821030 (29456 MB) coins from snapshot 000000000000000000010b17283c3c400507969a9c2afd1dcf2082ec5cca2880
2025-03-18T10:08:48Z FlushSnapshotToDisk: saving snapshot chainstate (29456 MB) started
2025-03-18T10:09:57Z FlushSnapshotToDisk: completed (69461.66ms)
But if we moved the warning introduced in #31534 into to the method that does the actual long operation it would show the warning for AssumeUTXO case as well (i.e. both for Flush and Sync):
diff --git a/src/coins.h b/src/coins.h
index f897bce749..a23b663a1a 100644
--- a/src/coins.h
+++ b/src/coins.h
@@ -300,6 +300,7 @@ struct CoinsViewCacheCursor
}
inline bool WillErase(CoinsCachePair& current) const noexcept { return m_will_erase || current.second.coin.IsSpent(); }
+ size_t GetDirtyCount() const noexcept { return m_dirty; }
private:
size_t& m_usage;
size_t& m_dirty;
diff --git a/src/txdb.cpp b/src/txdb.cpp
index 1622039d63..083cab3279 100644
--- a/src/txdb.cpp
+++ b/src/txdb.cpp
@@ -25,6 +25,9 @@ static constexpr uint8_t DB_HEAD_BLOCKS{'H'};
// Keys used in previous version that might still be found in the DB:
static constexpr uint8_t DB_COINS{'c'};
+// Threshold for warning when writing this many dirty cache entries to disk.
+static constexpr size_t WARN_FLUSH_COINS_COUNT{10'000'000};
+
bool CCoinsViewDB::NeedsUpgrade()
{
std::unique_ptr<CDBIterator> cursor{m_db->NewIterator()};
@@ -109,6 +112,8 @@ bool CCoinsViewDB::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &hashB
}
}
+ if (cursor.GetDirtyCount() > WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", cursor.GetDirtyCount());
+
// In the first batch, mark the database as being in the middle of a
// transition from old_tip to hashBlock.
// A vector is used for future extensibility, as we may want to support
diff --git a/src/validation.cpp b/src/validation.cpp
index 1f4f2e1ae8..3323a037dc 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -88,8 +88,6 @@ using node::CBlockIndexHeightOnlyComparator;
using node::CBlockIndexWorkComparator;
using node::SnapshotMetadata;
-/** Threshold for warning when writing this many dirty cache entries to disk. */
-static constexpr size_t WARN_FLUSH_COINS_COUNT = 10'000'000;
/** Time to wait between writing blocks/block index to disk. */
static constexpr std::chrono::hours DATABASE_WRITE_INTERVAL{1};
/** Time to wait between flushing chainstate to disk. */
@@ -2932,7 +2930,6 @@ bool Chainstate::FlushStateToDisk(
}
// Flush best chain related state. This can only be done if the blocks / block index write was also done.
if (fDoFullFlush && !CoinsTip().GetBestBlock().IsNull()) {
- if (coins_dirty_count >= WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", coins_dirty_count);
LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d out of %d cached coins)",
coins_dirty_count, coins_count), BCLog::BENCH);
Reproducer:
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)
mkdir -p demo && rm -rfd demo/chainstate demo/chainstate_snapshot demo/debug.log
build/bin/bitcoind -datadir=demo -stopatheight=1
build/bin/bitcoind -datadir=demo -daemon -blocksonly=1 -connect=0 -dbcache=30000
build/bin/bitcoin-cli -datadir=demo -rpcclienttimeout=0 loadtxoutset ~/utxo-880000.dat
build/bin/bitcoin-cli -datadir=demo stop
Result:
2025-03-18T09:59:40Z [snapshot] 184000000 coins loaded (99.56%, 29332 MB)
2025-03-18T09:59:41Z [snapshot] loaded 184821030 (29456 MB) coins from snapshot 000000000000000000010b17283c3c400507969a9c2afd1dcf2082ec5cca2880
2025-03-18T09:59:41Z FlushSnapshotToDisk: saving snapshot chainstate (29456 MB) started
*2025-03-18T09:59:41Z [warning] Flushing large (184821030 entries) UTXO set to disk, it may take several minutes*
2025-03-18T10:00:50Z FlushSnapshotToDisk: completed (68776.27ms)
Would it be simpler if I pushed this change in a separate PR instead?
@@ -2828,7 +2828,8 @@ bool Chainstate::FlushStateToDisk( | |||
bool full_flush_completed = false; | |||
|
|||
const size_t coins_count = CoinsTip().GetCacheSize(); | |||
const size_t coins_mem_usage = CoinsTip().DynamicMemoryUsage(); | |||
[[maybe_unused]] const size_t coins_mem_usage = CoinsTip().DynamicMemoryUsage(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we can't just inline this, but maybe we can guard it with #ifdef ENABLE_TRACING
alternatively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think [[maybe_unused]]
is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, though this won't warn us if it actually becomes unused.
56fef1e
to
1c89256
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
1c89256
to
15619a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK for now.
Makes sense to ensure warnings use a more accurate data point (m_dirty_count
). Helps prevent a "chicken little" effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reviewed it in multiple steps, I have a few remaining questions and concerns, let me know if I can help in any other way.
@@ -100,6 +100,7 @@ class CCoinsViewCacheTest : public CCoinsViewCache | |||
CCoinsMap& map() const { return cacheCoins; } | |||
CoinsCachePair& sentinel() const { return m_sentinel; } | |||
size_t& usage() const { return cachedCoinsUsage; } | |||
size_t& GetDirtyCount() const { return m_dirty_count; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now hides GetDirtyCount
in CCoinsViewCache
with a size_t
return value (changed to size_t&
here), used later as cache.usage() +=
and cache.GetDirtyCount() +=
.
To avoid the confusing override and the method return value assignment, please consider making them explicit setters/adders:
void AddUsage(size_t usage) const { cachedCoinsUsage += usage; }
void AddDirtyCount(size_t count) const { m_dirty_count += count; }
cache.AddUsage(InsertCoinsMapEntry(cache.map(), cache.sentinel(), *cache_coin));
cache.AddDirtyCount(cache_coin->IsDirty());
/** Size threshold for warning about slow UTXO set flush to disk. */ | ||
static constexpr size_t WARN_FLUSH_COINS_SIZE = 1 << 30; // 1 GiB | ||
/** Threshold for warning when writing this many dirty cache entries to disk. */ | ||
static constexpr size_t WARN_FLUSH_COINS_COUNT = 10'000'000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 10m seems like a good approximation for the previous value:
2025-01-31T09:45:42Z UpdateTip: new best=0000000000000000cc22cc938f92ecc3a3844f23775c4ad4e66404563c8594c8 height=287000 version=0x00000002 log2_work=76.752105 tx=33340892 date='2014-02-2
1T05:18:42Z' progress=0.028794 cache=1245.8MiB(10004443txo)
@@ -113,7 +115,10 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi | |||
void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change doesn't seem to be covered by coins_tests
at all.
Can we extend SimulationTest
to call stack.back()->EmplaceCoinInternalDANGER
instead of AddCoin
randomly?
Maybe something like:
if (COutPoint op(txid, 0); !stack.back()->map().contains(op) && !(coin.IsSpent() || (m_rng.rand32() & 1))) {
stack.back()->EmplaceCoinInternalDANGER(std::move(op), std::move(newcoin));
} else {
stack.back()->AddCoin(std::move(op), std::move(newcoin), /*possible_overwrite=*/(!coin.IsSpent() || (m_rng.rand32() & 1)));
}
@@ -113,7 +115,10 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi | |||
void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) { | |||
cachedCoinsUsage += coin.DynamicMemoryUsage(); | |||
auto [it, inserted] = cacheCoins.try_emplace(std::move(outpoint), std::move(coin)); | |||
if (inserted) CCoinsCacheEntry::SetDirty(*it, m_sentinel); | |||
if (inserted) { | |||
CCoinsCacheEntry::SetDirty(*it, m_sentinel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation of EmplaceCoinInternalDANGER
claims that:
Emplace a coin into cacheCoins without performing any checks, marking the emplaced coin as dirty.
How come we're updating cachedCoinsUsage
unconditionally in this method - but m_dirty_count
only when inserted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for validating it, I only noticed it now, when trying to test EmplaceCoinInternalDANGER
in SimulationTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following @andrewtoth's solution we can do:
void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) {
const auto mem_usage{coin.DynamicMemoryUsage()};
auto [it, inserted] = cacheCoins.try_emplace(std::move(outpoint), std::move(coin));
if (inserted) {
cachedCoinsUsage += mem_usage;
CCoinsCacheEntry::SetDirty(*it, m_sentinel);
++m_dirty_count;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After our last discussion (and with the new AssumeUTXO included) I've rebased and re-reviewed and added examples to my suggestions to simplify the review.
My remaining concerns are that AssumeUTXO doesn't warn in the same way as other dumps do, a few leftover refactoring misses, redundant calculation now that we have a dirty count, missing code coverage for modified code, a leftover memory accounting bug in affected area, and a few code nits to make the change less surprising.
Suggestions
diff --git a/src/coins.cpp b/src/coins.cpp
index 30d82e1d08..b126622490 100644
--- a/src/coins.cpp
+++ b/src/coins.cpp
@@ -113,10 +113,11 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi
}
void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) {
- cachedCoinsUsage += coin.DynamicMemoryUsage();
+ const auto mem_usage{coin.DynamicMemoryUsage()};
auto [it, inserted] = cacheCoins.try_emplace(std::move(outpoint), std::move(coin));
if (inserted) {
CCoinsCacheEntry::SetDirty(*it, m_sentinel);
+ cachedCoinsUsage += mem_usage;
++m_dirty_count;
}
}
@@ -245,8 +246,8 @@ bool CCoinsViewCache::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &ha
}
cachedCoinsUsage += itUs->second.coin.DynamicMemoryUsage();
if (!itUs->second.IsDirty()) {
- ++m_dirty_count;
CCoinsCacheEntry::SetDirty(*itUs, m_sentinel);
+ ++m_dirty_count;
}
// NOTE: It isn't safe to mark the coin as FRESH in the parent
// cache. If it already existed and was spent in the parent
@@ -342,7 +343,7 @@ void CCoinsViewCache::SanityCheck() const
// Recompute cachedCoinsUsage.
recomputed_usage += entry.coin.DynamicMemoryUsage();
- // Recompute m_num_dirty;
+ // Recompute m_dirty_count.
dirty_count += entry.IsDirty();
// Count the number of entries we expect in the linked list.
diff --git a/src/coins.h b/src/coins.h
index f897bce749..474cf77fdb 100644
--- a/src/coins.h
+++ b/src/coins.h
@@ -272,11 +272,11 @@ struct CoinsViewCacheCursor
//! Calling CCoinsMap::clear() afterwards is faster because a CoinsCachePair cannot be coerced back into a
//! CCoinsMap::iterator to be erased, and must therefore be looked up again by key in the CCoinsMap before being erased.
CoinsViewCacheCursor(size_t& usage LIFETIMEBOUND,
- size_t& dirty LIFETIMEBOUND,
- CoinsCachePair& sentinel LIFETIMEBOUND,
- CCoinsMap& map LIFETIMEBOUND,
- bool will_erase) noexcept
- : m_usage(usage), m_dirty(dirty), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {}
+ size_t& dirty_count LIFETIMEBOUND,
+ CoinsCachePair& sentinel LIFETIMEBOUND,
+ CCoinsMap& map LIFETIMEBOUND,
+ bool will_erase) noexcept
+ : m_usage(usage), m_dirty_count(dirty_count), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {}
inline CoinsCachePair* Begin() const noexcept { return m_sentinel.second.Next(); }
inline CoinsCachePair* End() const noexcept { return &m_sentinel; }
@@ -285,7 +285,7 @@ struct CoinsViewCacheCursor
inline CoinsCachePair* NextAndMaybeErase(CoinsCachePair& current) noexcept
{
const auto next_entry{current.second.Next()};
- m_dirty -= current.second.IsDirty();
+ m_dirty_count -= current.second.IsDirty();
// If we are not going to erase the cache, we must still erase spent entries.
// Otherwise, clear the state of the entry.
if (!m_will_erase) {
@@ -300,9 +300,11 @@ struct CoinsViewCacheCursor
}
inline bool WillErase(CoinsCachePair& current) const noexcept { return m_will_erase || current.second.coin.IsSpent(); }
+ size_t GetDirtyCount() const noexcept { return m_dirty_count; }
+
private:
size_t& m_usage;
- size_t& m_dirty;
+ size_t& m_dirty_count;
CoinsCachePair& m_sentinel;
CCoinsMap& m_map;
bool m_will_erase;
diff --git a/src/test/coins_tests.cpp b/src/test/coins_tests.cpp
index 261859ff9a..0cef222024 100644
--- a/src/test/coins_tests.cpp
+++ b/src/test/coins_tests.cpp
@@ -99,8 +99,8 @@ public:
CCoinsMap& map() const { return cacheCoins; }
CoinsCachePair& sentinel() const { return m_sentinel; }
- size_t& usage() const { return cachedCoinsUsage; }
- size_t& GetDirtyCount() const { return m_dirty_count; }
+ void AddUsage(size_t usage) const { cachedCoinsUsage += usage; }
+ void AddDirtyCount(size_t count) const { m_dirty_count += count; }
};
} // namespace
@@ -197,8 +197,11 @@ void SimulationTest(CCoinsView* base, bool fake_best_block)
(coin.IsSpent() ? added_an_entry : updated_an_entry) = true;
coin = newcoin;
}
- bool is_overwrite = !coin.IsSpent() || m_rng.rand32() & 1;
- stack.back()->AddCoin(COutPoint(txid, 0), std::move(newcoin), is_overwrite);
+ if (COutPoint op(txid, 0); !stack.back()->map().contains(op) && !(coin.IsSpent() || (m_rng.rand32() & 1))) {
+ stack.back()->EmplaceCoinInternalDANGER(std::move(op), std::move(newcoin));
+ } else {
+ stack.back()->AddCoin(std::move(op), std::move(newcoin), /*possible_overwrite=*/(!coin.IsSpent() || (m_rng.rand32() & 1)));
+ }
} else {
// Spend the coin.
removed_an_entry = true;
@@ -653,8 +656,8 @@ static void WriteCoinsViewEntry(CCoinsView& view, const MaybeCoin& cache_coin)
CCoinsMapMemoryResource resource;
CCoinsMap map{0, CCoinsMap::hasher{}, CCoinsMap::key_equal{}, &resource};
auto usage{cache_coin ? InsertCoinsMapEntry(map, sentinel, *cache_coin) : 0};
- size_t dirty = cache_coin ? cache_coin->IsDirty() : 0;
- auto cursor{CoinsViewCacheCursor(usage, dirty, sentinel, map, /*will_erase=*/true)};
+ size_t dirty_count = cache_coin ? cache_coin->IsDirty() : 0;
+ auto cursor{CoinsViewCacheCursor(usage, dirty_count, sentinel, map, /*will_erase=*/true)};
BOOST_CHECK(view.BatchWrite(cursor, {}));
}
@@ -666,8 +669,8 @@ public:
auto base_cache_coin{base_value == ABSENT ? MISSING : CoinEntry{base_value, CoinEntry::State::DIRTY}};
WriteCoinsViewEntry(base, base_cache_coin);
if (cache_coin) {
- cache.usage() += InsertCoinsMapEntry(cache.map(), cache.sentinel(), *cache_coin);
- cache.GetDirtyCount() += cache_coin->IsDirty();
+ cache.AddUsage(InsertCoinsMapEntry(cache.map(), cache.sentinel(), *cache_coin));
+ cache.AddDirtyCount(cache_coin->IsDirty());
}
}
diff --git a/src/test/fuzz/coins_view.cpp b/src/test/fuzz/coins_view.cpp
index a9b91b9efe..64cec03fd5 100644
--- a/src/test/fuzz/coins_view.cpp
+++ b/src/test/fuzz/coins_view.cpp
@@ -123,7 +123,7 @@ FUZZ_TARGET(coins_view, .init = initialize_coins_view)
CoinsCachePair sentinel{};
sentinel.second.SelfRef(sentinel);
size_t usage{0};
- size_t num_dirty{0};
+ size_t dirty_count{0};
CCoinsMapMemoryResource resource;
CCoinsMap coins_map{0, SaltedOutpointHasher{/*deterministic=*/true}, CCoinsMap::key_equal{}, &resource};
LIMITED_WHILE(good_data && fuzzed_data_provider.ConsumeBool(), 10'000)
@@ -145,11 +145,11 @@ FUZZ_TARGET(coins_view, .init = initialize_coins_view)
if (dirty) CCoinsCacheEntry::SetDirty(*it, sentinel);
if (fresh) CCoinsCacheEntry::SetFresh(*it, sentinel);
usage += it->second.coin.DynamicMemoryUsage();
- num_dirty += dirty;
+ dirty_count += dirty;
}
bool expected_code_path = false;
try {
- auto cursor{CoinsViewCacheCursor(usage, num_dirty, sentinel, coins_map, /*will_erase=*/true)};
+ auto cursor{CoinsViewCacheCursor(usage, dirty_count, sentinel, coins_map, /*will_erase=*/true)};
coins_view_cache.BatchWrite(cursor, fuzzed_data_provider.ConsumeBool() ? ConsumeUInt256(fuzzed_data_provider) : coins_view_cache.GetBestBlock());
expected_code_path = true;
} catch (const std::logic_error& e) {
diff --git a/src/txdb.cpp b/src/txdb.cpp
index 1622039d63..9587a62db7 100644
--- a/src/txdb.cpp
+++ b/src/txdb.cpp
@@ -25,6 +25,9 @@ static constexpr uint8_t DB_HEAD_BLOCKS{'H'};
// Keys used in previous version that might still be found in the DB:
static constexpr uint8_t DB_COINS{'c'};
+// Threshold for warning when writing this many dirty cache entries to disk.
+static constexpr size_t WARN_FLUSH_COINS_COUNT{10'000'000};
+
bool CCoinsViewDB::NeedsUpgrade()
{
std::unique_ptr<CDBIterator> cursor{m_db->NewIterator()};
@@ -93,6 +96,7 @@ std::vector<uint256> CCoinsViewDB::GetHeadBlocks() const {
bool CCoinsViewDB::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &hashBlock) {
CDBBatch batch(*m_db);
size_t count = 0;
+ size_t dirty_count{cursor.GetDirtyCount()};
size_t changed = 0;
assert(!hashBlock.IsNull());
@@ -109,6 +113,8 @@ bool CCoinsViewDB::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &hashB
}
}
+ if (dirty_count > WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", dirty_count);
+
// In the first batch, mark the database as being in the middle of a
// transition from old_tip to hashBlock.
// A vector is used for future extensibility, as we may want to support
@@ -145,9 +151,10 @@ bool CCoinsViewDB::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &hashB
batch.Erase(DB_HEAD_BLOCKS);
batch.Write(DB_BEST_BLOCK, hashBlock);
- LogDebug(BCLog::COINDB, "Writing final batch of %.2f MiB\n", batch.SizeEstimate() * (1.0 / 1048576.0));
+ LogDebug(BCLog::COINDB, "Writing final batch of %.2f MiB", batch.SizeEstimate() * (1.0 / 1048576.0));
bool ret = m_db->WriteBatch(batch);
- LogDebug(BCLog::COINDB, "Committed %u changed transaction outputs (out of %u) to coin database...\n", (unsigned int)changed, (unsigned int)count);
+ assert(changed == dirty_count); // TODO remove `changed` after all scenarios pass
+ LogDebug(BCLog::COINDB, "Committed %u changed transaction outputs (out of %u) to coin database...", (unsigned int)dirty_count, (unsigned int)count);
return ret;
}
diff --git a/src/validation.cpp b/src/validation.cpp
index 1f4f2e1ae8..3323a037dc 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -88,8 +88,6 @@ using node::CBlockIndexHeightOnlyComparator;
using node::CBlockIndexWorkComparator;
using node::SnapshotMetadata;
-/** Threshold for warning when writing this many dirty cache entries to disk. */
-static constexpr size_t WARN_FLUSH_COINS_COUNT = 10'000'000;
/** Time to wait between writing blocks/block index to disk. */
static constexpr std::chrono::hours DATABASE_WRITE_INTERVAL{1};
/** Time to wait between flushing chainstate to disk. */
@@ -2932,7 +2930,6 @@ bool Chainstate::FlushStateToDisk(
}
// Flush best chain related state. This can only be done if the blocks / block index write was also done.
if (fDoFullFlush && !CoinsTip().GetBestBlock().IsNull()) {
- if (coins_dirty_count >= WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", coins_dirty_count);
LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d out of %d cached coins)",
coins_dirty_count, coins_count), BCLog::BENCH);
@@ -272,10 +272,11 @@ struct CoinsViewCacheCursor | |||
//! Calling CCoinsMap::clear() afterwards is faster because a CoinsCachePair cannot be coerced back into a | |||
//! CCoinsMap::iterator to be erased, and must therefore be looked up again by key in the CCoinsMap before being erased. | |||
CoinsViewCacheCursor(size_t& usage LIFETIMEBOUND, | |||
size_t& dirty LIFETIMEBOUND, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: parameters slightly unaligned with first parameter
CoinsCachePair& sentinel LIFETIMEBOUND, | ||
CCoinsMap& map LIFETIMEBOUND, | ||
bool will_erase) noexcept | ||
: m_usage(usage), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {} | ||
: m_usage(usage), m_dirty(dirty), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need to track CCoinsViewDB::BatchWrite#changed
now that we're tracking dirty count already?
I think we should be able to use the new m_dirty(_count)
there instead, i.e. something like:
size_t dirty_count{cursor.GetDirtyCount()};
size_t changed{0};
...
if (dirty_count > WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", dirty_count);
...
assert(changed == dirty_count); // TODO remove `changed` after all scenarios pass
LogDebug(BCLog::COINDB, "Committed %u changed transaction outputs (out of %u) to coin database...", (unsigned int)dirty_count, (unsigned int)count);
For me locally this passes all tests, quick IBD (ran it with -dbcache=5 -maxmempool=5 -stopatheight=120000
), reindex and AssumeUTXO
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be useful to avoid the needless DB_HEAD_BLOCKS erase and rewrite for empty batches.
@@ -327,6 +342,9 @@ void CCoinsViewCache::SanityCheck() const | |||
// Recompute cachedCoinsUsage. | |||
recomputed_usage += entry.coin.DynamicMemoryUsage(); | |||
|
|||
// Recompute m_num_dirty; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover num_dirty
(& other comments here end with fullstop)
// Recompute m_num_dirty; | |
// Recompute m_dirty_count. |
src/validation.cpp
Outdated
// Typical Coin structures on disk are around 48 bytes in size. | ||
size_t approx_disk_usage = 48 * coins_dirty_count; | ||
if (approx_disk_usage >= WARN_FLUSH_COINS_SIZE) LogWarning("Flushing large (%.1f GiB) UTXO set to disk, it may take several minutes", double(approx_disk_usage) / (1 << 30)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, currently AssumeUTXO doesn't display the warning:
2025-03-18T10:08:47Z [snapshot] 184000000 coins loaded (99.56%, 29332 MB)
2025-03-18T10:08:48Z [snapshot] loaded 184821030 (29456 MB) coins from snapshot 000000000000000000010b17283c3c400507969a9c2afd1dcf2082ec5cca2880
2025-03-18T10:08:48Z FlushSnapshotToDisk: saving snapshot chainstate (29456 MB) started
2025-03-18T10:09:57Z FlushSnapshotToDisk: completed (69461.66ms)
But if we moved the warning introduced in #31534 into to the method that does the actual long operation it would show the warning for AssumeUTXO case as well (i.e. both for Flush and Sync):
diff --git a/src/coins.h b/src/coins.h
index f897bce749..a23b663a1a 100644
--- a/src/coins.h
+++ b/src/coins.h
@@ -300,6 +300,7 @@ struct CoinsViewCacheCursor
}
inline bool WillErase(CoinsCachePair& current) const noexcept { return m_will_erase || current.second.coin.IsSpent(); }
+ size_t GetDirtyCount() const noexcept { return m_dirty; }
private:
size_t& m_usage;
size_t& m_dirty;
diff --git a/src/txdb.cpp b/src/txdb.cpp
index 1622039d63..083cab3279 100644
--- a/src/txdb.cpp
+++ b/src/txdb.cpp
@@ -25,6 +25,9 @@ static constexpr uint8_t DB_HEAD_BLOCKS{'H'};
// Keys used in previous version that might still be found in the DB:
static constexpr uint8_t DB_COINS{'c'};
+// Threshold for warning when writing this many dirty cache entries to disk.
+static constexpr size_t WARN_FLUSH_COINS_COUNT{10'000'000};
+
bool CCoinsViewDB::NeedsUpgrade()
{
std::unique_ptr<CDBIterator> cursor{m_db->NewIterator()};
@@ -109,6 +112,8 @@ bool CCoinsViewDB::BatchWrite(CoinsViewCacheCursor& cursor, const uint256 &hashB
}
}
+ if (cursor.GetDirtyCount() > WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", cursor.GetDirtyCount());
+
// In the first batch, mark the database as being in the middle of a
// transition from old_tip to hashBlock.
// A vector is used for future extensibility, as we may want to support
diff --git a/src/validation.cpp b/src/validation.cpp
index 1f4f2e1ae8..3323a037dc 100644
--- a/src/validation.cpp
+++ b/src/validation.cpp
@@ -88,8 +88,6 @@ using node::CBlockIndexHeightOnlyComparator;
using node::CBlockIndexWorkComparator;
using node::SnapshotMetadata;
-/** Threshold for warning when writing this many dirty cache entries to disk. */
-static constexpr size_t WARN_FLUSH_COINS_COUNT = 10'000'000;
/** Time to wait between writing blocks/block index to disk. */
static constexpr std::chrono::hours DATABASE_WRITE_INTERVAL{1};
/** Time to wait between flushing chainstate to disk. */
@@ -2932,7 +2930,6 @@ bool Chainstate::FlushStateToDisk(
}
// Flush best chain related state. This can only be done if the blocks / block index write was also done.
if (fDoFullFlush && !CoinsTip().GetBestBlock().IsNull()) {
- if (coins_dirty_count >= WARN_FLUSH_COINS_COUNT) LogWarning("Flushing large (%d entries) UTXO set to disk, it may take several minutes", coins_dirty_count);
LOG_TIME_MILLIS_WITH_CATEGORY(strprintf("write coins cache to disk (%d out of %d cached coins)",
coins_dirty_count, coins_count), BCLog::BENCH);
Reproducer:
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)
mkdir -p demo && rm -rfd demo/chainstate demo/chainstate_snapshot demo/debug.log
build/bin/bitcoind -datadir=demo -stopatheight=1
build/bin/bitcoind -datadir=demo -daemon -blocksonly=1 -connect=0 -dbcache=30000
build/bin/bitcoin-cli -datadir=demo -rpcclienttimeout=0 loadtxoutset ~/utxo-880000.dat
build/bin/bitcoin-cli -datadir=demo stop
Result:
2025-03-18T09:59:40Z [snapshot] 184000000 coins loaded (99.56%, 29332 MB)
2025-03-18T09:59:41Z [snapshot] loaded 184821030 (29456 MB) coins from snapshot 000000000000000000010b17283c3c400507969a9c2afd1dcf2082ec5cca2880
2025-03-18T09:59:41Z FlushSnapshotToDisk: saving snapshot chainstate (29456 MB) started
*2025-03-18T09:59:41Z [warning] Flushing large (184821030 entries) UTXO set to disk, it may take several minutes*
2025-03-18T10:00:50Z FlushSnapshotToDisk: completed (68776.27ms)
Would it be simpler if I pushed this change in a separate PR instead?
@@ -113,7 +115,10 @@ void CCoinsViewCache::AddCoin(const COutPoint &outpoint, Coin&& coin, bool possi | |||
void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) { | |||
cachedCoinsUsage += coin.DynamicMemoryUsage(); | |||
auto [it, inserted] = cacheCoins.try_emplace(std::move(outpoint), std::move(coin)); | |||
if (inserted) CCoinsCacheEntry::SetDirty(*it, m_sentinel); | |||
if (inserted) { | |||
CCoinsCacheEntry::SetDirty(*it, m_sentinel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following @andrewtoth's solution we can do:
void CCoinsViewCache::EmplaceCoinInternalDANGER(COutPoint&& outpoint, Coin&& coin) {
const auto mem_usage{coin.DynamicMemoryUsage()};
auto [it, inserted] = cacheCoins.try_emplace(std::move(outpoint), std::move(coin));
if (inserted) {
cachedCoinsUsage += mem_usage;
CCoinsCacheEntry::SetDirty(*it, m_sentinel);
++m_dirty_count;
}
}
src/coins.cpp
Outdated
@@ -343,6 +359,7 @@ void CCoinsViewCache::SanityCheck() const | |||
} | |||
assert(count_linked == count_flagged); | |||
assert(recomputed_usage == cachedCoinsUsage); | |||
assert(num_dirty == m_num_dirty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do this later to test reorg behavior in a functional test, you can resolve this comment.
@@ -123,6 +123,7 @@ FUZZ_TARGET(coins_view, .init = initialize_coins_view) | |||
CoinsCachePair sentinel{}; | |||
sentinel.second.SelfRef(sentinel); | |||
size_t usage{0}; | |||
size_t num_dirty{0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover num_dirty
CoinsCachePair& sentinel LIFETIMEBOUND, | ||
CCoinsMap& map LIFETIMEBOUND, | ||
bool will_erase) noexcept | ||
: m_usage(usage), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {} | ||
: m_usage(usage), m_dirty(dirty), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind if we're mixing booleans and ints as long as the types and names are obvious, e.g.
size_t dirty = cache_coin ? cache_coin->IsDirty() : 0;
auto cursor{CoinsViewCacheCursor(usage, dirty, sentinel, map, /*will_erase=*/true)};
where we're setting a parameter called dirty
to the result of IsDirty
, but it gets converted to dirty count.
We could obviate this by calling it dirty_count
here as well:
CoinsViewCacheCursor(size_t& usage LIFETIMEBOUND,
size_t& dirty_count LIFETIMEBOUND,
CoinsCachePair& sentinel LIFETIMEBOUND,
CCoinsMap& map LIFETIMEBOUND,
bool will_erase) noexcept
: m_usage(usage), m_dirty_count(dirty_count), m_sentinel(sentinel), m_map(map), m_will_erase(will_erase) {}
++m_dirty_count; | ||
CCoinsCacheEntry::SetDirty(*itUs, m_sentinel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in other cases we set it to dirty first and only increment m_dirty_count
after (shouldn't matter on single thread but at least we don't have to think about whether that's the case or not):
++m_dirty_count; | |
CCoinsCacheEntry::SetDirty(*itUs, m_sentinel); | |
CCoinsCacheEntry::SetDirty(*itUs, m_sentinel); | |
++m_dirty_count; |
@l0rinc I've lost track a bit of your comments/suggestions. Is the PR as-is broken when interacting with assumeutxo? |
🐙 This pull request conflicts with the target branch and needs rebase. |
⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
@sipa, my comments are still relevant as far as I can tell, I think it's a good change, how can I help to advance it? |
Since #28280 and #28233, the percentage of dirty entries in the cache when a flush happen may in practice be far from 100%. The log output, the decision to warn about a large flush (#31534), and the disk-full detection, however always use the entire size of the cache.
Fix this by keeping track of the number of dirty entries in
CCoinsViewCache
, and using that number. I've dropped the usage ofDynamicMemoryUsage
as it's the wrong metric, even before the non-wiping flushes were introduced (if everything is dirty, memory usage is correlated with, but not the same as, the amount of disk space written or used). They could be improved by keeping track of proper write size estimates in a follow-up, but here I'm just using a fixed 48 bytes per entry estimate.