-
Notifications
You must be signed in to change notification settings - Fork 37.7k
validation: periodically flush dbcache during reindex-chainstate #32414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32414. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsNo conflicts as of last run. |
Thanks to @l0rinc for discovering this behaviour, and @mzumsande for the suggested fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm verifying is this fixes things or not, whether it slows down anything - or if it changes caching behavior in any other way.
Regardless, I think we should cover this with a test - do you think we could extend e.g. https://github.com/bitcoin/bitcoin/blob/master/test/functional/feature_reindex.py with an assert_debug_log
checking the periodic write logs of #32404?
I can confirm that this fixes the periodic flushes for Details
Nit: is this comment still accurate? https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L2782 Guess it just refers to the size based trigger, not the time based - which is probably fine given its small context. |
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
0f089bd
to
da4090e
Compare
I think it's still accurate, and #30611 doesn't make it stale. I think the comment https://github.com/bitcoin/bitcoin/blob/master/src/validation.cpp#L2861 describes the behavior.
Hmm I was wondering if we could do something with mocktime and validation interface to confirm it flushes. |
Move the periodic flush inside the outer loop of ActivateBestChain. For very long activations, such as with reindex-chainstate, this calls periodic flushes so progress can be saved to disk. Co-Authored-By: l0rinc <pap.lorinc@gmail.com>
e4fbf33
to
8d1e9fe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Thanks for fixing it!
I need a bit mor explanation to be able to fully ack
The main lock needs to be taken 3 separate times. It simplifies readability to take it once, and is more efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK - I'm still running the reindex / IBD benchmarks to see if this changed anything unexpected.
Do you think I should stop and restart after the locking changes?
nits
diff --git a/src/test/chainstate_write_tests.cpp b/src/test/chainstate_write_tests.cpp
index e1b82ebc12..44257d5591 100644
--- a/src/test/chainstate_write_tests.cpp
+++ b/src/test/chainstate_write_tests.cpp
@@ -16,7 +16,8 @@ BOOST_AUTO_TEST_SUITE(chainstate_write_tests)
BOOST_FIXTURE_TEST_CASE(chainstate_write_interval, TestingSetup)
{
- struct TestSubscriber final : CValidationInterface {
+ struct TestSubscriber final : CValidationInterface
+ {
bool m_did_flush{false};
void ChainStateFlushed(ChainstateRole, const CBlockLocator&) override
{
@@ -34,13 +35,12 @@ BOOST_FIXTURE_TEST_CASE(chainstate_write_interval, TestingSetup)
m_node.validation_signals->SyncWithValidationInterfaceQueue();
BOOST_CHECK(!sub->m_did_flush);
- // The periodic flush interval is between 50 and 70 minutes (inclusive)
- SetMockTime(GetTime<std::chrono::minutes>() + DATABASE_WRITE_INTERVAL_MIN - 1min);
+ SetMockTime(GetMockTime() + DATABASE_WRITE_INTERVAL_MIN - 1min);
chainstate.FlushStateToDisk(state_dummy, FlushStateMode::PERIODIC);
m_node.validation_signals->SyncWithValidationInterfaceQueue();
BOOST_CHECK(!sub->m_did_flush);
- SetMockTime(GetTime<std::chrono::minutes>() + DATABASE_WRITE_INTERVAL_MAX);
+ SetMockTime(GetMockTime() + DATABASE_WRITE_INTERVAL_MAX);
chainstate.FlushStateToDisk(state_dummy, FlushStateMode::PERIODIC);
m_node.validation_signals->SyncWithValidationInterfaceQueue();
BOOST_CHECK(sub->m_did_flush);
@@ -59,7 +59,8 @@ BOOST_FIXTURE_TEST_CASE(write_during_multiblock_activation, TestChain100Setup)
{
m_flushed_at_block = m_tip;
}
- void UpdatedBlockTip(const CBlockIndex* block_index, const CBlockIndex*, bool) override {
+ void UpdatedBlockTip(const CBlockIndex* block_index, const CBlockIndex*, bool) override
+ {
m_tip = block_index;
}
};
@@ -82,7 +83,6 @@ BOOST_FIXTURE_TEST_CASE(write_during_multiblock_activation, TestChain100Setup)
// Set m_next_write to current time
chainstate.FlushStateToDisk(state_dummy, FlushStateMode::ALWAYS);
m_node.validation_signals->SyncWithValidationInterfaceQueue();
- // The periodic flush interval is between 50 and 70 minutes (inclusive)
// The next call to a PERIODIC write will flush
SetMockTime(GetMockTime() + DATABASE_WRITE_INTERVAL_MAX);
// before reaching chain tip. | ||
BOOST_FIXTURE_TEST_CASE(write_during_multiblock_activation, TestChain100Setup) | ||
{ | ||
struct TestSubscriber final : CValidationInterface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if you edit again please reformat TestSubscriber
and UpdatedBlockTip
as well
@@ -31,15 +35,68 @@ BOOST_FIXTURE_TEST_CASE(chainstate_write_interval, TestingSetup) | |||
BOOST_CHECK(!sub->m_did_flush); | |||
|
|||
// The periodic flush interval is between 50 and 70 minutes (inclusive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now that we have the constants these comments don't really add anything
// Set m_next_write to current time | ||
chainstate.FlushStateToDisk(state_dummy, FlushStateMode::ALWAYS); | ||
m_node.validation_signals->SyncWithValidationInterfaceQueue(); | ||
// The periodic flush interval is between 50 and 70 minutes (inclusive) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, can be removed now I think (and 70 minutes isn't inclusive anymore, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Finished measuring Measured it on an HDD with i7 processor, so disk writes are expensive here - would likely be better on SSD. BenchmarkCOMMITS="eba5f9c4b63fe46261fbb3e71b9a94832d105b23 b56a7e2e983c78eca9acdf12a77f60a3cc6b53a0"; \
STOP_HEIGHT=888888; DBCACHE=45000; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done; echo "") && \
hyperfine \
--sort 'command' \
--runs 2 \
--export-json "$BASE_DIR/rdx-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
--parameter-list COMMIT ${COMMITS// /,} \
--prepare "killall bitcoind; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \
./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=5000 -printtoconsole=0; sleep 100" \
--cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
"COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=$DBCACHE"
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=45000 (COMMIT = eba5f9c4b63fe46261fbb3e71b9a94832d105b23)
Time (mean ± σ): 22404.449 s ± 77.811 s [User: 20909.449 s, System: 563.985 s]
Range (min … max): 22349.428 s … 22459.469 s 2 runs
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=45000 (COMMIT = b56a7e2e983c78eca9acdf12a77f60a3cc6b53a0)
Time (mean ± σ): 24821.312 s ± 19.392 s [User: 22218.160 s, System: 746.676 s]
Range (min … max): 24807.600 s … 24835.024 s 2 runs Relative speed comparison
1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=45000 (COMMIT = eba5f9c4b63fe46261fbb3e71b9a94832d105b23)
1.11 ± 0.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=45000 (COMMIT = b56a7e2e983c78eca9acdf12a77f60a3cc6b53a0) It's unfortunate to lose this much performance, as suggested before, maybe the change from 24 hours to 1 hour was too big - what if we tried an average of 100 minutes instead? Will measure IBD with latest changes now. |
I'm not sure it's worth bike shedding over the amount of time to periodically flush. The difference with an HDD is 6 hours 13 minutes vs 6 hours 53 minutes. So we could be saving a maximum of 40 minutes over something that takes over 6 hours anyways. |
IBD seems to be basically the same as before, this change indeed only affects DetailsCOMMITS="baa848b8d38187ce6b24a57cfadf1ea180209711 c822dd1f355b6f403191072cdd3c9c3e234bbcaa"; \
STOP_HEIGHT=888888; DBCACHE=45000; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done; echo "") && \
hyperfine \
--sort 'command' \
--runs 1 \
--export-json "$BASE_DIR/ibd-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
--parameter-list COMMIT ${COMMITS// /,} \
--prepare "killall bitcoind; rm -rf $DATA_DIR/*; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \
./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=1 -printtoconsole=0; sleep 10" \
--cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
"COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -blocksonly -printtoconsole=0 -stopatheight=$STOP_HEIGHT -dbcache=$DBCACHE"
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -blocksonly -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = baa848b8d38187ce6b24a57cfadf1ea180209711)
Time (abs ≡): 38323.520 s [User: 36485.235 s, System: 2391.492 s]
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -blocksonly -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = c822dd1f355b6f403191072cdd3c9c3e234bbcaa)
Time (abs ≡): 38063.544 s [User: 35628.341 s, System: 1952.398 s] Relative speed comparison
1.01 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -blocksonly -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = baa848b8d38187ce6b24a57cfadf1ea180209711)
1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -blocksonly -printtoconsole=0 -stopatheight=888888 -dbcache=45000 (COMMIT = c822dd1f355b6f403191072cdd3c9c3e234bbcaa)
Yes, it's a significant difference. I'll retry it with 100 minutes to see if it would fix the situation. |
With 100 mins it's 7% slower instead of 11% - still quite bad, but we shouldn't have to choose between performance and persistence, so let's find ways to reduce that. And IBD is the usecase that matters most anyway. DetailsCOMMITS="59d3e4ed34eb55cb40928d524cb0bd5e183ed85a cf75a7b599fe459935b7aa4776e91526fcfb0fcc"; \
STOP_HEIGHT=888888; DBCACHE=45000; \
CC=gcc; CXX=g++; \
BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
(echo ""; for c in $COMMITS; do git fetch origin $c -q && git log -1 --pretty=format:'%h %s' $c || exit 1; done; echo "") && \
hyperfine \
--sort 'command' \
--runs 2 \
--export-json "$BASE_DIR/rdx-${COMMITS// /-}-$STOP_HEIGHT-$DBCACHE-$CC.json" \
--parameter-list COMMIT ${COMMITS// /,} \
--prepare "killall bitcoind; rm -f $DATA_DIR/debug.log; git checkout {COMMIT}; git clean -fxd; git reset --hard; \
cmake -B build -DCMAKE_BUILD_TYPE=Release -DENABLE_WALLET=OFF && cmake --build build -j$(nproc) --target bitcoind && \
./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -dbcache=5000 -printtoconsole=0; sleep 10" \
--cleanup "cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
"COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP_HEIGHT -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=$DBCACHE"
Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=45000 (COMMIT = 59d3e4ed34eb55cb40928d524cb0bd5e183ed85a)
Time (mean ± σ): 22098.816 s ± 5.810 s [User: 20814.326 s, System: 557.473 s]
Range (min … max): 22094.707 s … 22102.924 s 2 runs
Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=45000 (COMMIT = cf75a7b599fe459935b7aa4776e91526fcfb0fcc)
Time (mean ± σ): 23737.633 s ± 159.720 s [User: 21704.678 s, System: 682.595 s]
Range (min … max): 23624.694 s … 23850.572 s 2 runs Relative speed comparison
1.00 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=45000 (COMMIT = 59d3e4ed34eb55cb40928d524cb0bd5e183ed85a)
1.07 ± 0.01 COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=888888 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -dbcache=45000 (COMMIT = cf75a7b599fe459935b7aa4776e91526fcfb0fcc) ACK c1e554d Nit: I'd squash the first two commits, they should be split by features, not by files. |
Move the periodic flush inside the outer loop of ActivateBestChain. For very long activations, such as with reindex-chainstate, this calls periodic flushes so progress can be saved to disk. Co-Authored-By: l0rinc <pap.lorinc@gmail.com> Github-Pull: bitcoin#32414 Rebased-From: 8482056
Github-Pull: bitcoin#32414 Rebased-From: 41479ed
After #30611 we periodically do a non-erasing flush of the dbcache to disk roughly every hour during IBD.
The intention was to also do this periodic flush during reindex-chainstate, so we would not risk losing progress during a system failure when reindexing with a high dbcache value.
It was discovered that reindex-chainstate does not perform a PERIODIC flush until it has already reached the tip. Since reindexing to tip usually happens within 24 hours, this behaviour was unnoticed with the previous periodic flush interval. Note that reindex-chainstate still does IF_NEEDED flushes during
ConnectBlock
, so this also would not be noticed when running with a lower dbcache value.This patch moves the PERIODIC flush from after the outer loop in
ActivateBestChain
to inside the outer loop after we releasecs_main
. This will periodically flush during IBD, reindex-chainstate, and steady state.