Skip to content

Conversation

jamesob
Copy link
Contributor

@jamesob jamesob commented Jul 29, 2022

This is part of the assumeutxo project (parent PR: #15606)

Part two of replacing #24232.


When a user activates a snapshot, the serialized UTXO set data is used to create an "assumed-valid" chainstate, which becomes active in an attempt to get the node to network tip as quickly as possible. Simultaneously in the background, the already-existing chainstate continues "conventional" IBD to both accumulate full block data and serve as a belt-and-suspenders to validate the assumed-valid chainstate.

Once the background chainstate's tip reaches the base block of the snapshot used, we set m_stop_use on that chainstate and immediately take the hash of its UTXO set; we verify that this matches the assumeutxo value in the source code. Note that while we ultimately want to remove this background chainstate, we don't do so until the following initialization process, when we again check the UTXO set hash of the background chainstate, and if it continues to match, we remove the (now unnecessary) background chainstate, and move the (previously) assumed-valid chainstate into its place. We then reinitialize the chainstate in the normal way.

As noted in previous comments, we could do the filesystem operations "inline" immediately when the background validation completes, but that's basically just an optimization that saves disk space until the next restart. It didn't strike me as worth the risk of moving chainstate data around on disk during runtime of the node, though maybe my concerns are overblown.

The final result of this completion process is a fully-validated chain, where the only evidence that the user synced using assumeutxo is the existence of a base_blockhash file in the chainstate directory.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 29, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK achow101
Stale ACK ryanofsky, ariard, brunoerg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #26857 (OP_VAULT draft by jamesob)
  • #25977 (refactor: Replace std::optional<bilingual_str> with util::Result by ryanofsky)
  • #25722 (refactor: Use util::Result class for wallet loading by ryanofsky)
  • #25665 (refactor: Add util::Result failure values, multiple error and warning messages by ryanofsky)
  • #24008 (assumeutxo: net_processing changes by jamesob)
  • #15606 (assumeutxo by jamesob)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jonatack
Copy link
Member

test/validation_chainstatemanager_tests.cpp:418:35: error: calling function 'DisconnectTip' requires holding mutex 'bg_chainstate.m_mempool->cs' exclusively [-Werror,-Wthread-safety-precise]
        BOOST_CHECK(bg_chainstate.DisconnectTip(unused_state, &unused_pool));
                                  ^
test/validation_chainstatemanager_tests.cpp:418:35: note: found near match 'mempool->cs'
1 error generated.

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK d4fc4ae if missing chainstatemanager_loadblockindex test is added back. First 12 commits were previously reviewed as part of #25667. Later commits were cherry-picked from #24232 without code changes

//! chainstate only contains fully validated blocks and the other chainstate contains all blocks,
//! even those assumed-valid.
//!
BOOST_FIXTURE_TEST_CASE(chainstatemanager_loadblockindex, TestChain100Setup)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In commit "test: add testcases for snapshot initialization" (5ff0d2f)

Something wonky is going on in this commit. It seems like some rebase error. This is dropping the chainstatemanager_loadblockindex test, and replacing it with an exact copy of the chainstatemanager_snapshot_init below, which now appears twice (and is edited in later commit 65ad086 to become a different chainstatemanager_snapshot_completion test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanofsky yeah, thanks - not sure what happened there. Should be fixed.

@jamesob
Copy link
Contributor Author

jamesob commented Sep 6, 2022

Clang failure on some platforms looks spurious:

test/validation_chainstatemanager_tests.cpp:501:35: error: calling function 'DisconnectTip' requires holding mutex 'bg_chainstate.m_mempool->cs' exclusively [-Werror,-Wthread-safety-precise]
        BOOST_CHECK(bg_chainstate.DisconnectTip(unused_state, &unused_pool));
                                  ^
test/validation_chainstatemanager_tests.cpp:501:35: note: found near match 'mempool->cs'
1 error generated.

but lock is taken here.

Anyone have ideas for workarounds?

@ryanofsky
Copy link
Contributor

ryanofsky commented Sep 6, 2022

If the lock is for bg_chainstate.GetMempool()->cs but the annotation requires bg_chainstate.m_mempool->cs, the compiler probably isn't going to know that GetMempool() and m_mempool are the same. Probably should stick with m_mempool

@jamesob
Copy link
Contributor Author

jamesob commented Sep 6, 2022

Probably should stick with m_mempool

It's protected 😬. But looks like I can use the RETURNS_LOCK annotation on MempoolMutex().

Edit: that's private too, hah.

@jonatack
Copy link
Member

In the pull description, could drop or update "Since it's dependent on the commits in #25667, I'm opening this as a draft."

Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a00d0e7.

Since last ACK ab41e8b:

  • move nPruneTarget usage to BlockManager::GetPruneTarget() in LoadChainstate()
  • bug report call message (i.e PACKAGE_BUGREPORT) in LogPrintf in ActivateBestChain()
  • move GetUTXOStats mention to ComputeUTXOStats in MaybeCompleteSnapshotValidation comment
  • new BOOST_CHECKs in chainstatemanager_snapshot_completion about cache size bytes

@DrahtBot DrahtBot mentioned this pull request Feb 23, 2023
@Sjors
Copy link
Member

Sjors commented Feb 23, 2023

I'd like to test this with loadtxoutset. I wonder if I can just cherry-pick a few things from #15606 to do that myself, or if a more rigorous rebase is needed...

@jamesob
Copy link
Contributor Author

jamesob commented Feb 28, 2023

Maintainers: what else would you like to see for a merge here?

@achow101
Copy link
Member

Maintainers: what else would you like to see for a merge here?

Waiting for a couple more reviews.

@jamesob
Copy link
Contributor Author

jamesob commented Mar 2, 2023

In the pull description, could drop or update "Since it's dependent on the commits in #25667, I'm opening this as a draft."

Done, thanks.

@brunoerg
Copy link
Contributor

brunoerg commented Mar 3, 2023

The commit message of 287bb3f "validation: add CChainState::m_disabled and ChainMan::isUsable" still mentions m_stop_use rather than m_disabled

In the pull description, it still mention m_stop_use.

Copy link
Contributor

@brunoerg brunoerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crACK a00d0e7

In 5ee22cd, it creates ChainstateManager::GetSnapshotBaseBlock and ChainstateManager::GetSnapshotBaseHeight. GetSnapshotBaseHeight is used (not in this commit) in MaybeCompleteSnapshotValidation to check if IBD (in background) has been completed.

In c29f26b, it creates CChainState::m_disabled and I could check it's been used to point that a chainstate should no longer be used by validation logic. (perhaps PR description should be updated).

Some logic I could understand:
assumeutxo2

In e91b216, it first added in CompleteChainstateInitialization

assert(chainman.m_total_coinstip_cache > 0);
assert(chainman.m_total_coinsdb_cache > 0);

Good to check we have enough bytes available for in-memory coins caches and leveldb stuff.

In LoadChainstate I could check (what is mentioned):

  1. Loads the fully validated chainstate
  2. Loads a chain created from a UTXO snapshot, if any exist.

With both ones, it calls MaybeCompleteSnapshotValidation to compare the background chainstate's UTXO hash with the hard-coded (and expected) one. If so, it sets m_disabled=true. And then, we can _re-_initialize the chainstate (InitializeChainstate).

If we could have loadtxoutset (as mentioned by @Sjors), it would be good to review it and see things happen in practice.

@achow101
Copy link
Member

achow101 commented Mar 7, 2023

Missed this on last review (sorry):

../../../src/validation.cpp: In member function ‘SnapshotCompletionResult ChainstateManager::MaybeCompleteSnapshotValidation(std::function<void(bilingual_str)>)’:
../../../src/validation.cpp:5450:14: error: catching polymorphic type ‘struct StopHashingException’ by value [-Werror=catch-value=]
 5450 |     } catch (StopHashingException) {
      |              ^~~~~~~~~~~~~~~~~~~~

jamesob added 5 commits March 7, 2023 16:06
Trigger completion when a background validation chainstate reaches the
same height as a UTXO snapshot, and handle cleaning up the chainstate
on subsequent startup.
I found this useful during unittest debugging.
for use in the following unittests.
Also adjusts the previous snapshot chainstate init tests
to account for the fact that the init process is now attempting to
validate and complete background chainstates whose tip is at the
snapshot base block. We use a DisconnectTip() hack to preserve the
nature of the test.
Include notes about the `chainstate_snapshot` rename as well as
updates for the included code.
@jamesob jamesob force-pushed the 2022-07-au.complete branch from a00d0e7 to 2b373fe Compare March 7, 2023 21:07
@jamesob
Copy link
Contributor Author

jamesob commented Mar 7, 2023

Missed this on last review (sorry):

Fixed!

@achow101
Copy link
Member

achow101 commented Mar 7, 2023

ACK 2b373fe

@achow101 achow101 merged commit d5e4f9a into bitcoin:master Mar 7, 2023
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Mar 8, 2023
return SnapshotCompletionResult::STATS_FAILED;
}

// XXX note that this function is slow and will hold cs_main for potentially minutes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just catching up... Not sure if intentional or not but maybe the XXX here was something you still wanted to address but missed @jamesob ?

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some nits. Feel free to ignore

fs::path p_new,
const fs::filesystem_error& err) {
LogPrintf("%s: error renaming file (%s): %s\n",
__func__, fs::PathToString(p_old), err.what());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit in d96c59c: No need to pass func

  • This is already done by log internally
  • This will print a useless operator(), twice: [validation.cpp:5712] [operator()] operator(): error renaming file

fs::rename(ibd_chainstate_path, tmp_old);
} catch (const fs::filesystem_error& e) {
rename_failed_abort(ibd_chainstate_path, tmp_old, e);
throw;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit in d96c59c:

Looks like the LOC can be reduced by moving all of this into the lambda instead of repeating the passed paths twice and the throw logic twice.

auto dest_str = fs::PathToString(invalid_path);

LogPrintf("%s: error renaming file '%s' -> '%s': %s\n",
__func__, src_str, dest_str, e.what());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Same about __func__

"Please report this incident to %s, including how you obtained the snapshot. "
"The invalid snapshot chainstate has been left on disk in case it is "
"helpful in diagnosing the issue that caused this error."),
PACKAGE_NAME, snapshot_tip_height, snapshot_base_height, snapshot_base_height, PACKAGE_BUGREPORT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same commit: At the risk of people running into this, I wonder if this should use STR_INTERNAL_BUG() so that more info is printed about the version/commit hash they were using.

// chainstate past the snapshot base block.
if (WITH_LOCK(::cs_main, return m_disabled)) {
LogPrintf("m_disabled is set - this chainstate should not be in operation. " /* Continued */
"Please report this as a bug. %s\n", PACKAGE_BUGREPORT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. (STR_INTERNAL_BUG(msg))

Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
This is a backport of [[bitcoin/bitcoin#23154 | core#23154]]
with updates from [[bitcoin/bitcoin#25667 | core#25667]] and [[bitcoin/bitcoin#25740 | core#25740]]

Depends on D14655

Test Plan: proof-reading

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14656
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
and remove m_snapshot_validated. This state can now be inferred by the
number of isUsable chainstates.

m_disabled is used to signal that a chainstate should no longer be used
by validation logic; it is used as a sentinel when background validation
completes or if the snapshot chainstate is found to be invalid.

isUsable is a convenience method that incorporates m_disabled.

This is a partial backport of [[bitcoin/bitcoin#25740 | core#25740]]
bitcoin/bitcoin@c29f26b

Depends on D14673

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14685
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
Moves chainstate initialization into its own function. This is
necessary to later support a more readable way of handling
background-validation chainstate cleanup during init, since the
chainstate initialization functions may need to be repeated after
moving leveldb filesystem content around.

This commit isn't strictly necessary, but the alternative is to (ab)use
the `while` loop in init.cpp with a `continue` on the basis of a
specific ChainstateLoadingError return value from LoadChainstate. Not
only is this harder to read, but it can't be unittested.

The approach here lets us consolidate background-validation cleanup to
LoadChainstate, and therefore exercise it within tests.

This commit is most easily reviewed with

  git diff --color-moved=dimmed-zebra
  --color-moved-ws=ignore-space-change

This is a partial backport of [[bitcoin/bitcoin#25740 | core#25740]]
bitcoin/bitcoin@f2a4f33

Depends on D14685

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14686
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Oct 26, 2023
Summary:
> add ChainstateManager.GetSnapshot{BaseHeight,BaseBlock}()
>
> For use in later commits.

bitcoin/bitcoin@5ee22cd

----

> add Chainstate::HasCoinsViews()
>
> Used in subsequent commits. Also cleans up asserts in
> coins_views-related convenience methods to be more exact.

bitcoin/bitcoin@637a90b

----

> validation: add ChainMan logic for completing UTXO snapshot validation
>
> Trigger completion when a background validation chainstate reaches the
> same height as a UTXO snapshot, and handle cleaning up the chainstate
> on subsequent startup.
bitcoin/bitcoin@d96c59c

----

> log: add LoadBlockIndex() message for assumedvalid blocks
>
> I found this useful during unittest debugging.
bitcoin/bitcoin@7300ced

----

> refactor: make MempoolMutex() public
>
> for use in the following unittests.

bitcoin/bitcoin@d70919a

----

> test: add snapshot completion unittests
>
> Also adjusts the previous snapshot chainstate init tests
> to account for the fact that the init process is now attempting to
> validate and complete background chainstates whose tip is at the
> snapshot base block. We use a DisconnectTip() hack to preserve the
> nature of the test.

bitcoin/bitcoin@87a1108

----

This completes backport of [[bitcoin/bitcoin#25740 | core#25740]]

Depends on D14686

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D14687
@bitcoin bitcoin locked and limited conversation to collaborators Mar 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Status: Done
Status: Done
Development

Successfully merging this pull request may close these issues.