-
Notifications
You must be signed in to change notification settings - Fork 37.7k
assumeutxo: add init and completion logic #24232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there are test failures
8cdafd7
to
0952728
Compare
Thanks for the quick feedback Mco. The test failure was actually due to removing a commit that I'd forgotten the use for. Basically, because we're now calling MaybeRebalanceCaches in LoadChainstate, that prompts a FlushStateToDisk call before the block index is loaded, which trips some UB. The bounds check in d70b8de fixes this. |
52ba06d
to
dbee719
Compare
dbee719
to
7e2b212
Compare
7e2b212
to
86b252c
Compare
Okay, I've pushed a revision that addresses my previous comments and more comprehensively (that is to say: at all) tests the major failure modes. I had to parameterize the shutdown behavior during snapshot completion to get this to work within the unittests. Pending CI pass, this should be ready for continued review. |
1ac03ee
to
9f89221
Compare
Alright, CI is finally passing. I had to add 5d5dfaa because an existing unittest that advanced an IBD chain past the snapshot it generated started failing. This isn't something that should happen in practice, but was done to test an unrelated feature ( |
src/test/util/chainstate.h
Outdated
// removing. Without those headers, we can't activate the snapshot below. | ||
LOCK(::cs_main); | ||
uint256 gen_hash = node.chainman->ActiveChainstate().m_chain[0]->GetBlockHash(); | ||
node.chainman->Reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like #24299 was merged, so you might have to inline the needed resets here (or create a helper function for it).
9f89221
to
51d0dd3
Compare
Alright, have rebased; basically I just took the old |
d003388
to
84f7884
Compare
This is used in subsequent commits. It allows us to clean up UTXO snapshot chainstate after background validation completes.
This changes the snapshot's leveldb chainstate dir name from `chainstate_[blockhash]` to `chainstate_snapshot`. This simplifies later logic that loads snapshot data, and enforces the limitation of a single snapshot at any given time. Since we still need to persis the blockhash of the base block, we write that out to a file (`chainstate_snapshot/base_blockhash`) for later use during initialization, so that we can reinitialize the snapshot chainstate. Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
Add functionality for activating a snapshot-based chainstate if one is detected on-disk. Also cautiously initialize chainstate cache usages so that we don't somehow blow past our cache allowances during initialization, then rebalance at the end of init. Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
Used in later commits to remove leveldb directories for - invalid snapshot chainstates, and - background-vaildation chainstates that have finished serving their purpose.
If a UTXO snapshot fails to validate, don't leave the resulting datadir on disk as this will confuse initialization on next startup and we'll get an assertion error.
For use in later commits.
If we call FlushBlockFile() without having intitialized the block index with LoadBlockIndexDB(), we may be indexing into an empty vector. Specifically this is an issue when we call MaybeRebalanceCaches() during chainstate init before the block index has been loaded, which calls FlushBlockFile(). Also add an assert to avoid undefined behavior. Co-authored-by: Russell Yanofsky <russ@yanofsky.org>
and remove m_snapshot_validated. This state can now be inferred by the number of isUsable chainstates. m_stop_use is used to signal that a chainstate should no longer be used by validation logic; it is used as a sentinel when background validation completes or if the snapshot chainstate is found to be invalid. isUsable is a convenience method that incorporates m_stop_use.
Used in subsequent commits. Also cleans up asserts in coins_views-related convenience methods to be more exact.
Moves chainstate initialization into its own function. This is necessary to later support a more readable way of handling background-validation chainstate cleanup during init, since the chainstate initialization functions may need to be repeated after moving leveldb filesystem content around. This commit isn't strictly necessary, but the alternative is to (ab)use the `while` loop in init.cpp with a `continue` on the basis of a specific ChainstateLoadingError return value from LoadChainstate. Not only is this harder to read, but it can't be unittested. The approach here lets us consolidate background-validation cleanup to LoadChainstate, and therefore exercise it within tests. This commit is most easily reviewed with git diff --color-moved=dimmed-zebra --color-moved-ws=ignore-space-change
Trigger completion when a background validation chainstate reaches the same height as a UTXO snapshot, and handle cleaning up the chainstate on subsequent startup.
I found this useful during unittest debugging.
For use in next commit. Most easily reviewed with `--color-moved=dimmed_zebra --color-moved-ws=allow-indentation-change`.
This CreateAndActivateUTXOSnapshot parameter is necessary once we perform snapshot completion within ABC, since the existing UpdateTip test will fail because the IBD chain that has generated the snapshot will exceed the base of the snapshot. Being able to test snapshots being loaded into a mostly-uninitialized datadir allows for more realistic unittest scenarios.
Used when testing cleanup of on-disk chainstate data for snapshot testcases. Also necessary for simulating node restart in .cpp tests.
in TestingSetup(). This is used in the following commit to test reinitializing chainstates after snapshot validation and cleanup. Best reviewed with `git diff --color-moved=dimmed-zebra`.
Include notes about the `chainstate_snapshot` rename as well as updates for the included code.
84f7884
to
ac8c9d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased and addressed feedback. Thanks Russ and Antoine.
I'm approaching "thermonuclear levels of burnout" on this one, as the phrase goes. I'm getting bludgeoned by rebases, so any help reviewing this is appreciated.
I've looked at splitting this out into two PRs and while it is possible it will not be trivial for me to do. If any potential reviewers feel strongly about doing that, I can.
|
||
bool WriteSnapshotBaseBlockhash(CChainState& snapshot_chainstate) | ||
{ | ||
AssertLockHeld(::cs_main); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think conceptually this should hold cs_main
, since we're writing to a chainstate directory. Splitting cs_main out from per-chainstate locks should be a larger follow-on effort, and I think we can delay reworking this call until then.
// DestroyDB() (in DeleteCoinsDBFromDisk()) will fail. See `leveldb::~DBImpl()`. | ||
// Destructing the chainstate (and so resetting the coinsviews object) does this. | ||
snapshot_chainstate.reset(); | ||
bool removed = DeleteCoinsDBFromDisk(*snapshot_datadir, /*is_snapshot=*/true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find! I've added test coverage, and the tests now fail when this line is deleted.
Okay, I think it's pretty clear that the right move is to split this PR up into two separate ones (initialization vs. completion). I'll be opening those shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK ac8c9d8. Rebase and a lot of suggested cleanups since the last review, nothing major
I'm still a little confused about MaybeCompleteSnapshotValidation error handling, but in the worst case problems might just make debugging & error recovery less convenient so this is not a blocker
if (this != &m_chainman.ActiveChainstate()) { | ||
// This call may set `m_stop_use`, which is referenced immediately afterwards in | ||
// ActivateBestChain. | ||
m_chainman.MaybeCompleteSnapshotValidation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #24232 (comment)
In commit "validation: add ChainMan logic for completing UTXO snapshot validation" (1eb88f8)
Based upon my (re)reading of
MaybeCompleteSnapshotValidation()
, the snapshot cannot be found invalid (without some kind of code error) without callinghandle_invalid_snapshot()
, which will mark the snapshot chainstate as unusable and remove the associated coinsdb.
It seems like it would be possible to add some assertions about the return value or chain state here to verify that. But if you think that is overkill it would be good at least update the comment to be explicit that you are intentionally ignoring the return value. Could prefix "If this fails..." with "Intentionally ignoring return value. If this fails..."
If MaybeCompleteSnapshotValidation always cleans up after itself I guess I also don't understand why it is not an error if it fails here, but it is an error if it fails the other place it is called on startup.
If might be clearer if MaybeCompleteSnapshotValidation just returned an error code and callers would handle cleanup to make code clearer and easier to reason about.
uint256 snapshot_blockhash = *SnapshotBlockhash(); | ||
|
||
if (index_new.GetBlockHash() != snapshot_blockhash) { | ||
LogPrintf("[snapshot] supposed base block %s does not match the " /* Continued */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "validation: add ChainMan logic for completing UTXO snapshot validation" (1eb88f8)
This seems unclear what is supposed or why. Would seem clearer to say "Snapshot base block {snapshot_blockhash} height {snapshot_base_height} doesn't match validated block {index_new.hash} height {index_new.height}"
bf95976 doc: add note about snapshot chainstate init (James O'Beirne) e4d7995 test: add testcases for snapshot initialization (James O'Beirne) cced4e7 test: move-only-ish: factor out LoadVerifyActivateChainstate() (James O'Beirne) 51fc924 test: allow on-disk coins and block tree dbs in tests (James O'Beirne) 3c36139 test: add reset_chainstate parameter for snapshot unittests (James O'Beirne) 00b357c validation: add ResetChainstates() (James O'Beirne) 3a29dfb move-only: test: make snapshot chainstate setup reusable (James O'Beirne) 8153bd9 blockmanager: avoid undefined behavior during FlushBlockFile (James O'Beirne) ad67ff3 validation: remove snapshot datadirs upon validation failure (James O'Beirne) 34d1590 add utilities for deleting on-disk leveldb data (James O'Beirne) 252abd1 init: add utxo snapshot detection (James O'Beirne) f9f1735 validation: rename snapshot chainstate dir (James O'Beirne) d14bebf db: add StoragePath to CDBWrapper/CCoinsViewDB (James O'Beirne) Pull request description: This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11) (parent PR: bitcoin/bitcoin#15606) --- Half of the replacement for #24232. The original PR grew larger than expected throughout the review process. This change adds the ability to initialize a snapshot-based chainstate during init if one is detected on disk. This is of course unused as of now (aside from in unittests) given that we haven't yet enabled actually loading snapshots. Don't be scared! There are some big move-only commits in here. Accompanying changes include: - moving the snapshot coinsdb directory from being called `chainstate_[base blockhash]` to `chainstate_snapshot`, since we only support one snapshot in use at a time. This simplifies some logic, but it necessitates writing that base blockhash out to a file within the coinsdb dir. See [discussion here](bitcoin/bitcoin#24232 (comment)). - adding a simple fix in `FlushBlockFile()` that avoids a crash when attemping to flush to disk before `LoadBlockIndexDB()` is called, which happens when calling `MaybeRebalanceCaches()` during multiple chainstate init. - improving the unittest to allow testing with on-disk chainstates - necessary to test a simulated restart and re-initialization. ACKs for top commit: naumenkogs: utACK bf95976 ariard: Code Review ACK bf95976 ryanofsky: Code review ACK bf95976. Changes since last review: rebasing, switching from CAutoFile to AutoFile, adding comments, switching from BOOST_CHECK to Assert in test util, using chainman.GetMutex() in tests, destroying one ChainstateManager before creating a new one in tests fjahr: utACK bf95976 aureleoules: ACK bf95976 Tree-SHA512: 15ae75caf19f8d12a12d2647c52897904d27b265a7af6b4ae7b858592eeadb8f9da6c2394b6baebec90adc28742c053e3eb506119577dae7c1e722ebb3b7bcc0
…tion 2b373fe docs: update assumeutxo.md (James O'Beirne) 87a1108 test: add snapshot completion unittests (James O'Beirne) d70919a refactor: make MempoolMutex() public (James O'Beirne) 7300ced log: add LoadBlockIndex() message for assumedvalid blocks (James O'Beirne) d96c59c validation: add ChainMan logic for completing UTXO snapshot validation (James O'Beirne) f2a4f33 move-only-ish: init: factor out chainstate initialization (James O'Beirne) 637a90b add Chainstate::HasCoinsViews() (James O'Beirne) c29f26b validation: add CChainState::m_disabled and ChainMan::isUsable (James O'Beirne) 5ee22cd add ChainstateManager.GetSnapshot{BaseHeight,BaseBlock}() (James O'Beirne) Pull request description: This is part of the [assumeutxo project](https://github.com/bitcoin/bitcoin/projects/11) (parent PR: bitcoin/bitcoin#15606) Part two of replacing bitcoin/bitcoin#24232. --- When a user activates a snapshot, the serialized UTXO set data is used to create an "assumed-valid" chainstate, which becomes active in an attempt to get the node to network tip as quickly as possible. Simultaneously in the background, the already-existing chainstate continues "conventional" IBD to both accumulate full block data and serve as a belt-and-suspenders to validate the assumed-valid chainstate. Once the background chainstate's tip reaches the base block of the snapshot used, we set `m_stop_use` on that chainstate and immediately take the hash of its UTXO set; we verify that this matches the assumeutxo value in the source code. Note that while we ultimately want to remove this background chainstate, we don't do so until the following initialization process, when we again check the UTXO set hash of the background chainstate, and if it continues to match, we remove the (now unnecessary) background chainstate, and move the (previously) assumed-valid chainstate into its place. We then reinitialize the chainstate in the normal way. As noted in previous comments, we could do the filesystem operations "inline" immediately when the background validation completes, but that's basically just an optimization that saves disk space until the next restart. It didn't strike me as worth the risk of moving chainstate data around on disk during runtime of the node, though maybe my concerns are overblown. The final result of this completion process is a fully-validated chain, where the only evidence that the user synced using assumeutxo is the existence of a `base_blockhash` file in the `chainstate` directory. ACKs for top commit: achow101: ACK 2b373fe Tree-SHA512: a204e1d6e6932dd83c799af3606b01a9faf893f04e9ee1a36d63f2f1ccfa9118bdc1c107d86976aa0312814267e6a42074bf3e2bf1dead4b2513efc6d955e13d
This is part of the assumeutxo project (parent PR: #15606)
This changeset adds logic for detecting and activating snapshot chainstates on start, and for completing the snapshot validation process once the background chainstate reaches the base block of the assumed-valid chain. It also handles removing the snapshot chainstate data on-disk in the event that the snapshot fails to validate.
As detailed in assumeutxo.md:
Most of this change is unittested, though some logic (anything assuming on-disk leveldb data) can't be tested without some rework of unittest setup, given in-memory leveldb instances are used. I can look into how difficult this would be.This change adds the ability to unittest using on-disk leveldb coins dbs.
Possible follow-ups
CDBWrapper::{m_path,m_is_memory}
: assumeutxo: add init and completion logic #24232 (comment)