-
Notifications
You must be signed in to change notification settings - Fork 37.8k
refactor: Improve assumeutxo state representation #30214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30214. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. LLM Linter (✨ experimental)Possible typos and grammar issues:
drahtbot_id_4_m |
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reviews! I think this PR is in a pretty good state but I need to complete the last commit to make the terminology more consistent.
Rebased 12f276b -> a0672af (pr/cstate.15
-> pr/cstate.16
, compare) due to various conflicts, implementing various suggestions.
Updated a0672af -> f6bb080 (pr/cstate.16
-> pr/cstate.17
, compare) fixing clang-tidy error and renaming some variables for more consistent naming
@@ -1106,22 +1129,19 @@ class ChainstateManager | |||
//! Returns nullptr if no snapshot has been loaded. | |||
const CBlockIndex* GetSnapshotBaseBlock() const EXCLUSIVE_LOCKS_REQUIRED(::cs_main); | |||
|
|||
//! Return historical chainstate targeting a specific block, if any. | |||
Chainstate* HistoricalChainstate() const EXCLUSIVE_LOCKS_REQUIRED(GetMutex()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #30214 (comment)
Are we then commited to the 'historical' naming now with this and I can remove all the remaining ibd/background references in a follow-up? Because that naming inconsistency has always annoyed me and now we have three names with this. I think it's very confusing for newcomers as well.
I think so and the last commit "doc: Improve ChainstateManager documentation, use consistent terms" was started to do this and I should just complete it.
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
Change ChainstateRole parameter passed to wallets and indexes. Wallets and indexes need to know whether chainstate is historical and whether it is fully validated. They should not be aware of the assumeutxo snapshot validation process.
IsSnapshotActive() method is only called one place outside of tests and asserts, and is confusing because it returns true even after the snapshot is fully validated. The documentation which said this "implies that a background validation chainstate is also in use" is also incorrect, because after the snapshot is validated, the background chainstate gets disabled and IsUsable() would return false.
IsSnapshotValidated() is only called one place outside of tests, and is use redundantly in some tests, asserting that a snapshot is not validated when a snapshot chainstate does not even exist. Simplify by dropping the method and checking Chainstate m_assumeutxo field directly.
SnapshotBlockhash() is only called two places outside of tests, and is used redundantly in some tests, checking the same field as other checks. Simplify by dropping the method and using the m_from_snapshot_blockhash field directly.
Use to replace m_active_chainstate, m_ibd_chainstate, and m_snapshot_chainstate members. This has several benefits: - Ensures ChainstateManager treats chainstates instances equally, making distinctions based on their attributes, not having special cases and making assumptions based on their identities. - Normalizes ChainstateManager representation so states that should be impossible to reach and validation code has no handling for (like m_snapshot_chainstate being set and m_ibd_chainstate being unset, or both being set but m_active_chainstate pointing to the m_ibd_chainstate) can no longer be represented. - Makes ChainstateManager more extensible so new chainstates can be added for different purposes, like indexing or generating and validating assumeutxo snapshots without interrupting regular node operations. With the m_chainstates member, new chainstates can be added and handled without needing to make changes all over validation code or to copy/paste/modify the existing code that's been already been written to handle m_ibd_chainstate and m_snapshot_chainstate. - Avoids terms that are confusing and misleading: - The term "active chainstate" term is confusing because multiple chainstates will be active and in use at the same time. Before a snapshot is validated, wallet code will use the snapshot chainstate, while indexes will use the IBD chainstate, and netorking code will use both chainstates, downloading snapshot blocks at higher priority, but also IBD blocks simultaneously. - The term "snapshot chainstate" is ambiguous because it could refer either to the chainstate originally loaded from a snapshot, or to the chainstate being used to validate a snapshot that was loaded, or to a chainstate being used to produce a snapshot, but it is arbitrary used to refer the first thing. The terms "most-work chainstate" or "assumed-valid chainstate" should be less ambiguous ways to refer to chainstates loaded from snapshots. - The term "IBD chainstate" is not just ambiguous because actively confusing because technically IBD ends and the node is considered synced when the snapshot chainstate finishes syncing, so in practice the IBD chainstate will mostly by synced after IBD is complete. The term "fully-validated" is a better way of describing the characteristics and purpose of this chainstate.
Deduplicate code looping over chainstate objects and calling ActivateBestChain() and avoid need for code outside ChainstateManager to use the GetAll() method.
Just use m_chainstates array instead.
Move GetPruneRange from ChainstateManager to Chainstate.
Updated f6bb080 -> e6abe3a ( |
if (m_from_snapshot_blockhash && m_assumeutxo != Assumeutxo::VALIDATED) { | ||
// Only prune blocks _after_ the snapshot if this is a snapshot chain | ||
// that has not been fully validated yet. The earlier blocks need to be | ||
// kept to validate the snapshot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Been thinking about this again, but getting less sure about the mechanics at play here. What is the scenario where the historical chainstate needs the blocks to validate the snapshot chainstate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: #30214 (comment)
In commit "refactor: Add Chainstate m_assumeutxo and m_target_utxohash members" (a416c6d)
Been thinking about this again, but getting less sure about the mechanics at play here. What is the scenario where the historical chainstate needs the blocks to validate the snapshot chainstate?
This is confusing and it makes me realize the surrounding documentation here is not good. For example the GetPruneRange
documentation doesn't even mention the chainstate
argument.
But the idea is that GetPruneRange
is called via FindFilesToPrune
via FlushStateToDisk
for just one chainstate at a time. And the function is supposed to return a range of blocks to prune for just that chainstate. Each chainstate basically owns a region of the chain and should only prune its own blocks, not other blocks.
The historical chainstate is responsible for pruning blocks between genesis and the snapshot block (i.e. its target block) and the current chainstate is responsible for pruning blocks after the snapshot block it was created from. Neither chainstate should be pruning blocks it did not validate. In practice, if this check was not here it would probably cause historical blocks to be pruned before they could be validated.
This whole pruning model doesn't generalize and is probably unnecessarily confusing. It would probably be better if each chainstate returned the range of blocks it didn't want pruned and then the blockmanager was given all of those ranges and figured out which block to prune based on the ranges. A change like that could make sense as a followup.
🐙 This pull request conflicts with the target branch and needs rebase. |
This PR contains the first part of #28608, which tries to make assumeutxo code more maintainable, and improve it by not locking
cs_main
for a long time when the snapshot block is connected, and by deleting the snapshot validation chainstate when it is no longer used, instead of waiting until the next restart.The changes in this PR are just refactoring. They make
Chainstate
objects self-contained, so for example, it is possible to determine what blocks to connect to a chainstate without queryingChainstateManager
, and to determine whether a Chainstate is validated without basing it on inferences like&cs != &ActiveChainstate()
orGetAll().size() == 1
.The PR also tries to make assumeutxo terminology less confusing, using "current chainstate" to refer to the chainstate targeting the current network tip, and "historical chainstate" to refer to the chainstate downloading old blocks and validating the assumeutxo snapshot. It removes uses of the terms "active chainstate," "usable chainstate," "disabled chainstate," "ibd chainstate," and "snapshot chainstate" which are confusing for various reasons.