ChainstateManager locking improvements #21620
Conversation
In preparation for removing `cs_main` requirements for ChainstateActive()/ActiveChainstate(), relax the lock requirements for `g_chainman`. I have verified that all usages of `g_chainman` that are concurrency-sensitive are usages of its members (e.g. `m_blockman`), which are already annotated. The references to `g_chainman.m_active_chainstate` will be protected by use of `std::atomic` in later commits. Here are all non-`BlockManager`-related usages:

```
% rg --pcre2 --trim 'g_chainman(?!.m_blockman)'
src/net_processing.cpp
1271:assert(std::addressof(g_chainman) == std::addressof(m_chainman));

src/validation.h
994:extern ChainstateManager g_chainman;

src/init.cpp
1423:node.chainman = &g_chainman;

doc/release-notes/release-notes-0.21.0.md
577:- bitcoin#19927 Reduce direct `g_chainman` usage (dongcarl)

src/validation.cpp
105:ChainstateManager g_chainman;
109:assert(g_chainman.m_active_chainstate);
110:return *g_chainman.m_active_chainstate;
174:assert(std::addressof(g_chainman.BlockIndex()) == std::addressof(m_block_index));

src/node/interfaces.cpp
479:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
489:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
502:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
514:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
516:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
525:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
527:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));

src/test/util/setup_common.cpp
143:m_node.chainman = &::g_chainman;

src/qt/test/apptests.cpp
89:UnloadBlockIndex(/* mempool */ nullptr, g_chainman);
90:g_chainman.Reset();
```
This lets us avoid repurposing cs_main as the mutex guarding this state, and will allow us to remove additional `cs_main` acquisitions and annotations in subsequent commits.
This data is now atomic, so explicit locking is no longer required.
To avoid (confusingly) repurposing cs_main to manage internal ChainstateManager chainstate references, introduce a specific lock for this purpose. It's a RecursiveMutex (and not a Mutex) to facilitate acquiring the lock in MaybeRebalanceCaches, since we'd otherwise have to give the lock public class visibility (to allow callers to lock it).
This in turn requires `PruneBlockIndexCandidates()` to be annotated; luckily no locking changes are needed there.
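To make the commits above concrete, here is a minimal hypothetical sketch (stand-in class name, `std::recursive_mutex` in place of Bitcoin Core's `RecursiveMutex` wrapper) of why a recursive lock lets `MaybeRebalanceCaches()` be called from an already-locked method without giving the lock public visibility:

```cpp
#include <mutex>

// Hypothetical, simplified sketch -- not the actual ChainstateManager.
class ManagerSketch
{
    // Internal lock for chainstate bookkeeping; recursive so that private
    // helpers can re-acquire it when called from already-locked methods.
    std::recursive_mutex m_cs_chainstates;

    void MaybeRebalanceCaches()
    {
        // Safe even though ActivateSnapshot() already holds the lock. With
        // a plain std::mutex this would deadlock, unless the lock were made
        // public so callers could take it first.
        std::lock_guard<std::recursive_mutex> lock{m_cs_chainstates};
        // ... rebalance cache allocations between chainstates ...
    }

public:
    void ActivateSnapshot()
    {
        std::lock_guard<std::recursive_mutex> lock{m_cs_chainstates};
        // ... swap in the snapshot chainstate ...
        MaybeRebalanceCaches(); // recursive acquisition, no deadlock
    }
};
```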
9fbb8ca to 8257e9a
It's great that this is removing the recursive locks in 09c924d and 5b8587f. Also, annotating CChainState::m_chain in 8257e9a makes a lot of sense. The thing I don't understand is what bug is prevented by making this change. Not putting EXCLUSIVE_LOCKS_REQUIRED(cs_main) on ChainstateManager methods seems like Six Flags saying: congratulations, we're waiving the $100 fee to get into the park, we're just going to charge you $100 the first time you get onto any ride. It seems like this could only be appealing if you wanted to go to a park without riding any rides.
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Thanks for the quick look @ryanofsky.
Strictly speaking, I don't think there's a difference in the bugs prevented between this approach and just repurposing `cs_main`.
I think your Six Flags analogy is both entertaining and true (in a strict sense), but if everyone generally agrees that we eventually need to break up `cs_main`, introducing narrower locks like this seems like a reasonable place to start. But otherwise I'm happy to separate out the commits that you mention and just annotate everything with `EXCLUSIVE_LOCKS_REQUIRED(cs_main)`.

But it does seem appealing to decouple, say, block storage locking from chain locking so that we can do things like prune while continuing validation. And longer term, I suspect that each chainstate instance will want its own little cs_main so that we can do various chainstate operations in parallel should something like utreexo come around.
Will review in more detail, and I'm all for lock granularity if locks are independent. But if we're just turning one lock into two dependent locks where you acquire one lock, release it, then always acquire the other lock, it seems pointlessly complex. It increases the number of states things can be in and makes thread safety annotations less likely to catch race conditions. If there is theoretically a place where having these two locks could be better for performance, I would be curious to know where it is. The ActivateBestChain example just seems like a case of weird code organization that should be cleaned up, not a case where two locks make more sense for code as it is executed.

I think the clearest thing would be for this low level code not to have any synchronization: no locks or atomics, just plain annotations stating locking assumptions so higher level code can call it correctly. I've suggested this before #19806 (review), and it can be done separately from this PR. This PR does make other improvements, and I'm happy with all the changes, and planning to review it.
> In all rest/rpc-related modules, if there are multiple calls to ActiveChain{,State}(), and the calls fall under the same ::cs_main lock, we can simply take a local reference and use/reuse it instead of calling ActiveChain{,State}() again and again. Question: if we split up the lock in bitcoin#21620, wouldn't this be broken?
I haven't read everything yet, and plan on doing it soon. Q for you: does this mean that even if we hold `::cs_main`, the active chainstate could be swapped out from under us?
> In all rest/rpc-related modules, if there are multiple calls to ActiveChain{,State}(), and the calls fall under the same ::cs_main lock, we can simply take a local reference and use/reuse it instead of calling ActiveChain{,State}() again and again. Question: if we split up the lock in bitcoin#21620, wouldn't this be broken?
Good points @ryanofsky @dongcarl. After digesting this a little, I actually think that both this approach and our current approach have bugs. Consider the case where the active chainstate is swapped out (i.e. when `ActivateSnapshot()` runs) between two `ActiveChainstate()` calls in the same function: the second call silently returns a different chainstate than the first.

The narrow solution in this case is maybe obviously to grab a single chainstate reference at the top of the function (instead of calling `ActiveChainstate()` again and again).
Yes, please. This was on my unimplemented review suggestion followup list. #20749 (comment) #20749 (review) #21025 (review)

> Best would be to have a CChainState* chain_state local variable that is initialized while cs_main is held above, and then used here.
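For illustration, a minimal self-contained sketch of that local-variable pattern, with hypothetical stand-in types (`cs` stands in for `cs_main`, `Active()` for `ActiveChainstate()`):

```cpp
#include <mutex>

struct Chainstate { int height{0}; };

struct Manager {
    std::mutex cs;                 // stand-in for cs_main
    Chainstate* active{nullptr};   // stand-in for m_active_chainstate
    Chainstate& Active() { return *active; }
};

int ReadConsistently(Manager& man)
{
    std::lock_guard<std::mutex> lock{man.cs};
    // Bind the active chainstate once and reuse the reference: everything
    // below operates on the same chainstate object, even if another thread
    // later swaps `man.active`.
    Chainstate& chainstate = man.Active();
    return chainstate.height; // further uses of `chainstate`, never a second Active() call
}
```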
It seems to me replacing synchronization with annotations in ChainstateManager (removing the locks and atomic and adding EXCLUSIVE_LOCKS_REQUIRED(cs_main) as suggested) would ensure callers use it correctly, or at least have to think about using it correctly.
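A minimal self-contained sketch of that annotations-only approach, using Clang's raw thread-safety attributes (Bitcoin Core wraps these in macros such as `EXCLUSIVE_LOCKS_REQUIRED` and `GUARDED_BY`; the names here are hypothetical):

```cpp
// Compile with: clang++ -Wthread-safety -c sketch.cpp
#include <mutex>

class __attribute__((capability("mutex"))) Lock
{
    std::mutex m;
public:
    void lock() __attribute__((acquire_capability())) { m.lock(); }
    void unlock() __attribute__((release_capability())) { m.unlock(); }
};

Lock cs_main;
int g_height __attribute__((guarded_by(cs_main))) = 0;

// No synchronization inside: the caller must already hold cs_main, and
// -Wthread-safety flags any call site that doesn't.
int GetHeight() __attribute__((exclusive_locks_required(cs_main)))
{
    return g_height;
}

void Caller()
{
    cs_main.lock();
    int h = GetHeight(); // OK: cs_main held
    (void)h;
    cs_main.unlock();
    // GetHeight();      // would trigger a -Wthread-safety warning
}
```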
Offline thread:

<james>: I think we need cs_active_chainstate or something that functions like PNB either acquire or are annotated with, to prevent the active chainstate from being swapped out during their usage.

<russ>: I'm not trying to claim adding lock annotations will prevent all ChainstateManager callers from being buggy and wrongly assuming the chainstate doesn't change when they didn't lock it. I don't think there is anything ChainstateManager can do to prevent that, unless it's going to go into the kernel and freeze all running threads. My claim is just that bugs are less likely if locking cs_main locks the chainstate, and callers are forced to lock cs_main and think about locking when they use ChainstateManager.

<james>: The kind of sad thing is that that swap-out only happens once during runtime (i.e. when ActivateSnapshot() is called), and really we could avoid a lot of this mess if we restricted snapshots to being loaded at init time.

<russ>: I don't see what's that complicated about this. If you don't want the active chainstate to change, then you should lock cs_main. If you don't care whether the active chainstate changes, then you can take a reference to the chainstate and not lock cs_main.
If there is a case where it's necessary for code to release cs_main but prevent the active chain from changing, I would suggest a condition variable:

```
bool m_lock_chainstate = false;
std::condition_variable m_lock_chainstate_cv;
```

Then in the code that wants to update the chainstate:

```
m_lock_chainstate_cv.wait(cs_main, []{ return !m_lock_chainstate; });
```

And the code that wants to lock the chainstate without holding cs_main:

```
m_lock_chainstate = true;
m_lock_chainstate_cv.notify_all();
...
m_lock_chainstate = false;
m_lock_chainstate_cv.notify_all();
```
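Filling in the boilerplate, a self-contained version of that sketch might look like the following (hypothetical names kept; `cs_main` is a plain `std::mutex` here, and `std::condition_variable::wait` takes a `std::unique_lock` rather than the bare mutex):

```cpp
#include <condition_variable>
#include <mutex>

std::mutex cs_main;                           // stand-in for ::cs_main
std::condition_variable m_lock_chainstate_cv;
bool m_lock_chainstate = false;

// Writer: block until no reader has "pinned" the chainstate, then update.
void UpdateActiveChainstate()
{
    std::unique_lock<std::mutex> lock{cs_main};
    m_lock_chainstate_cv.wait(lock, [] { return !m_lock_chainstate; });
    // ... swap the active chainstate while holding cs_main ...
}

// Reader: pin the chainstate, release cs_main for the slow work, unpin.
void UseChainstateWithoutCsMain()
{
    {
        std::lock_guard<std::mutex> lock{cs_main};
        m_lock_chainstate = true;
    }
    // ... long-running work; cs_main is free, but the active chainstate
    // cannot be swapped because UpdateActiveChainstate() is waiting ...
    {
        std::lock_guard<std::mutex> lock{cs_main};
        m_lock_chainstate = false;
    }
    m_lock_chainstate_cv.notify_all();
}
```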
Actually, I guess that could be equivalent to your active_chainstate mutex if it used manual locking instead of a scoped lock. So maybe I'll arrive at the same conclusion as you from a different angle. It just seems like there are a lot of solutions here, and there shouldn't be any conflict between a good solution and one that uses compiler thread safety annotations.
I haven't looked at the code here yet, but I want to respond to a couple of points raised in the discussion:
Most of the client code (e.g. in net_processing, rpc, etc) shouldn't need to lock chainman in most cases. Usually it's just trying to read a single piece of data (whether we're in IBD, what the current height is, fetch a CBlockIndex pointer to the tip, etc). The higher level code shouldn't have to concern itself with locks in those cases, and in fact those operations can be made lock-free by using atomics to cache the data (as this PR does for `m_active_chainstate`).

There are also cases where the client code does want to carry out multiple operations atomically on chainman. For those cases, I think it makes sense to expose a lock that callers can hold for the duration of those operations.
Yes, I think we are on the same page. When you just need one piece of data, this is what atomics are for, and it makes sense to switch a variable from lock-guarded to atomic. Having a pair of accessors like:

```
T GetData() EXCLUSIVE_LOCKS_REQUIRED(cs_main);
T GetCachedData();
```

is nice. But you can make individual variables atomic or not without affecting anything else more broadly. I do think it makes sense to delay switching a variable from lock-guarded to atomic until there's a concrete need for it.
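A minimal sketch of what that accessor pair could look like behind the scenes (hypothetical names; plain globals and free functions instead of class members, with the thread-safety annotations elided):

```cpp
#include <atomic>
#include <mutex>

std::mutex cs_main;                 // stand-in for ::cs_main
int m_data = 0;                     // authoritative value, guarded by cs_main
std::atomic<int> m_cached_data{0};  // lock-free copy for readers

// Caller must hold cs_main; returns the authoritative value.
int GetData() { return m_data; }

// Lock-free; may briefly lag the authoritative value.
int GetCachedData() { return m_cached_data.load(std::memory_order_relaxed); }

void SetData(int value)
{
    std::lock_guard<std::mutex> lock{cs_main};
    m_data = value;
    m_cached_data.store(value, std::memory_order_relaxed);
}
```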
🕵️ @sipa has been requested to review this pull request as specified in the REVIEWERS file.
🐙 This pull request conflicts with the target branch and needs rebase.
Hopefully this'll make everyone happy, and clarify some chainstate locking issues going forward.

Background

There is some amount of confusion due to my choice to conflate use of `cs_main` between its traditional duties within `ChainstateManager` (e.g. coverage of `chainActive`) and the guarding of the internal chainstate references (`m_ibd_chainstate`, `m_active_chainstate`). I initially thought that reusing cs_main would be a simplification, but it turned out to be mostly confusing.

As a quick refresher, ChainstateManager manages two constituent CChainState instances (`m_ibd_chainstate`, `m_snapshot_chainstate`) and reveals a pointer to the one that is in active use (`m_active_chainstate`) per the assumeutxo semantics. As @jnewbery has pointed out, the active pointer doesn't need to be guarded by a mutex, but can be marked `atomic` to avoid explicit locking.

Throughout the course of deglobalizing chainstate use, @dongcarl has had to embed `cs_main` acquisitions in the active chainstate accessor functions (e.g. `::ChainstateActive()`) to avoid the risk of unserialized access to the `ChainstateManager::m_active_chainstate` state. This is suboptimal because callers are almost always already holding cs_main anyway, so we just end up recursively acquiring the lock in many cases. And fundamentally, cs_main use isn't necessary there once we decouple chainstate pointer management from cs_main.

These changes:

- make `ChainstateManager::m_active_chainstate` atomic to avoid requiring explicit locking (per @jnewbery's suggestion),
- introduce `ChainstateManager::m_cs_chainstates` to relieve cs_main of guarding the internal chainstates managed by ChainstateManager,
- remove the `cs_main` acquisitions in `::ChainstateActive()` and `ChainstateManager::ActiveChainstate()`, which are no longer necessary, and
- annotate `CChainState::m_chain` as guarded by cs_main, which was an implicit requirement that I'm just making explicit here.
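Putting the pieces together, a hypothetical, heavily simplified sketch of the resulting shape (stand-in types; `std::recursive_mutex` in place of Bitcoin Core's `RecursiveMutex`, and the `GUARDED_BY`/`EXCLUSIVE_LOCKS_REQUIRED` annotations elided):

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>

struct ChainstateSketch {};

class ChainstateManagerSketch
{
    // Relieves cs_main of guarding the internal chainstates.
    std::recursive_mutex m_cs_chainstates;
    std::unique_ptr<ChainstateSketch> m_ibd_chainstate;
    std::unique_ptr<ChainstateSketch> m_snapshot_chainstate;

    // Atomic, so readers need no explicit locking.
    std::atomic<ChainstateSketch*> m_active_chainstate{nullptr};

public:
    // Formerly acquired cs_main; now just an atomic load.
    ChainstateSketch& ActiveChainstate() const
    {
        ChainstateSketch* active = m_active_chainstate.load();
        assert(active);
        return *active;
    }

    void ActivateSnapshot()
    {
        std::lock_guard<std::recursive_mutex> lock{m_cs_chainstates};
        m_snapshot_chainstate = std::make_unique<ChainstateSketch>();
        m_active_chainstate.store(m_snapshot_chainstate.get());
    }
};
```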