Conversation

jamesob (Contributor) commented Apr 6, 2021

Hopefully this'll make everyone happy, and clarify some chainstate locking issues going forward.

Background

There is some amount of confusion due to my choice to conflate use of cs_main between

  • its original purpose: guarding legacy CChainState-related data structures (i.e. pre-ChainstateManager coverage of chainActive), and
  • its new purpose: guarding pointers to the constituent CChainState instances managed by ChainstateManager, e.g. m_ibd_chainstate, m_active_chainstate.

I initially thought that reusing cs_main would be a simplification, but it turned out to be mostly confusing.

As a quick refresher, ChainstateManager manages two constituent CChainState instances (m_ibd_chainstate, m_snapshot_chainstate) and reveals a pointer to the one that is in active use (m_active_chainstate) per the assumeutxo semantics. As @jnewbery has pointed out, the active pointer doesn't need to be guarded by a mutex, but can be marked atomic to avoid explicit locking.

Throughout the course of deglobalizing chainstate use, @dongcarl has had to embed cs_main acquisitions in the active chainstate accessor functions (e.g. ::ChainstateActive()) to avoid the risk of unserialized access to the ChainstateManager::m_active_chainstate state. This is suboptimal because callers are almost always already holding cs_main anyway, so we just end up recursively acquiring the lock in many cases. And fundamentally, cs_main use isn't necessary for that once we decouple chainstate pointer management from cs_main.

These changes

  • make ChainstateManager::m_active_chainstate atomic to avoid requiring explicit locking (per @jnewbery's suggestion),
  • introduce ChainstateManager::m_cs_chainstates to relieve cs_main of guarding the internal chainstates managed by ChainstateManager (see the sketch after this list),
  • remove the cs_main acquisitions in ::ChainstateActive() and ChainstateManager::ActiveChainstate(), which are no longer necessary, and
  • annotate CChainState::m_chain as guarded by cs_main, which was an implicit requirement that I'm just making explicit here.
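
A rough sketch (mine, not code from the PR; the member layout, and std::recursive_mutex/std::unique_ptr standing in for Bitcoin Core's RecursiveMutex and its actual ownership scheme, are assumptions) of what the ChainstateManager members described above might look like after these changes:

```cpp
#include <atomic>
#include <memory>
#include <mutex>

class CChainState {};  // stand-in for the real validation class

class ChainstateManager
{
    // Dedicated lock for the chainstates owned by this manager, so cs_main
    // no longer has to double as their guard.
    std::recursive_mutex m_cs_chainstates;

    // The two constituent chainstates, guarded by m_cs_chainstates.
    std::unique_ptr<CChainState> m_ibd_chainstate;
    std::unique_ptr<CChainState> m_snapshot_chainstate;

    // Pointer to whichever chainstate is currently active; atomic, so readers
    // like ActiveChainstate() need no explicit lock.
    std::atomic<CChainState*> m_active_chainstate{nullptr};

public:
    CChainState& ActiveChainstate() const
    {
        CChainState* chainstate = m_active_chainstate.load();
        // Callers must not ask for the active chainstate before one exists.
        return *chainstate;
    }
};
```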

jamesob added 7 commits April 6, 2021 15:35
In preparation for removing `cs_main` requirements for
ChainstateActive()/ActiveChainstate(), relax lock requirements for
g_chainman.

I have verified that all usages of `g_chainman` that are concurrency-sensitive
are usages of its members (e.g. `m_blockman`), which are already annotated. The
references to `g_chainman.m_active_chainstate` will be protected by use of
`std::atomic` in later commits.

Here are all non-`BlockManager`-related usages:

```
% rg --pcre2 --trim 'g_chainman(?!.m_blockman)'

src/net_processing.cpp
1271:assert(std::addressof(g_chainman) == std::addressof(m_chainman));

src/validation.h
994:extern ChainstateManager g_chainman;

src/init.cpp
1423:node.chainman = &g_chainman;

doc/release-notes/release-notes-0.21.0.md
577:- bitcoin#19927 Reduce direct `g_chainman` usage (dongcarl)

src/validation.cpp
105:ChainstateManager g_chainman;
109:assert(g_chainman.m_active_chainstate);
110:return *g_chainman.m_active_chainstate;
174:assert(std::addressof(g_chainman.BlockIndex()) == std::addressof(m_block_index));

src/node/interfaces.cpp
479:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
489:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
502:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
514:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
516:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
525:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
527:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));

src/test/util/setup_common.cpp
143:m_node.chainman = &::g_chainman;

src/qt/test/apptests.cpp
89:UnloadBlockIndex(/* mempool */ nullptr, g_chainman);
90:g_chainman.Reset();
```
This will allow us to avoid repurposing cs_main as a mutex for setting
this state, and will let us remove additional `cs_main` acquisitions and
annotations in subsequent commits.

Since this data is now atomic, explicit locking is no longer required.

To avoid (confusingly) repurposing cs_main to manage internal
ChainstateManager chainstate references, introduce a specific lock for this
purpose.

It's a RecursiveMutex (and not a Mutex) to facilitate acquiring the lock
in MaybeRebalanceCaches, since we'd otherwise have to give the lock public
class visibility (to allow callers to lock it).
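
To illustrate the re-entrancy being described, here is a minimal sketch (the call path is assumed, and std::recursive_mutex stands in for Bitcoin Core's RecursiveMutex; this is not the actual commit): a method that already holds the chainstates lock calls MaybeRebalanceCaches(), which acquires it again, something a plain Mutex could not do without exposing the lock to callers.

```cpp
#include <mutex>

class ChainstateManager
{
    std::recursive_mutex m_cs_chainstates;

    void MaybeRebalanceCaches()
    {
        // Re-acquired here; fine only because the mutex is recursive.
        std::lock_guard<std::recursive_mutex> lock(m_cs_chainstates);
        // ...rebalance coins cache allocations between the chainstates...
    }

public:
    void ActivateSnapshot()
    {
        std::lock_guard<std::recursive_mutex> lock(m_cs_chainstates);  // first acquisition
        // ...swap in the snapshot chainstate...
        MaybeRebalanceCaches();
    }
};
```
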
Annotating `CChainState::m_chain` in turn requires `PruneIndexBlockCandidates()`
to be annotated; luckily no locking changes are needed there.
@jamesob jamesob force-pushed the 2021-04-cs-lock-improvements branch from 9fbb8ca to 8257e9a Compare April 6, 2021 19:36
ryanofsky (Contributor) commented Apr 6, 2021

It's great that this removes the recursive locks in 09c924d and 5b8587f. Also, annotating CChainState::m_chain in 8257e9a makes a lot of sense.

The thing I don't understand is what bug is prevented by making m_active_chainstate std::atomic<CChainState*> instead of CChainState*? What bugs are prevented by adding LOCK(m_cs_chainstates); or having m_cs_chainstates exist at all?

Not putting EXCLUSIVE_LOCKS_REQUIRED(cs_main) on ChainstateManager methods seems like Six Flags saying congratulations, we're waiving the $100 fee to get into the park, we're just going to charge you $100 the first time you get onto any ride. It seems like this could only be appealing if you wanted to go to a CChainState theme park without trying any rides (accessing any data inside).

DrahtBot (Contributor) commented Apr 6, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

jamesob (Contributor, Author) commented Apr 7, 2021

Thanks for the quick look @ryanofsky.

The thing I don't understand is what bug is prevented by making m_active_chainstate std::atomic<CChainState*> instead of CChainState*? What bugs are prevented by adding LOCK(m_cs_chainstates); or having m_cs_chainstates exist at all?

Strictly speaking, I don't think there's a difference in bugs between this approach and just repurposing cs_main to guard this stuff. But there are a number of usability/maybe-performance arguments. I think the major arguments are:

  • cs_main is a lock that's used for many different things, and eventually (I think?) we want to break it up into more granular, purpose-specific locks - though I could be wrong about that,
  • it's kind of nice not requiring callers to worry about any locks, if indeed this std::atomic approach works the way I claim it does, because....
  • otherwise, if we apply a GUARDED_BY(::cs_main) to m_active_chainstate, we'll need to do awkward things like temporarily holding cs_main just to acquire the active chainstate before we call ActivateBestChain(), since you can't go into ABC holding cs_main (see the sketch below).
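
A small illustration of that last bullet (entirely hypothetical code with stand-in types; the real ActivateBestChain() takes a validation state and other arguments): the caller takes cs_main just long enough to read the pointer, releases it, and only then calls into ActivateBestChain().

```cpp
#include <mutex>

std::mutex cs_main;  // stand-in for Bitcoin Core's ::cs_main

struct CChainState {
    void ActivateBestChain() { /* must not be entered with cs_main held */ }
};

struct ChainstateManager {
    // Would be GUARDED_BY(::cs_main) in this variant; set by the manager.
    CChainState* m_active_chainstate = nullptr;
    CChainState& ActiveChainstate() { return *m_active_chainstate; }
};

void Example(ChainstateManager& chainman)
{
    CChainState* chainstate;
    {
        std::lock_guard<std::mutex> lock(cs_main);  // held only to read the pointer
        chainstate = &chainman.ActiveChainstate();
    }                                               // released again before the call below
    chainstate->ActivateBestChain();
}
```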

I think your Six Flags analogy is both entertaining and true (in a strict sense), but if everyone generally agrees that we eventually need to break up cs_main to avoid unnecessary lock contention, then I think these changes are a step in that direction.

But otherwise I'm happy to separate out the commits that you mention and just annotate everything with cs_main if it makes more sense for that lock to live on in its spirit of "I lock every damn thing related to chainstate," which may be easier to reason about than having three or four more granular locks running around.

But it does seem appealing to decouple, say, block storage locking from chain locking so that we can do things like prune while continuing validation. And longer term I suspect that each chainstate instance will want its own little cs_main so that we can do various chainstate operations in parallel should something like utreexo come around.

ryanofsky (Contributor) commented:

Will review in more detail, and I'm all for lock granularity if locks are independent. But if we're just turning one lock into two dependent locks where you acquire one lock, release it, then always acquire the other lock, it seems pointlessly complex. It increases the number of states things can be in and makes thread safety annotations less likely to catch race conditions. If there is theoretically a place where having these two locks could be better for performance, I would be curious to know where it is. The ActivateBestChain example just seems like a case of weird code organization that should be cleaned up, not a case where two locks make more sense for code as it is executed.

I think the clearest thing would be for this low level code not to have any synchronization: no locks or atomics, just plain annotations stating locking assumptions so higher level code can call it correctly. I've suggested this before #19806 (review), and it can be done separately from this PR. This PR does make other improvements, and I'm happy with all the changes, and planning to review it.

dongcarl added a commit to dongcarl/bitcoin that referenced this pull request Apr 13, 2021
In all rest/rpc-related modules, if there are multiple calls to
ActiveChain{,State}(), and the calls fall under the same ::cs_main lock,
we can simply take a local reference and use/reuse it instead of calling
ActiveChain{,State}() again and again.

Question: if we split up the lock in bitcoin#21620, wouldn't this be broken?
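
A sketch of the pattern this commit message describes (a hypothetical handler with stand-in types; the real accessors live on ChainstateManager): under a single cs_main lock, resolve the active chain once into a local reference and reuse it for every subsequent access.

```cpp
#include <mutex>

std::mutex cs_main;  // stand-in for ::cs_main

struct CChain { int Height() const { return 0; } };

struct ChainstateManager {
    CChain m_chain;
    CChain& ActiveChain() { return m_chain; }
};

bool IsNearTip(ChainstateManager& chainman, int height)
{
    std::lock_guard<std::mutex> lock(cs_main);
    // Resolve the active chain once and reuse the local reference for every
    // access in this scope, instead of calling ActiveChain() repeatedly.
    CChain& active_chain = chainman.ActiveChain();
    return height >= active_chain.Height() - 6 && height <= active_chain.Height();
}
```
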
dongcarl (Contributor) commented:

I haven't read everything yet, and plan on doing it soon. Q for you: does this mean that even if we hold ::cs_main, we won't be sure that m_active_chainstate won't change out from under us?

jamesob (Contributor, Author) commented Apr 13, 2021

Good points @ryanofsky @dongcarl. After digesting this a little, I actually think that both this approach and our current approach have bugs.

Consider the case where ActivateSnapshot() is called at this moment during ProcessNewBlock(). Neither a cs_main hold nor the atomic approach used here to guard m_active_chainstate will prevent a different active chainstate from being used earlier in PNB (above this line), creating a situation where we've called AcceptBlock() on a different chainstate than the one we then call ActivateBestChain() on.

The narrow solution in this case is maybe obviously to grab a single chainstate at the top of the function (instead of calling ActiveChainstate() for each usage), but I'm trying to think about what the right arrangement of locks and annotations would be to prevent this situation in general.
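
The narrow fix might look roughly like this (hypothetical stand-ins throughout; the real ProcessNewBlock/AcceptBlock/ActivateBestChain signatures are richer): resolve the active chainstate once, and use that same object both for the work done under cs_main and for the ActivateBestChain() call made after releasing it.

```cpp
#include <mutex>

std::mutex cs_main;  // stand-in for ::cs_main

struct CChainState {
    bool AcceptBlock() { return true; }        // requires cs_main in the real code
    bool ActivateBestChain() { return true; }  // must be called without cs_main held
};

struct ChainstateManager {
    CChainState m_chainstate;
    CChainState& ActiveChainstate() { return m_chainstate; }
};

bool ProcessNewBlock(ChainstateManager& chainman)
{
    CChainState* chainstate;
    {
        std::lock_guard<std::mutex> lock(cs_main);
        // Resolve once; even if the active chainstate is swapped later,
        // AcceptBlock() and ActivateBestChain() run on the same object.
        chainstate = &chainman.ActiveChainstate();
        if (!chainstate->AcceptBlock()) return false;
    }
    return chainstate->ActivateBestChain();
}
```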

ryanofsky (Contributor) commented:

The narrow solution in this case is maybe obviously to grab a single chainstate at the top of the function (instead of calling ActiveChainstate() for each usage)

Yes, please. This was on my unimplemented review suggestion followup list. #20749 (comment) #20749 (review) #21025 (review) "Best would be to have a CChainState* chain_state local variable that is initialized while cs_main is held above, and then used here."

but I'm trying to think about what the right arrangement of locks and annotations would be to prevent this situation in general.

It seems to me that replacing synchronization with annotations in ChainstateManager (removing the locks and atomic, and adding EXCLUSIVE_LOCKS_REQUIRED(cs_main) as suggested) would ensure callers use it correctly, or at least force them to think about using it correctly.

ryanofsky (Contributor) commented:

Offline thread:

<james>: I think we need cs_active_chainstate or something that functions like PNB either acquire or are annotated with to prevent the active chainstate from being swapped out during their usage

<russ>: I'm not trying to claim adding lock annotations will prevent all ChainstateManager callers from being buggy and wrongly assuming chainstate doesn't change when they didn't lock it. I don't think there is anything ChainstateManager can do to prevent that unless it's going to go into the kernel and freeze all running threads. My claim is just that bugs are less likely if locking cs_main locks the chainstate and callers are forced to lock cs_main and think about locking when they use ChainstateManager.

<james>: The kind of sad thing is that that swap-out only happens once during runtime (i.e. when ActivateSnapshot() is called) and really we could avoid a lot of this mess if we restricted snapshots to being loaded at init time

<russ>: I don't see what's that complicated about this. If you don't want the active chainstate to change, then you should lock cs_main. If you don't care whether the active chainstate changes then you can take a reference to the chainstate and not lock cs_main.

ryanofsky (Contributor) commented:

If there is a case where it's necessary for code to release cs_main but prevent the active chain from changing, I would suggest a condition variable:

```cpp
bool m_lock_chainstate = false;
std::condition_variable m_lock_chainstate_cv;
```

Then in the code that wants to update the chainstate:

```cpp
m_lock_chainstate_cv.wait(cs_main, []{ return !m_lock_chainstate; });
```

And the code that wants to lock the chainstate without holding cs_main:

```cpp
m_lock_chainstate = true;
m_lock_chainstate_cv.notify_all();

// ...

m_lock_chainstate = false;
m_lock_chainstate_cv.notify_all();
```
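
Put together, a self-contained version of that idea might look like the following (shape and names assumed; std::mutex stands in for ::cs_main, and this is a sketch of the suggestion rather than a proposed patch): the reader pins the chainstate under cs_main, releases the lock for its long-running work, and the updater waits on the condition variable until the pin is cleared.

```cpp
#include <condition_variable>
#include <mutex>

std::mutex cs_main;              // stand-in for ::cs_main
bool m_lock_chainstate = false;  // true while a caller has the active chainstate "pinned"
std::condition_variable m_lock_chainstate_cv;

// Updater side (e.g. the code that swaps the active chainstate): wait until
// nothing has the chainstate pinned, then do the swap under cs_main.
void SwapActiveChainstate()
{
    std::unique_lock<std::mutex> lock(cs_main);
    m_lock_chainstate_cv.wait(lock, [] { return !m_lock_chainstate; });
    // ...swap the active chainstate while still holding cs_main...
}

// Reader side: pin the chainstate, drop cs_main for the long-running work,
// then unpin and wake any waiting updater.
void UseChainstateWithoutCsMain()
{
    {
        std::lock_guard<std::mutex> lock(cs_main);
        m_lock_chainstate = true;
    }
    // ...work on the active chainstate without holding cs_main...
    {
        std::lock_guard<std::mutex> lock(cs_main);
        m_lock_chainstate = false;
    }
    m_lock_chainstate_cv.notify_all();
}
```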

ryanofsky (Contributor) commented Apr 14, 2021

Actually, I guess that could be equivalent to your active_chainstate mutex if it used manual locking instead of a scoped lock. So maybe I'll arrive at the same conclusion as you from a different angle. It just seems like there are a lot of solutions here, and there shouldn't be any conflict between a good solution and one that uses compiler thread safety annotations.

jnewbery (Contributor) commented:

I haven't looked at the code here yet, but I want to respond to a couple of the points raised in the discussion:

Not putting EXCLUSIVE_LOCKS_REQUIRED(cs_main) on ChainstateManager methods seems like Six Flags saying congratulations, we're waiving the $100 fee to get into the park, we're just going to charge you $100 the first time you get onto any ride. It seems like this could only be appealing if you wanted to go to a CChainState theme park without trying any rides (accessing any data inside).

I think the clearest thing would be for this low level code not to have any synchronization: no locks or atomics, just plain annotations stating locking assumptions so higher level code can call it correctly.

Most of the client code (eg in net_processing, rpc, etc) shouldn't need to lock chainman in most cases. Usually it's just trying to read a single piece of data (whether we're in IBD, what the current height is, fetch a CBlockIndex pointer to the tip, etc). The higher level code shouldn't have to concern itself with locks in those cases, and in fact those operations can be made lock-free by using atomics to cache the data (as IsInitialBlockDownload() used to do, but now sadly requires cs_main in most cases). Removing cs_main locks from the rpc methods and the GUI's node/interface methods would ensure that RPC/GUI is never blocking validation from doing useful work. Reducing cs_main usage in net_processing would open the possibility to parallelizing IBD (either with multiple net_processing threads or a separate validation thread).
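
For the lock-free single-value reads mentioned above, the pattern in question is roughly the following "latch" (a sketch with assumed names such as g_finished_ibd; the real IsInitialBlockDownload() caches its result in an atomic once the answer can never change again):

```cpp
#include <atomic>
#include <mutex>

std::mutex cs_main;                        // stand-in for ::cs_main
std::atomic<bool> g_finished_ibd{false};   // latches to true exactly once

bool IsInitialBlockDownload()
{
    // Fast path: once the latch has flipped, no lock is needed.
    if (g_finished_ibd.load(std::memory_order_relaxed)) return false;

    std::lock_guard<std::mutex> lock(cs_main);
    if (g_finished_ibd.load()) return false;  // re-check under the lock
    const bool still_in_ibd = /* tip age / minimum chain work checks */ true;
    if (!still_in_ibd) g_finished_ibd.store(true);
    return still_in_ibd;
}
```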

There are also cases where the client code does want to carry out multiple operations atomically on chainman. For those cases, I think it makes sense to expose cs_main to the caller and enforce it with annotations.

ryanofsky (Contributor) commented:

Most of the client code (eg in net_processing, rpc, etc) shouldn't need to lock chainman in most cases. Usually it's just trying to read a single piece of data (whether we're in IBD, what the current height is, fetch a CBlockIndex pointer to the tip, etc).

Yes, I think we are on the same page. When you just need one piece of data, this is what atomics are for and it makes sense to switch a variable from T to atomic<T> or T* to atomic<T*> so you can read it without locking. And for an API:

```cpp
T GetData() EXCLUSIVE_LOCKS_REQUIRED(cs_main);
T GetCachedData();
```

is nice. But you can make individual variables atomic or not without affecting anything else more broadly. I do think it makes sense to delay switching a variable from T to atomic<T> until you have at least one piece of code that's reading it without a lock.

DrahtBot (Contributor) commented May 3, 2021

🕵️ @sipa has been requested to review this pull request as specified in the REVIEWERS file.

DrahtBot (Contributor) commented:

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@jamesob jamesob closed this Aug 5, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022