Conversation

jamesob (Contributor) commented Apr 6, 2021

Hopefully this'll make everyone happy, and clarify some chainstate locking issues going forward.

Background

There is some amount of confusion due to my choice to conflate use of cs_main between

  • its original purpose: guarding legacy CChainState-related data structures (i.e. pre-ChainstateManager coverage of chainActive), and
  • its new purpose: guarding pointers to the constituent CChainState instances managed by ChainstateManager, e.g. m_ibd_chainstate, m_active_chainstate.

I initially thought that reusing cs_main would be a simplification, but it turned out to be mostly confusing.

As a quick refresher, ChainstateManager manages two constituent CChainState instances (m_ibd_chainstate, m_snapshot_chainstate) and reveals a pointer to the one that is in active use (m_active_chainstate) per the assumeutxo semantics. As @jnewbery has pointed out, the active pointer doesn't need to be guarded by a mutex, but can be marked atomic to avoid explicit locking.

Throughout the course of deglobalizing chainstate use, @dongcarl has had to embed cs_main acquisitions in the active chainstate accessor functions (e.g. ::ChainstateActive()) to avoid the risk of unserialized access to the ChainstateManager::m_active_chainstate state. This is suboptimal because callers are almost always already holding cs_main anyway, so we just end up recursively acquiring the lock in many cases. And fundamentally, cs_main use isn't necessary for that once we decouple chainstate pointer management from cs_main.

These changes

  • make ChainstateManager::m_active_chainstate atomic to avoid requiring explicit locking (per @jnewbery's suggestion),
  • introduce ChainstateManager::m_cs_chainstates to relieve cs_main of guarding the internal chainstates managed by ChainstateManager (see the sketch after this list),
  • remove the cs_main acquisitions in ::ChainstateActive() and ChainstateManager::ActiveChainstate(), which are no longer necessary, and
  • annotate CChainState::m_chain as guarded by cs_main, which was an implicit requirement that I'm just making explicit here.
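
A rough sketch (mine, not code from the PR; the member layout, and std::recursive_mutex/std::unique_ptr standing in for Bitcoin Core's RecursiveMutex and its actual ownership scheme, are assumptions) of what the ChainstateManager members described above might look like after these changes:

```cpp
#include <atomic>
#include <memory>
#include <mutex>

class CChainState {};  // stand-in for the real validation class

class ChainstateManager
{
    // Dedicated lock for the chainstates owned by this manager, so cs_main
    // no longer has to double as their guard.
    std::recursive_mutex m_cs_chainstates;

    // The two constituent chainstates, guarded by m_cs_chainstates.
    std::unique_ptr<CChainState> m_ibd_chainstate;
    std::unique_ptr<CChainState> m_snapshot_chainstate;

    // Pointer to whichever chainstate is currently active; atomic, so readers
    // like ActiveChainstate() need no explicit lock.
    std::atomic<CChainState*> m_active_chainstate{nullptr};

public:
    CChainState& ActiveChainstate() const
    {
        CChainState* chainstate = m_active_chainstate.load();
        // Callers must not ask for the active chainstate before one exists.
        return *chainstate;
    }
};
```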

jamesob added 7 commits April 6, 2021 15:35
In preparation for removing `cs_main` requirements for
ChainstateActive()/ActiveChainstate(), relax lock requirements for
g_chainman.

I have verified that all usages of `g_chainman` that are concurrency-sensitive
are usages of its members (e.g. `m_blockman`), which are already annotated. The
references to `g_chainman.m_active_chainstate` will be protected by use of
`std::atomic` in later commits.

Here are all non-`BlockManager`-related usages:

```
% rg --pcre2 --trim 'g_chainman(?!.m_blockman)'

src/net_processing.cpp
1271:assert(std::addressof(g_chainman) == std::addressof(m_chainman));

src/validation.h
994:extern ChainstateManager g_chainman;

src/init.cpp
1423:node.chainman = &g_chainman;

doc/release-notes/release-notes-0.21.0.md
577:- bitcoin#19927 Reduce direct `g_chainman` usage (dongcarl)

src/validation.cpp
105:ChainstateManager g_chainman;
109:assert(g_chainman.m_active_chainstate);
110:return *g_chainman.m_active_chainstate;
174:assert(std::addressof(g_chainman.BlockIndex()) == std::addressof(m_block_index));

src/node/interfaces.cpp
479:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
489:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
502:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
514:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
516:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
525:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));
527:assert(std::addressof(g_chainman) == std::addressof(*m_node.chainman));

src/test/util/setup_common.cpp
143:m_node.chainman = &::g_chainman;

src/qt/test/apptests.cpp
89:UnloadBlockIndex(/* mempool */ nullptr, g_chainman);
90:g_chainman.Reset();
```
This will allow us to avoid repurposing cs_main as a mutex for setting
this state, and will let us remove additional `cs_main` acquisitions and
annotations in subsequent commits.

Since this data is now atomic, explicit locking is no longer required.

To avoid (confusingly) repurposing cs_main to manage internal
ChainstateManager chainstate references, introduce a specific lock for this
purpose.

It's a RecursiveMutex (and not a Mutex) to facilitate acquiring the lock
in MaybeRebalanceCaches, since we'd otherwise have to give the lock public
class visibility (to allow callers to lock it).
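
To illustrate the re-entrancy being described, here is a minimal sketch (the call path is assumed, and std::recursive_mutex stands in for Bitcoin Core's RecursiveMutex; this is not the actual commit): a method that already holds the chainstates lock calls MaybeRebalanceCaches(), which acquires it again, something a plain Mutex could not do without exposing the lock to callers.

```cpp
#include <mutex>

class ChainstateManager
{
    std::recursive_mutex m_cs_chainstates;

    void MaybeRebalanceCaches()
    {
        // Re-acquired here; fine only because the mutex is recursive.
        std::lock_guard<std::recursive_mutex> lock(m_cs_chainstates);
        // ...rebalance coins cache allocations between the chainstates...
    }

public:
    void ActivateSnapshot()
    {
        std::lock_guard<std::recursive_mutex> lock(m_cs_chainstates);  // first acquisition
        // ...swap in the snapshot chainstate...
        MaybeRebalanceCaches();
    }
};
```
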
Annotating `CChainState::m_chain` in turn requires `PruneIndexBlockCandidates()`
to be annotated; luckily no locking changes are needed there.
@jamesob jamesob force-pushed the 2021-04-cs-lock-improvements branch from 9fbb8ca to 8257e9a Compare April 6, 2021 19:36
ryanofsky (Contributor) commented Apr 6, 2021

It's great that this removes the recursive locks in 09c924d and 5b8587f. Also, annotating CChainState::m_chain in 8257e9a makes a lot of sense.

The thing I don't understand is what bug is prevented by making m_active_chainstate std::atomic<CChainState*> instead of CChainState*? What bugs are prevented by adding LOCK(m_cs_chainstates); or having m_cs_chainstates exist at all?

Not putting EXCLUSIVE_LOCKS_REQUIRED(cs_main) on ChainstateManager methods seems like Six Flags saying congratulations, we're waiving the $100 fee to get into the park, we're just going to charge you $100 the first time you get onto any ride. It seems like this could only be appealing if you wanted to go to a CChainState theme park without trying any rides (accessing any data inside).

DrahtBot (Contributor) commented Apr 6, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

jamesob (Contributor, Author) commented Apr 7, 2021

Thanks for the quick look @ryanofsky.

The thing I don't understand is what bug is prevented by making m_active_chainstate std::atomic<CChainState*> instead of CChainState*? What bugs are prevented by adding LOCK(m_cs_chainstates); or having m_cs_chainstates exist at all?

Strictly speaking, I don't think there's a difference in bugs between this approach and just repurposing cs_main to guard this stuff. But there are a number of usability/maybe-performance arguments. I think the major arguments are:

  • cs_main is a lock that's used for many different things, and eventually (I think?) we want to break it up into more granular, purpose-specific locks - though I could be wrong about that,
  • it's kind of nice not requiring callers to worry about any locks, if indeed this std::atomic approach works the way I claim it does, because....
  • otherwise, if we apply a GUARDED_BY(::cs_main) to m_active_chainstate, we'll need to do awkward things like temporarily holding cs_main just to acquire the active chainstate before we call ActivateBestChain(), since you can't go into ABC holding cs_main (see the sketch below).
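
A small illustration of that last bullet (entirely hypothetical code with stand-in types; the real ActivateBestChain() takes a validation state and other arguments): the caller takes cs_main just long enough to read the pointer, releases it, and only then calls into ActivateBestChain().

```cpp
#include <mutex>

std::mutex cs_main;  // stand-in for Bitcoin Core's ::cs_main

struct CChainState {
    void ActivateBestChain() { /* must not be entered with cs_main held */ }
};

struct ChainstateManager {
    // Would be GUARDED_BY(::cs_main) in this variant; set by the manager.
    CChainState* m_active_chainstate = nullptr;
    CChainState& ActiveChainstate() { return *m_active_chainstate; }
};

void Example(ChainstateManager& chainman)
{
    CChainState* chainstate;
    {
        std::lock_guard<std::mutex> lock(cs_main);  // held only to read the pointer
        chainstate = &chainman.ActiveChainstate();
    }                                               // released again before the call below
    chainstate->ActivateBestChain();
}
```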

I think your Six Flags analogy is both entertaining and true (in a strict sense), but if everyone generally agrees that we eventually need to break up cs_main to avoid unnecessary lock contention, then I think these changes are a step in that direction.

But otherwise I'm happy to separate out the commits that you mention and just annotate everything with cs_main if it makes more sense for that lock to live on in its spirit of "I lock every damn thing related to chainstate," which may be easier to reason about than having three or four more granular locks running around.

But it does seem appealing to decouple, say, block storage locking from chain locking so that we can do things like prune while continuing validation. And longer term I suspect that each chainstate instance will want its own little cs_main so that we can do various chainstate operations in parallel should something like utreexo come around.

ryanofsky (Contributor) commented:

Will review in more detail, and I'm all for lock granularity if locks are independent. But if we're just turning one lock into two dependent locks where you acquire one lock, release it, then always acquire the other lock, it seems pointlessly complex. It increases the number of states things can be in and makes thread safety annotations less likely to catch race conditions. If there is theoretically a place where having these two locks could be better for performance, I would be curious to know where it is. The ActivateBestChain example just seems like a case of weird code organization that should be cleaned up, not a case where two locks make more sense for code as it is executed.

I think the clearest thing would be for this low level code not to have any synchronization: no locks or atomics, just plain annotations stating locking assumptions so higher level code can call it correctly. I've suggested this before #19806 (review), and it can be done separately from this PR. This PR does make other improvements, and I'm happy with all the changes, and planning to review it.

dongcarl added a commit to dongcarl/bitcoin that referenced this pull request Apr 13, 2021
In all rest/rpc-related modules, if there are multiple calls to
ActiveChain{,State}(), and the calls fall under the same ::cs_main lock,
we can simply take a local reference and use/reuse it instead of calling
ActiveChain{,State}() again and again.

Question: if we split up the lock in bitcoin#21620, wouldn't this be broken?
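
A sketch of the pattern this commit message describes (a hypothetical handler with stand-in types; the real accessors live on ChainstateManager): under a single cs_main lock, resolve the active chain once into a local reference and reuse it for every subsequent access.

```cpp
#include <mutex>

std::mutex cs_main;  // stand-in for ::cs_main

struct CChain { int Height() const { return 0; } };

struct ChainstateManager {
    CChain m_chain;
    CChain& ActiveChain() { return m_chain; }
};

bool IsNearTip(ChainstateManager& chainman, int height)
{
    std::lock_guard<std::mutex> lock(cs_main);
    // Resolve the active chain once and reuse the local reference for every
    // access in this scope, instead of calling ActiveChain() repeatedly.
    CChain& active_chain = chainman.ActiveChain();
    return height >= active_chain.Height() - 6 && height <= active_chain.Height();
}
```
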
dongcarl (Contributor) commented:

I haven't read everything yet, and plan on doing it soon. Q for you: does this mean that even if we hold ::cs_main, we won't be sure that m_active_chainstate won't change out from under us?

jamesob (Contributor, Author) commented Apr 13, 2021

Good points @ryanofsky @dongcarl. After digesting this a little, I actually think that both this approach and our current approach have bugs.

Consider the case where ActivateSnapshot() is called at this moment during ProcessNewBlock(). Neither a cs_main hold nor the atomic approach used here to guard m_active_chainstate will prevent a different active chainstate from being used earlier in PNB (above this line), creating a situation where we've called AcceptBlock() on a different chainstate than the one we then call ActivateBestChain() on.

The narrow solution in this case is maybe obviously to grab a single chainstate at the top of the function (instead of calling ActiveChainstate() for each usage), but I'm trying to think about what the right arrangement of locks and annotations would be to prevent this situation in general.
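
The narrow fix might look roughly like this (hypothetical stand-ins throughout; the real ProcessNewBlock/AcceptBlock/ActivateBestChain signatures are richer): resolve the active chainstate once, and use that same object both for the work done under cs_main and for the ActivateBestChain() call made after releasing it.

```cpp
#include <mutex>

std::mutex cs_main;  // stand-in for ::cs_main

struct CChainState {
    bool AcceptBlock() { return true; }        // requires cs_main in the real code
    bool ActivateBestChain() { return true; }  // must be called without cs_main held
};

struct ChainstateManager {
    CChainState m_chainstate;
    CChainState& ActiveChainstate() { return m_chainstate; }
};

bool ProcessNewBlock(ChainstateManager& chainman)
{
    CChainState* chainstate;
    {
        std::lock_guard<std::mutex> lock(cs_main);
        // Resolve once; even if the active chainstate is swapped later,
        // AcceptBlock() and ActivateBestChain() run on the same object.
        chainstate = &chainman.ActiveChainstate();
        if (!chainstate->AcceptBlock()) return false;
    }
    return chainstate->ActivateBestChain();
}
```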

ryanofsky (Contributor) commented:

The narrow solution in this case is maybe obviously to grab a single chainstate at the top of the function (instead of calling ActiveChainstate() for each usage)

Yes, please. This was on my unimplemented review suggestion followup list. #20749 (comment) #20749 (review) #21025 (review) "Best would be to have a CChainState* chain_state local variable that is initialized while cs_main is held above, and then used here."

but I'm trying to think about what the right arrangement of locks and annotations would be to prevent this situation in general.

It seems to me that replacing synchronization with annotations in ChainstateManager (removing the locks and atomic, and adding EXCLUSIVE_LOCKS_REQUIRED(cs_main) as suggested) would ensure callers use it correctly, or at least force them to think about using it correctly.

ryanofsky (Contributor) commented:

Offline thread:

<james>: I think we need cs_active_chainstate or something that functions like PNB either acquire or are annotated with to prevent the active chainstate from being swapped out during their usage

<russ>: I'm not trying to claim adding lock annotations will prevent all ChainstateManager callers from being buggy and wrongly assuming chainstate doesn't change when they didn't lock it. I don't think there is anything ChainstateManager can do to prevent that unless it's going to go into the kernel and freeze all running threads. My claim is just that bugs are less likely if locking cs_main locks the chainstate and callers are forced to lock cs_main and think about locking when they use ChainstateManager.

<james>: The kind of sad thing is that that swap-out only happens once during runtime (i.e. when ActivateSnapshot() is called) and really we could avoid a lot of this mess if we restricted snapshots to being loaded at init time

<russ>: I don't see what's that complicated about this. If you don't want the active chainstate to change, then you should lock cs_main. If you don't care whether the active chainstate changes then you can take a reference to the chainstate and not lock cs_main.

ryanofsky (Contributor) commented:

If there is a case where it's necessary for code to release cs_main but prevent the active chain from changing, I would suggest a condition variable:

```cpp
bool m_lock_chainstate = false;
std::condition_variable m_lock_chainstate_cv;
```

Then in the code that wants to update the chainstate:

```cpp
m_lock_chainstate_cv.wait(cs_main, []{ return !m_lock_chainstate; });
```

And the code that wants to lock the chainstate without holding cs_main:

```cpp
m_lock_chainstate = true;
m_lock_chainstate_cv.notify_all();

// ...

m_lock_chainstate = false;
m_lock_chainstate_cv.notify_all();
```
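
Put together, a self-contained version of that idea might look like the following (shape and names assumed; std::mutex stands in for ::cs_main, and this is a sketch of the suggestion rather than a proposed patch): the reader pins the chainstate under cs_main, releases the lock for its long-running work, and the updater waits on the condition variable until the pin is cleared.

```cpp
#include <condition_variable>
#include <mutex>

std::mutex cs_main;              // stand-in for ::cs_main
bool m_lock_chainstate = false;  // true while a caller has the active chainstate "pinned"
std::condition_variable m_lock_chainstate_cv;

// Updater side (e.g. the code that swaps the active chainstate): wait until
// nothing has the chainstate pinned, then do the swap under cs_main.
void SwapActiveChainstate()
{
    std::unique_lock<std::mutex> lock(cs_main);
    m_lock_chainstate_cv.wait(lock, [] { return !m_lock_chainstate; });
    // ...swap the active chainstate while still holding cs_main...
}

// Reader side: pin the chainstate, drop cs_main for the long-running work,
// then unpin and wake any waiting updater.
void UseChainstateWithoutCsMain()
{
    {
        std::lock_guard<std::mutex> lock(cs_main);
        m_lock_chainstate = true;
    }
    // ...work on the active chainstate without holding cs_main...
    {
        std::lock_guard<std::mutex> lock(cs_main);
        m_lock_chainstate = false;
    }
    m_lock_chainstate_cv.notify_all();
}
```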

ryanofsky (Contributor) commented Apr 14, 2021

Actually, I guess that could be equivalent to your active_chainstate mutex if it used manual locking instead of a scoped lock. So maybe I'll arrive at the same conclusion as you from a different angle. It just seems like there are a lot of solutions here, and there shouldn't be any conflict between a good solution and one that uses compiler thread safety annotations.

jnewbery (Contributor) commented:

I haven't looked at the code here yet, but I want to respond to a couple of the points raised in the discussion:

Not putting EXCLUSIVE_LOCKS_REQUIRED(cs_main) on ChainstateManager methods seems like Six Flags saying congratulations, we're waiving the $100 fee to get into the park, we're just going to charge you $100 the first time you get onto any ride. It seems like this could only be appealing if you wanted to go to a CChainState theme park without trying any rides (accessing any data inside).

I think the clearest thing would be for this low level code not to have any synchronization: no locks or atomics, just plain annotations stating locking assumptions so higher level code can call it correctly.

Most of the client code (eg in net_processing, rpc, etc) shouldn't need to lock chainman in most cases. Usually it's just trying to read a single piece of data (whether we're in IBD, what the current height is, fetch a CBlockIndex pointer to the tip, etc). The higher level code shouldn't have to concern itself with locks in those cases, and in fact those operations can be made lock-free by using atomics to cache the data (as IsInitialBlockDownload() used to do, but now sadly requires cs_main in most cases). Removing cs_main locks from the rpc methods and the GUI's node/interface methods would ensure that RPC/GUI is never blocking validation from doing useful work. Reducing cs_main usage in net_processing would open the possibility to parallelizing IBD (either with multiple net_processing threads or a separate validation thread).
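
For the lock-free single-value reads mentioned above, the pattern in question is roughly the following "latch" (a sketch with assumed names such as g_finished_ibd; the real IsInitialBlockDownload() caches its result in an atomic once the answer can never change again):

```cpp
#include <atomic>
#include <mutex>

std::mutex cs_main;                        // stand-in for ::cs_main
std::atomic<bool> g_finished_ibd{false};   // latches to true exactly once

bool IsInitialBlockDownload()
{
    // Fast path: once the latch has flipped, no lock is needed.
    if (g_finished_ibd.load(std::memory_order_relaxed)) return false;

    std::lock_guard<std::mutex> lock(cs_main);
    if (g_finished_ibd.load()) return false;  // re-check under the lock
    const bool still_in_ibd = /* tip age / minimum chain work checks */ true;
    if (!still_in_ibd) g_finished_ibd.store(true);
    return still_in_ibd;
}
```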

There are also cases where the client code does want to carry out multiple operations atomically on chainman. For those cases, I think it makes sense to expose cs_main to the caller and enforce it with annotations.

ryanofsky (Contributor) commented:

Most of the client code (eg in net_processing, rpc, etc) shouldn't need to lock chainman in most cases. Usually it's just trying to read a single piece of data (whether we're in IBD, what the current height is, fetch a CBlockIndex pointer to the tip, etc).

Yes, I think we are on the same page. When you just need one piece of data, this is what atomics are for and it makes sense to switch a variable from T to atomic<T> or T* to atomic<T*> so you can read it without locking. And for an API:

```cpp
T GetData() EXCLUSIVE_LOCKS_REQUIRED(cs_main);
T GetCachedData();
```

is nice. But you can make individual variables atomic or not without affecting anything else more broadly. I do think it makes sense to delay switching a variable from T to atomic<T> until you have at least one piece of code that's reading it without a lock.

DrahtBot (Contributor) commented May 3, 2021

🕵️ @sipa has been requested to review this pull request as specified in the REVIEWERS file.

DrahtBot (Contributor) commented:

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@jamesob jamesob closed this Aug 5, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022