Call ProcessNewBlock() asynchronously #16175
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts: Reviewers, this pull request conflicts with the following ones. If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Force-pushed from 3e8ad42 to 5779633
Oops, it turns out it's actually not possible to do #16174 first separately (and without it, this PR is entirely useless). By itself, #16174 ends up hitting the deadly cross-validationinterface-queue deadlock - it takes cs_peerstate first, then (eventually) calls PNB, which may call LimitValidationInterfaceQueue(). Meanwhile, in the validation queue, we may end up waiting on cs_peerstate to update some peer's state. Of course, once we actually move block validation into its own thread, this is no longer (materially) a concern - we trivially guarantee there are no locks held going into LimitValidationInterfaceQueue().
Force-pushed from a36f301 to 4eb8ebf

Force-pushed from 4eb8ebf to d31e3d0
Note that there is still a regression causing some functional tests to time out, as we may end up waiting 100ms to discover that a block has been processed and we can process the next message from that peer (this shouldn't be an issue when downloading from multiple peers, since we'll always wake on a new tip block, and it seems not to be an issue for one peer either, even though it's possible it would be due to a wake race). Not 100% sure what the best solution to that is.
For other reviewers: the particular validationinterface callback at fault here looks to be

Big Concept ACK on the lock-splitting at the very least, though I do wish it could be done separately somehow. I'm a Concept-Concept ACK on this (in concept), but want to make sure I fully understand the implications.
9fdf05d resolved some lock inversion warnings in denialofservice_tests, but left in a number of unnecessary cs_main locks, which would introduce lock inversion warnings in future changes.
This prepares for making ProcessNewBlock processing actually happen in a separate thread from the caller, even though all callsites currently block on the return value immediately. Note that this makes some of the unit tests less restrictive.
CNodeState was added for validation-state-tracking, and thus, logically, was protected by cs_main. However, as it has grown to include non-validation state (taking state from CNode), and as we've reduced cs_main usage for other unrelated things, CNodeState is left with lots of cs_main locking in net_processing. In order to ease the transition to something new, this adds only a dummy CPeerState which is held as a reference for the duration of message processing. Note that moving things is somewhat tricky before the validation thread exists, as a consistent lock order must be kept - we can't take the new cs_peerstate lock in anything that's called directly from validation.
Essentially, our goal is to not process anything for the given peer until the block finishes processing (emulating the previous behavior) without actually blocking the ProcessMessages loops. Obviously, in most cases, we'll just go on to the next peer and immediately hit a cs_main lock, blocking us anyway, but we can slowly improve that state over time by moving things from CNodeState to CPeerState.
Spawn a background thread at startup which validates each block as it comes in from ProcessNewBlock, taking advantage of the new std::future return value to keep tests simple (and the new net_processing handling of such values async already). This makes introducing subtle validationinterface deadlocks much harder as any locks held going into ProcessNewBlock do not interact with (in the form of lockorder restrictions) locks taken in validationinterface callbacks. Note that after this commit, feature_block and feature_assumevalid tests time out due to increased latency between block processing when those blocks do not represent a new best block. This will be resolved in the next commit.
This resolves the performance regression introduced in the previous commit by always waking the message processing thread after each block future resolves. Sadly, this is somewhat awkward - all other validationinterface callbacks represent an actual change to the global validation state, whereas this callback indicates only that a call which one validation "client" made has completed. After going back and forth for some time I didn't see a materially better way to resolve this issue, and luckily it's a rather simple change, but it's far from ideal. Note that because we absolutely do not want to ever block on a ProcessNewBlock-returned-future, the callback approach is critical.
This removes the cs_main lock which is taken on every ProcessMessages call.
Force-pushed from 9727e5e to 2d1e8f6
 *
 * @param[in] pblock The block we want to process.
 * @param[in] fForceProcessing Process this block even if unrequested; used for non-network block sources and whitelisted peers.
 * @param[out] fNewBlock A boolean which is set to indicate if the block was first received via this call
 * @return True if state.IsValid()
 * @return A future which complets with a boolean which is set to indicate if the block was first received via this call
Adding an "e" completes the spelling here :-)
Started review (will update list below with progress).
- 9bc8ca6 Remove unnecessary cs_mains in denialofservice_tests (1/10)
- 79e6e44 Make ProcessNewBlock return a future instead of an immediate bool (2/10)
- e898229 Add a new peer state tracking class to reduce cs_main contention. (3/10)
- 8ddbb12 Move net_processing's ProcessNewBlock calls to resolve async. (4/10)
- 64e74ce Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken) (5/10)
- ba810b9 Add a callback to indicate a block has been processed (6/10)
- 3499d98 Move BlockChecked to a background thread (7/10)
- 445bb84 Move nDoS counters to CPeerState (and, thus, out of cs_main) (8/10)
- b6a0552 Move rejects into cs_peerstate. (9/10)
- 2d1e8f6 Actually parallelize validation somewhat (10/10)
src/net_processing.cpp (outdated)

@@ -388,7 +399,20 @@ struct CNodeState {
// Keeps track of the time (in microseconds) when transactions were requested last time
limitedmap<uint256, int64_t> g_already_asked_for GUARDED_BY(cs_main)(MAX_INV_SZ);

/** Note that this must be locked BEFORE cs_main! */
CCriticalSection cs_peerstate;
In commit "Add a new peer state tracking class to reduce cs_main contention." (e898229)
Do you want this to be a recursive mutex for some reason? If so, it'd be helpful to say in a comment why recursive locking is helpful here. Otherwise I'd suggest changing this line to Mutex g_peerstate_mutex (also not using the old "critical section" terminology).
src/net_processing.cpp (outdated)

@@ -756,12 +780,22 @@ void PeerLogicValidation::InitializeNode(CNode *pnode) {
    LOCK(cs_main);
    mapNodeState.emplace_hint(mapNodeState.end(), std::piecewise_construct, std::forward_as_tuple(nodeid), std::forward_as_tuple(addr, std::move(addrName), pnode->fInbound, pnode->m_manual_connection));
}
{
    LOCK(cs_peerstate);
    mapPeerState.emplace_hint(mapPeerState.end(), nodeid, CPeerState{});
In commit "Add a new peer state tracking class to reduce cs_main contention." (e898229)
Not necessarily suggesting you should change this here, but is the only reason for adding a new mapPeerState map to make the diff smaller? If entries are always added to both mapNodeState and mapPeerState and removed from the maps at the same times with both locks held, it would seem more ideal to have just a single map where each entry contains separate node state and peer state structs (to support separate lock annotations).
 * contention. Thus, new state tracking should go here, and we should eventually
 * move most (non-validation-specific) state here.
 */
struct CPeerState {
In commit "Add a new peer state tracking class to reduce cs_main contention." (e898229)
Probably better to follow current coding convention and call it PeerState instead of CPeerState.
src/net_processing.cpp (outdated)

/** Map maintaining per-node state. */
static std::map<NodeId, CPeerState> mapPeerState GUARDED_BY(cs_peerstate);

static CPeerState *PeerState(NodeId pnode) EXCLUSIVE_LOCKS_REQUIRED(cs_peerstate) LOCKS_EXCLUDED(cs_main) {
In commit "Add a new peer state tracking class to reduce cs_main contention." (e898229)
Would suggest calling this LookupPeerState, to be consistent with LookupBlockIndex, and because otherwise calls to this function look like they are constructing objects, not looking up existing objects.
/** Map maintaining per-node state. */
static std::map<NodeId, CPeerState> mapPeerState GUARDED_BY(cs_peerstate);
In commit "Add a new peer state tracking class to reduce cs_main contention." (e898229)
g_peer_states instead of mapPeerState would be more in line with current coding convention.
src/validation.cpp (outdated)

    }
}

result_promise.set_value(fNewBlock);
    return result;
}

std::future<bool> ProcessNewBlock(const CChainParams& chainparams, const std::shared_ptr<const CBlock> pblock, bool fForceProcessing)
In commit "Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken)" (64e74ce)
Documentation for this function in validation.h becomes out of date with this commit and should be updated. It still says "This only returns after the best known valid block is made active."
bool ret = CheckBlock(*pblock, state, chainparams.GetConsensus());
if (ret) {
    // Store to disk
    ret = ::ChainstateActive().AcceptBlock(pblock, state, chainparams, &pindex, fForceProcessing, nullptr, &fNewBlock);
In commit "Run ProcessNewBlock in a background thread (XXX: p2p_sendheaders broken)" (64e74ce)
Throughout this function, shouldn't ::ChainstateActive() be replaced by *this?
if (!new_block && accepted) {
    LOCK(cs_main);
    const CBlockIndex* pindex = LookupBlockIndex(blockptr->GetHash());
    if (!new_block && pindex && pindex->IsValid()) {
In commit "Make ProcessNewBlock return a future instead of an immediate bool" (79e6e44)
This seems fine, but just to make sure I understand correctly, there is a change in behavior here? Previously accepted would only be true if CheckBlock and AcceptBlock and ActivateBestChain all succeeded, but now it can be true even if they fail?
@@ -175,7 +175,7 @@ BOOST_FIXTURE_TEST_CASE(blockfilter_index_initial_sync, TestChain100Setup)
uint256 chainA_last_header = last_header;
for (size_t i = 0; i < 2; i++) {
    const auto& block = chainA[i];
    BOOST_REQUIRE(ProcessNewBlock(Params(), block, true, nullptr));
In commit "Make ProcessNewBlock return a future instead of an immediate bool" (79e6e44)
In the commit message about these tests you wrote "Note that this makes some of the unit tests less restrictive." But I don't think there's a good justification for doing this. If you just declared struct ProcessNewBlockResult { bool error = false; bool already_have = false; }; and returned std::future<ProcessNewBlockResult>, the tests could behave the same and the code would be more readable and less fragile, without the need for people to guess from context whether the bool indicates success or newness.
Think I agree with this suggestion; dropping the fNewBlock result at the same time as turning the return into a future seems unnecessarily complicated.
Just want to note that all my comments above are just suggested cleanups. Feel free to ignore them if they don't make sense or aren't worth the effort to implement. The only thing I'd really like to see are more comments about locking. When a lock is held in a small scope for a specific purpose, I don't think there's a need for a comment, but when a lock is held over a wide scope for no obvious reason it's good to have a comment that suggests why it needs to be acquired there and when it is safe to release.
Working on rewriting this to make it simpler, but it looks like it may end up growing, so dunno about it.
This is essentially a rebase of #12934, though it's actually greenfield to reduce the diff a bit. Based on #16174, as without it the parallelism would be almost entirely useless. Note that this is a draft as I haven't (yet) bothered doing the backgrounding of the block validation; it's currently just a return value change plus the net_processing changes to handle it.