Skip to content

boost::signals2 disconnections don't block for slots to finish executing (& CValidationInterface lifecycle races) #18065

@markblundeberg

Description

@markblundeberg

I've noticed a rare use-after-scope race in submitblock while testing Bitcoin ABC built in debug mode, and the involved code is identical to Core so I believe the same concern exists here.

Essentially the issue is that UnregisterValidationInterface does disconnect the slots promptly, but the slots may be in the process of executing at that precise moment, in the scheduler thread. Even though UpdatedBlockTip is an empty function for the submitblock_StateCatcher sc, it does get wrapped by std::bind and various internal boost wrappers so there's a tiny time window to have a problem. The race looks something like this (Sched = scheduler thread, RPC = thread calling submitblock function)

  • [Sched] Invoke UpdatedBlockTip signal.
  • [Sched] Observe the connection for UpdatedBlockTip and extract the slot-to-be-executed.
  • [RPC] submitblock disconnects the connection.
  • [Sched] Start executing the slot, starting with various boost wrappers, passing through std::bind, then a call to the CValidationInterface's virtual UpdatedBlockTip method.
  • [RPC] submitblock function completes; sc is destroyed and zeroed.
  • [Sched] Look up the sc vtable entry for UpdatedBlockTip and call it. Segfault because sc's vtable pointer has been zeroed.

Boost signals2 thread safety discussion: https://www.boost.org/doc/libs/1_55_0/doc/html/signals2/thread-safety.html

Note that since we unlock the connection's mutex before executing its associated slot, it is possible a slot will still be executing after it has been disconnected by a connection::disconnect(), if the disconnect was called concurrently with signal invocation.

This was the direct fix in Bitcoin ABC however it does not address the bigger issue: 1) not all signals are queued at the scheduler and so there could be a race with a third thread, and 2) there are other places in the code base (like wallet) that seem to not wait after unregistering. I am not super familiar with signals2 so I'm not sure what is the 'correct' fix. But the TL;DR is that scoped_connection only protects the lifecycle of the std::bind and not anything deeper.

Relevant code

The scope in submitblock:

bitcoin/src/rpc/mining.cpp

Lines 781 to 792 in 651e343

submitblock_StateCatcher sc(block.GetHash());
RegisterValidationInterface(&sc);
bool accepted = ProcessNewBlock(Params(), blockptr, /* fForceProcessing */ true, /* fNewBlock */ &new_block);
UnregisterValidationInterface(&sc);
if (!new_block && accepted) {
return "duplicate";
}
if (!sc.found) {
return "inconclusive";
}
return BIP22ValidationResult(sc.state);
}

The submitblock_StateCatcher definition:

bitcoin/src/rpc/mining.cpp

Lines 713 to 729 in 651e343

class submitblock_StateCatcher : public CValidationInterface
{
public:
uint256 hash;
bool found;
BlockValidationState state;
explicit submitblock_StateCatcher(const uint256 &hashIn) : hash(hashIn), found(false), state() {}
protected:
void BlockChecked(const CBlock& block, const BlockValidationState& stateIn) override {
if (block.GetHash() != hash)
return;
found = true;
state = stateIn;
}
};

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions