-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Validation: Move CheckBlock() mutation guard to AcceptBlock() #17601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validation: Move CheckBlock() mutation guard to AcceptBlock() #17601
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
Maybe hold for follow-up but did you look to remove CheckBlockHeader
from CheckBlock
given it's called now in AcceptBlockHeader
which is call in AcceptBlock
? I think that's okay if we add a CheckBlockHeader
in TestBlockValidity
.
On the DoS-side we now have AcceptBlockHeader
called before CheckBlock
, I think that's better given block header checks are cheaper than block ones.
@@ -3769,18 +3781,9 @@ bool ProcessNewBlock(const CChainParams& chainparams, const std::shared_ptr<cons | |||
if (fNewBlock) *fNewBlock = false; | |||
BlockValidationState state; | |||
|
|||
// CheckBlock() does not support multi-threaded block validation because CBlock::fChecked can cause data race. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you keep CheckBlock as it is can we keep this to inform future refactors ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should either:
- be a comment in
CheckBlock
saying thatcs_main
should usually be held when callingCheckBlock
- add an
AssertLockHeld(cs_main)
toCheckBlock
(and also add that toFillBlock
and update the callers in bench to hold the lock) - remove
fChecked
so thatCheckBlock
doesn't need to holdcs_main
// of block mutation that cause CheckBlock() to fail; see e.g. | ||
// CVE-2012-2459 and | ||
// https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2019-February/016697.html. | ||
// Because CheckBlock() is not very expensive, the anti-DoS benefits of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You said here than not caching failure is okay because CheckBlock() isn't very expensive but at same time we use fChecked
to return early to avoid reprocessing. It seems a bit an inconsistent position. If it's cheap enough we shouldn't bother with fChecked
and lock tacking shouldn't cover CheckBlock? Or we could split CheckBlock between CheckBlockIntegrity and CheckBlockValidity and have a fDefinitelyInvalid
to skip both if see block again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fChecked
is a slightly different caching mechanism. It's stored on the CBlock
object and prevents having to do the merkle root and POW checking for the same object more than once. The CBlock
object is new each time we redownload a block, so this caching doesn't prevent us from re-checking an invalid block downloaded more than once. On the other hand, the BlockManager.m_block_index
is what prevents us from checking an invalid block downloaded a second time.
We may actually be able to remove fChecked
after this PR. Before this PR, we call CheckBlock
three times on the same block:
- from
ProcessNewBlock()
- from
AcceptBlock()
- from
ConnectBlock()
(1) is removed by this PR and (3) is not on the critical path for compact block relay (since we relay the compact block as soon as we've done the merkle tree/pow checks the first time, and before we save to disk or connect the block).
src/validation.cpp
Outdated
LOCK(cs_main); | ||
|
||
BlockValidationState state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: keep struct construction outside of cs_main, that's still a concurrency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's still a concurrency
What you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @ariard means that there's no need to construct the BlockValidationState
object within the cs_main
scope. That's true, but constructing this object is very cheap, so I don't think it's a problem (and placing the variable declaration next to where it's used make this clearer).
I'm actually going to remove this commit from the PR, since I don't think it's necessary (and may make it more likely to introduce a bug, since the callers to ProcessNewBlock()
in net_processing don't check the return code of the function before using the fNewBlock
out param.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
src/validation.cpp
Outdated
LOCK(cs_main); | ||
|
||
BlockValidationState state; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's still a concurrency
What you mean?
We do not mark any blocks that fail CheckBlock() as BLOCK_FAILED_VALID since they could have been mutated and marking a valid-but-mutated block as invalid would prevent us from ever syncing to that chain. See https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2019-February/016697.html for full details. The current guard against marking CheckBlock() failed blocks as invalid is by calling CheckBlock() prior to AcceptBlock() in ProcessNewBlock(). That is brittle since AcceptBlock() has an implicit assumption that any block submitted has been checked for mutation. A future change to ProcessNewBlock() could overlook that implicit assumption and introduce a consensus failure. In this commit we move the mutation guard logic into AcceptBlock() and add comments to explain why we never mark CheckBlock() failed blocks as invalid.
c433cc4
to
eb3b20e
Compare
Rebased on latest master and removed the final commit.
As you note, if we did that, we'd need to add |
Closing this for now. I think it's the right change, but it's not high priority and there doesn't seem to be much interest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub doesn't seem to parse its own link #17601 (
|
I empathise with this sentiment and have closed my own pull requests for the same reason, combined with the growing stack of open PRs needing attention and thinking I shouldn't have too many PRs open. That said, unless a squeaky wheel calls for grease, review attention seems to be flood-or-drought. Dormant, then suddenly, unexpectedly present. So I guess what matters most is if the PR author is still interested in their own PR. |
This is consensus-critical and the changing the ordering of checks here could potentially introduce very subtle consensus failures, and so this PR requires very careful review. I have 9 other PRs open and more branches that I haven't PRed and I'd prefer to focus review attention on those. |
We do not mark any blocks that fail
CheckBlock()
asBLOCK_FAILED_VALID
since they could have been mutated and marking a valid-but-mutated block
as invalid would prevent us from ever syncing to that chain. See
https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2019-February/016697.html
for full details.
The current guard against marking
CheckBlock()
failed blocks as invalidis by calling
CheckBlock()
prior toAcceptBlock()
inProcessNewBlock()
.That is brittle since
AcceptBlock()
has an implicit assumption that anyblock submitted has been checked for mutation. A future change to
ProcessNewBlock()
could overlook that implicit assumption and introducea consensus failure.
Move the mutation guard logic into
AcceptBlock()
andadd comments to explain why we never mark
CheckBlock()
failed blocks asinvalid.