-
Notifications
You must be signed in to change notification settings - Fork 37.7k
p2p: Avoid multiple getheaders messages in flight to the same peer #25454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@ajtowns This should fix the issue you've seen in functional tests, where generating a bunch of blocks at once can result in O(n^2) headers downloads. |
ccda6c1
to
f757d2a
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK, great catch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
f757d2a
to
f2e3f6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
f2e3f6d
to
af850df
Compare
af850df
to
f54b86a
Compare
f54b86a
to
e3b2185
Compare
97ced75
to
7f75eba
Compare
Also moves the call to happen directly after validation of a headers message (rather than mixed in with other state updates for the peer), and removes an incorrect comment in favor of one that explains why headers sync must continue from the last header a peer has sent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 99f4785
Nits are very non-blocking
* We'll send a getheaders message in response to try to connect the chain. | ||
* | ||
* The peer can send up to MAX_UNCONNECTING_HEADERS in a row that | ||
* don't connect before given DoS points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, typo: "before being given DoS points" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done as part of #25555
@@ -82,7 +82,7 @@ def run_test(self): | |||
msg.hashstop = 0 | |||
peer.send_and_ping(msg) | |||
time.sleep(5) | |||
assert "headers" not in peer.last_message | |||
assert ("headers" not in peer.last_message or len(peer.last_message["headers"].headers) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: braces are unnecessary, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done as part of 8efa73e
Code review ACK 99f4785 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK 99f4785 Good improvement in the code.
Tested on signet.
Code Review ACK 99f4785 |
…t to the same peer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some questions
// could be benign. | ||
HandleFewUnconnectingHeaders(pfrom, peer, headers); | ||
} else { | ||
Misbehaving(peer, 10, "invalid header received"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7f24508 It might be good to mention in the commit message that you are changing the points in this "move" commit.
Previously a non-connecting non-continuous header sequence of size MAX_BLOCKS_TO_ANNOUNCE+1 was assigned 20. Now it is assigned 10, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still gets a score of 20 via HandleFewUnconnectingHeaders()
when nCount <= MAX_BLOCKS_TO_ANNOUNCE
; this adds an unavoidable score of 10 for unconnecting headers when nCount > MAX_BLOCKS_TO_ANNOUNCE
which previously was more or less ignored...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously a non-connecting non-continuous header sequence of size MAX_UNCONNECTING_HEADERS would be submitted to validation through ProcessNewBlockHeaders()
, since the if (!m_chainman.m_blockman.LookupBlockIndex(headers[0].hashPrevBlock) && nCount <= MAX_BLOCKS_TO_ANNOUNCE)
block would be skipped.
That would return BlockValidationResult::BLOCK_MISSING_PREV
:
Lines 3590 to 3593 in 691a087
if (mi == m_blockman.m_block_index.end()) { | |
LogPrint(BCLog::VALIDATION, "%s: %s prev block not found\n", __func__, hash.ToString()); | |
return state.Invalid(BlockValidationResult::BLOCK_MISSING_PREV, "prev-blk-not-found"); | |
} |
which would result in 10 misbehaviour points in MaybePunishNodeForBlock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this adds an unavoidable score of 10 for unconnecting headers when nCount > MAX_BLOCKS_TO_ANNOUNCE which previously was more or less ignored...
Can you explain a bit more why this was "more or less ignored"?
Previously it was deterministically and consistently assigned a Misbehaving(peer, 20, "non-continuous headers sequence");
.
Now it is deterministically and consistently assigned a Misbehaving(peer, 10, "invalid header received");
.
Previously a non-connecting non-continuous header sequence of size MAX_UNCONNECTING_HEADERS would be submitted to validation through ProcessNewBlockHeaders()
No, it wouldn't, as it would be rejected by the Misbehaving(peer, 20, "non-continuous headers sequence");
check?
What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What am I missing?
Nothing. I misread non-continuous as continuous. You're right that this would be rejected with Misbehaving(peer, 20, "non-continuous headers sequence")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what to say here other than that yes, there are minor behavior changes in this PR. I don't think any are significant as the DoS points here are pretty arbitrary and when checking for different failures, it shouldn't matter too much which order we do things. I would have been happy to update the commit message as you suggest but now that this is merged there's not much I can do!
It should be the case that any failure before that resulted in DoS points would still result in DoS points now, I think -- if that is not the case, then that would be an oversight to fix. The motivation for making these kinds of minor changes is to make the headers processing easier to reason about, which is also in preparation for another change I'll be proposing to headers sync in the future (which I think will be easier to understand with the headers processing logic that is introduced in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it wasn't meant as a criticism of 7f24508 more as a nit to keep in mind in the future. I think the code changes in this commit are perfectly fine. Generally I just prefer to split behaviour changes from refactoring changes, or at least explain the behaviour changes in the commit message briefly. Otherwise, what seems obvious to the author is easily missed by reviewers. (I think this discussion is supporting evidence enough).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously a non-connecting non-continuous header sequence of size MAX_BLOCKS_TO_ANNOUNCE+1 was assigned 20. Now it is assigned 10, no?
Previously it was deterministically and consistently assigned aMisbehaving(peer, 20, "non-continuous headers sequence");
.
Ah, sorry, I was only thinking of the non-connecting continuous case, not the non-connecting, non-continuous case.
@@ -358,7 +358,7 @@ struct Peer { | |||
std::deque<CInv> m_getdata_requests GUARDED_BY(m_getdata_requests_mutex); | |||
|
|||
/** Time of the last getheaders message to this peer */ | |||
std::atomic<std::chrono::seconds> m_last_getheaders_timestamp{0s}; | |||
std::atomic<NodeClock::time_point> m_last_getheaders_timestamp{NodeSeconds{}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there even a need to make this atomic? In a single threaded environment this is not needed. And if the same peer is served by several threads (for whatever reason) in the future, an atomic may prevent UB, but doesn't seem sufficient to prevent races logically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm that is an interesting point. My sense was that we don't want to rely on implicit single-threadedness to reason about data races, so it was safer to just throw this into an atomic and not worry about it.
I also think that having multiple threads servicing the same peer would not really work for a bunch of reasons (including the one you give, about logical errors in code like this). I would imagine that multithreading network handling in the future would involve different threads servicing different peers.
I think the most likely form of UB from data races would be if we were to expose this value via RPC in the future. So to make this future-proof, I'd prefer that we leave this as-is; it doesn't seem like the performance hit of having an atomic here (which is rarely accessed, as getheaders
messages are not a frequent occurrence, and there are far more effective ways to CPU DoS a node than to trigger a getheaders) is significant enough to warrant making this harder to reason about in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if this was ever exposed on RPC, we'd also want to think which other fields to return atomically in the response, in which case a per-field atomic isn't enough to prevent logic races. So to me it seems slightly preferable to leave the design of thread safety to when it is needed. Otherwise it may be assumed in the future that this is already perfectly thread safe and can be used as is. Leaving this plain (without atomic) would at least have a thread sanitizer failure hint that the design needs to be re-thought if this was accessed by more than one thread in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Other than that, agree that atomic doesn't affect performance)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, I think you're right that this atomic is incorrect. We need to hold a lock throughout MaybeSendGetHeaders()
because this variable can be accessed in the scheduler thread or in the net processing thread, and if we don't hold a lock throughout that function we can get a data race. Thanks for asking about this, I'll open a PR to fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in-thread" means running in the thread "ThreadMessageHandler (b-msghand): Application level message handling (sending and receiving). Almost all net_processing and validation logic runs on this thread."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, m_last_getheaders_timestamp
is accessed from PeerManagerImpl::MaybeSendGetHeaders()
and PeerManagerImpl::ProcessMessage()
which are only called from a single thread ThreadMessageHandler()
.
because this variable can be accessed in the scheduler thread or in the net processing thread
@sdaftuar, I think it is not accessed from the scheduler thread?
I am in favor of simplicity, in this case meaning to not protect data that is accessed by a single thread. If in the future it is accessed by another thread, then an appropriate protection should be added. That may be an atomic or a mutex just for that variable or a mutex for multiple variables depending on the need. Otherwise we may end up protecting it needlessly now and later to have to rework the protection anyway (or worse - use the available, inappropriate protection, like @MarcoFalke mentioned above "...it may be assumed in the future that this is already perfectly thread safe...").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate a bit on this? Unless I am missing something MaybeSendGetHeaders/ConsiderEviction is only called in-thread, not to be confused with CheckForStaleTipAndEvictPeers, which is called out-of-thread. What am I missing?
Ah, I guess I misremembered how ConsiderEviction
is called (and thought it was run in the scheduler thread).
I'll update my other PR and drop the mutex/atomic altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of simplicity, in this case meaning to not protect data that is accessed by a single thread.
Possible idea for the future. In theory could have SingleThread<T>
class similar to the Synced<T>
class from #25390 that wraps an object and asserts in debug mode that is always accessed from the same thread. I think a solution like that would be overkill here, and I'm struggling to think of cases where it wouldn't be overkill, but it wouldn't be hard to implement and could be useful to help code document assumptions it is making.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be resolved by 8efa73e
613e221 test: remove unnecessary parens (Suhas Daftuar) e939cf2 Remove atomic for m_last_getheaders_timestamp (Suhas Daftuar) Pull request description: Eliminate the unnecessary atomic guarding `m_last_getheaders_timestamp`, which is only accessed in a single thread (thanks to MarcoFalke for pointing this out). Also address a nit that came up in #25454. ACKs for top commit: MarcoFalke: review ACK 613e221 vasild: ACK 613e221 Tree-SHA512: 6d6c473735b450b8ad43aae5cf16ed419154d72f4a05c0a6ce6f26caecab9db2361041398b70bf9395611c107d50897f501fa5fdbabb2891144bbc2b479dfdad
// nUnconnectingHeaders gets reset back to 0. | ||
if (!m_chainman.m_blockman.LookupBlockIndex(headers[0].hashPrevBlock) && nCount <= MAX_BLOCKS_TO_ANNOUNCE) { | ||
nodestate->nUnconnectingHeaders++; | ||
m_connman.PushMessage(&pfrom, msgMaker.Make(NetMsgType::GETHEADERS, m_chainman.ActiveChain().GetLocator(m_chainman.m_best_header), uint256())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msgMaker
is now unused in ProcessHeadersMessage. This is fixed in #25717
Same for |
…ker` instances ea7ec78 refactor: Drop no longer used `CNetMsgMaker` instances (Hennadii Stepanov) Pull request description: The removed lines have been unused since the bitcoin/bitcoin@abf5d16 commit from bitcoin/bitcoin#25454. ACKs for top commit: dergoegge: utACK ea7ec78 Sjors: ACK ea7ec78 TheCharlatan: ACK ea7ec78 Tree-SHA512: 9a2a9ff3f124b68a8cd20a637e90885096996c3aa354a4d8adbec98f5761e9e826c1c064ccd90aaf6d72beac61dd9e22c8b76d089e18bba6e0ad51e59a9c7df8
…tances ea7ec78 refactor: Drop no longer used `CNetMsgMaker` instances (Hennadii Stepanov) Pull request description: The removed lines have been unused since the bitcoin@abf5d16 commit from bitcoin#25454. ACKs for top commit: dergoegge: utACK ea7ec78 Sjors: ACK ea7ec78 TheCharlatan: ACK ea7ec78 Tree-SHA512: 9a2a9ff3f124b68a8cd20a637e90885096996c3aa354a4d8adbec98f5761e9e826c1c064ccd90aaf6d72beac61dd9e22c8b76d089e18bba6e0ad51e59a9c7df8
Summary: This introduces a minor change in the misbehavior score: previously a non-connecting non-continuous header sequence of size MAX_BLOCKS_TO_ANNOUNCE+1 was assigned 20 (`non-continuous headers sequence`), now it is assigned 10 (`invalid header received`). The behavior is unchanged if the number of headers is <= MAX_BLOCKS_TO_ANNOUNCE. This is a partial backport of [[bitcoin/bitcoin#25454 | core#25454]] bitcoin/bitcoin@7f24508 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D14906
Summary: This is a partial backport of [[bitcoin/bitcoin#25454 | core#25454]] bitcoin/bitcoin@9492e93 Depends on D14906 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D14907
Summary: Also moves the call to happen directly after validation of a headers message (rather than mixed in with other state updates for the peer), and removes an incorrect comment in favor of one that explains why headers sync must continue from the last header a peer has sent. This is a backport of [[bitcoin/bitcoin#25454 | core#25454]] bitcoin/bitcoin@bf8ea6d Depends on D14907 Note that `FetchMoreHeaders` requires the `cs_main` lock to be held because of the use of `ActiveChain()`. This is needed in the local scope because of the lock splitting in D9591. Test Plan: With clang and debug: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D14908
Summary: And move headers direct fetch to end of ProcessHeadersMessage Review tip: `git show --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space` This should show only linter changes due to reduced indentation in the moved code block. This is a partial backport of [[bitcoin/bitcoin#25454 | core#25454]] bitcoin/bitcoin@29c4518 bitcoin/bitcoin@2b341db Depends on D14908 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D14909
Summary: Review hint: `git show --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space` This is a partial backport of [[bitcoin/bitcoin#25454 | core#25454]] bitcoin/bitcoin@6d95cd3 Depends on D14909 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D14910
Summary: This is a partial backport of [[bitcoin/bitcoin#25454 | core#25454]] bitcoin/bitcoin@ffe87db Depends on D14910 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D14911
Summary: Change getheaders messages so that we wait up to 2 minutes for a response to a prior getheaders message before issuing a new one. Also change the handling of the getheaders message sent in response to a block INV, so that we no longer use the hashstop variable (including the hash stop will just mean that if our peer's headers chain is longer, then we won't learn it, so there's no benefit to using hashstop). Also, now respond to a getheaders during IBD with an empty headers message (rather than nothing) -- this better conforms to the intent of the new logic that it's better to not ignore a peer's getheaders message, even if you have nothing to give. This also avoids a lot of functional tests breaking. This concludes backport of [[bitcoin/bitcoin#25454 | core#25454]] and [[bitcoin/bitcoin#25557 | core#25557]] bitcoin/bitcoin@abf5d16 bitcoin/bitcoin@99f4785 bitcoin/bitcoin@e939cf2 Depends on D14912 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D14913
This misbehavior was originally intended to prevent bandwidth wastage due to very broken (but likely non-malicious) nodes that respond to GETHEADERS with a response unrelated to the request, triggering a request cycle. This has however largely been addressed by bitcoin#25454, which limits the number of (not responded to) GETHEADERS requests in flight to one, with a 2 minute timeout. With that, the specific misbehavior is largely irrelevant (for inbound peers, it is now harmless; for outbound peers, the eviction logic will eventually kick them out if they're not keeping up with headers at all).
Since bitcoin#25454 we keep track of the last GETHEADERS request that was sent and wasn't responded to. So far, every incoming HEADERS message is treated as a response to whatever GETHEADERS was last sent, regardless of its contents. This commit makes this tracking more accurate, by only treating HEADERS messages which (1) are empty, (2) connect to our existing block header tree, or (3) are a continuation of a low-work headers sync as responses that clear the "outstanding GETHEADERS" state (m_last_getheaders_timestamp). That means that HEADERS messages which do not satisfy any of the above criteria will be ignored, not triggering a GETHEADERS, and potentially (for now, but see later commit) increase misbehavior score.
Change
getheaders
messages so that we wait up to 2 minutes for a response to a priorgetheaders
message before issuing a new one.Also change the handling of the
getheaders
message sent in response to a block INV, so that we no longer use the hashstop variable (including the hash stop will just mean that if our peer's headers chain is longer, then we won't learnit, so there's no benefit to using hashstop).
Also, now respond to a
getheaders
during IBD with an empty headers message (rather than nothing) -- this better conforms to the intent of the new logic that it's better to not ignore a peer'sgetheaders
message, even if you have nothing to give. This also avoids a lot of functional tests breaking.This PR also reworks the headers processing logic to make it more readable.