-
Notifications
You must be signed in to change notification settings - Fork 37.7k
net_processing: make any misbehavior trigger immediate discouragement #29575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concept ACK
going to run on a listening node with logging to test
This comment was marked as abuse.
This comment was marked as abuse.
1 similar comment
Concept ACK |
I have been monitoring peer misbehavior with the misbehaving tracepoint from #25832. In the past weeks, I've seen the following misbehavior across 10 nodes:
I've looked at a few of these I'll keep an eye on further misbehavior. |
Perhaps we can tolerate this if the block is less than 10 hours in the future and disconnect if it's more? It seems cheap enough to process slightly-too-far-in-the-future headers, and they are enormously expensive to produce. (for other reviewers, look for The peer might also be telling us about a reorg longer than
So we should not disconnect the first time around. The Sjors@fad8c5b illustrates how we could limit this to just once ( |
Sjors@00a7dcc is one approach of dealing with the time difference issue. If any of the headers are Some of the comments in that commit may be spurious; I can't think of a useful attack where a peer (e.g. Bob) sends us headers, three hours in the future, with fake PoW. All Bob would achieve is that we ask I also don't know if if it's even worth checking if a header is in the far future, but at least it makes for more useful log entries? (it's nice to know the difference between a peer with a broken clock and a peer that falsely announces a large reorg) |
I discussed this with @sdaftuar, @mzumsande, @sr-gi in person, and decided to instead just drop the Misbehavior score for unconnecting headers. The rationale is that this misbehavior was originally intended to prevent bandwidth wastage due to very broken (but likely non-malicious) nodes actually observed on the network that respond to GETHEADERS with a response unrelated to the request, triggering a request cycle. This has however largely been addressed by #25454, which limits the number of (not responded to) GETHEADERS requests in flight to one, with a 2 minute timeout. With that, the specific misbehavior is largely irrelevant (for inbound peers, it is now mostly harmless; for outbound peers, the eviction logic will eventually kick them out if they're not keeping up with headers at all). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at each of the limits in aee3b67 to confirm they're plenty-old enough to just disconnect anyone that breaks them.
Will look at the newly pushed commits later.
Reviewed dee6ca2 I'll run the log commit for a while. I'm still a bit on the fence about a903bba.
Some context here for other reviewers, when you look at
IIUC outbound peer eviction is handled in Also... It might still be useful, even if not needed for security, to know if a peer sent us disconnecting headers. Although the Tracking if the peer did this before, e.g. with something like |
@Sjors It's worth taking into account that despite its history as originally being a "DoS score", the discouragement logic isn't particularly good at preventing intentional abuse of resources; scoring systems are inherently easy to bypass: just do enough to keep the score acceptable, and be as wasteful otherwise (or even better, if many IP addresses are available, use several). The only real solution, in my opinion, for actual DoS protection is keeping track of how many resources (bandwidth, CPU, I/O, ...) we spend on behalf of every peer, and whenever those resources become a bottleneck for us, throttle the worst peers. So the purpose for the discouragement logic as I see it is avoiding wasting resources on (unintentionally) broken peers. These are somewhat distinct from useless peers (which is what the outbound eviction logic is aimed to prevent). There is an infinite amount of potential broken behavior one can imagine, and clearly we cannot have code for all of it. Another reason for the misbehavior detection is to help protocol implementers get a good idea for expectations around parts of the protocol that aren't formally specified (because it triggers disconnections, and possibly logging). And arguably, unconnecting headers isn't a protocol violation, at least in the context of BIP130 announcements, so disconnecting for something that's very rare but not impossible to trigger in what we'd consider a perfectly compliant protocol implementation would be confusing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, a few doc nits and one question below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. I left some comments inline, plus some general ones that may be addressable with the change of functionality:
-
Looks like
banmah.h
also needs updating, given there is no such thing as how much a peer misbehaves anymore:(line 37) // 2. Discouragement. If a peer misbehaves enough (see Misbehaving() in // net_processing.cpp)
-
p2p_filter.py
andp2p_invalid_messages.py
are asserting logs looking forMisbehaving
. Shouldn't it be better now to just check that the peer was disconnected in those cases? e.g:self.log.info('Check that too large filter is rejected') with self.nodes[0].assert_debug_log(['Misbehaving']): filter_peer.send_and_ping(msg_filterload(data=b'\xbb'*(MAX_BLOOM_FILTER_SIZE+1)))
@@ -1828,8 +1828,7 @@ bool PeerManagerImpl::MaybePunishNodeForBlock(NodeId nodeid, const BlockValidati | |||
return true; | |||
// Conflicting (but not necessarily invalid) data or different policy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment accurate? I don't see how it applies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, this seems to be a leftover comment that hasn't applied in years. This code used to handle both block and transaction validation failures, but transactions were moved elsewhere. It seems unrelated to this PR, so I won't touch it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sr-gi Do you want to follow-up on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at this to see if it would make sense to remove the comment, but looks like this is still consistent with the original approach, even after splitting the logic in two for txs and blocks:
To me, it seems that, at the very least, this comment would need to be moved one case down, given BlockValidationResult::BLOCK_MISSING_PREV
triggers misbehavior, so the comment doesn't apply there.
@@ -1848,7 +1835,7 @@ bool PeerManagerImpl::MaybePunishNodeForTx(NodeId nodeid, const TxValidationStat | |||
break; | |||
// The node is providing invalid data: | |||
case TxValidationResult::TX_CONSENSUS: | |||
if (peer) Misbehaving(*peer, 100, ""); | |||
if (peer) Misbehaving(*peer, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why we don't even signal this as a consensus discrepancy? Is it logged somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the network processing layer, which acts in response to validation failures (to the extent that it needs to punish peers for it). The actual validation failure should be logged through the validation code itself, see ConnectTip()
there, which logs failures, and invokes the BlockChecked
signal that is handled by net_processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The message passed to Misbehaving
ranges in how explicit it is though (from the actual reason to the category, without really specifying what, e.g. "invalid compact block"). That's why I was expecting this to at least signal the reason is due to a consensus discrepancy
dee6ca2
to
e7ccaa0
Compare
Addressed most of the feedback I've gotten. As pointed out by @mzumsande in #29575 (comment), the protection I imagined existed against buggy clients causing a @sr-gi I have not addressed the |
I ran a slightly outdated version of this PR with the log commit e451338 for the last two weeks on a decently connected node. Only a single
This invalid header was sent 9 seconds after receiving a new valid block. I'll continue to run with the latest version... ACK e7ccaa0 |
This misbehavior was originally intended to prevent bandwidth wastage due to actually observed very broken (but likely non-malicious) nodes that respond to GETHEADERS with a response unrelated to the request, triggering a request cycle. This has however largely been addressed by the previous commit, which causes non-connecting HEADERS that are received while a GETHEADERS has not been responded to, to be ignored, as long as they do not time out (2 minutes). With that, the specific misbehavior is largely irrelevant (for inbound peers, it is now harmless; for outbound peers, the eviction logic will eventually kick them out if they're not keeping up with headers at all).
With the Misbehavior score gone for non-connecting headers (see previous commit), there is no need to only treat headers messages with up to 8 headers as potential BIP130 announcements. BIP130 does not specify such a limit; it was purely a heuristic.
This removes the need to actually track misbehavior score (see further commit), because any Misbehaving node will immediately hit the discouragement threshold.
This is now all unused.
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
ACK ae60d48 Something orthogonal to this, but that I realized working on the same part of the codebase and I wonder if it could be added here: Looks like the result of Should we just refactor the method to simply return |
@sr-gi Done, in a new commit, for both |
ACK 6eecba4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK 6eecba4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
light code review / concept ACK 6eecba4
ACK 6eecba4 |
…instead of using the misbehavior score Summary: Use a dedicated `CNode::m_avalanche_message_fault_score` member variable to track repeated avaresponse message faults, and call `Misbehaving(*peer, 100)` when that score reaches a threshold instead of incrementing the misbehavior score by 2 directly. In a case in which the only source of misbehavior is the avaresponse faults, the behavior is unchanged. If the misbehavior score is incremented for other reasons, now we are a bit more tolerant of network faults. This prepares the codebase for a following diff ([[bitcoin/bitcoin#29575 | core#29575]]) in which we will stop tracking the misbehavior score, and calling `Misbehaving` will immediately trigger discouragement. Depends on D18021 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Subscribers: Fabien Differential Revision: https://reviews.bitcoinabc.org/D18019
Summary: Since bitcoin/bitcoin#25454 we keep track of the last GETHEADERS request that was sent and wasn't responded to. So far, every incoming HEADERS message is treated as a response to whatever GETHEADERS was last sent, regardless of its contents. This commit makes this tracking more accurate, by only treating HEADERS messages which (1) are empty, (2) connect to our existing block header tree, or (3) are a continuation of a low-work headers sync as responses that clear the "outstanding GETHEADERS" state (m_last_getheaders_timestamp). That means that HEADERS messages which do not satisfy any of the above criteria will be ignored, not triggering a GETHEADERS, and potentially (for now, but see later commit) increase misbehavior score. This is a partial backport of [[bitcoin/bitcoin#29575 | core#29575]] bitcoin/bitcoin@9f66ac7 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D18029
Summary: This misbehavior was originally intended to prevent bandwidth wastage due to actually observed very broken (but likely non-malicious) nodes that respond to GETHEADERS with a response unrelated to the request, triggering a request cycle. This has however largely been addressed by the previous commit, which causes non-connecting HEADERS that are received while a GETHEADERS has not been responded to, to be ignored, as long as they do not time out (2 minutes). With that, the specific misbehavior is largely irrelevant (for inbound peers, it is now harmless; for outbound peers, the eviction logic will eventually kick them out if they're not keeping up with headers at all). This is a partial backport of [[bitcoin/bitcoin#29575 | core#29575]] bitcoin/bitcoin@944c542 Depends on D18029 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D18030
Summary: With the Misbehavior score gone for non-connecting headers (see previous commit), there is no need to only treat headers messages with up to 8 headers as potential BIP130 announcements. BIP130 does not specify such a limit; it was purely a heuristic. This is a partial backport of [[bitcoin/bitcoin#29575 | core#29575]] bitcoin/bitcoin@5120ab1 Depends on D18030 Test Plan: `ninja all check-all` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D18031
Summary: This removes the need to actually track misbehavior score (see further commit), because any Misbehaving node will immediately hit the discouragement threshold. This is a partial backport of [[bitcoin/bitcoin#29575 | core#29575]] bitcoin/bitcoin@6457c31 Depends on D18031 Test Plan: `ninja all check-all-extended` run an ibd with -assumevalid=0 -checkpoints=0` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Subscribers: Fabien Differential Revision: https://reviews.bitcoinabc.org/D18032
Summary: This is now all unused. Make MaybePunishNodeFor{Block,Tx} return void This concludes backport of [[bitcoin/bitcoin#29575 | core#29575]] bitcoin/bitcoin@ae60d48 bitcoin/bitcoin@6eecba4 Depends on D18032 Test Plan: `ninja all check-all-extended` run an ibd with `-assumevalid=0 -checkpoints=0` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D18034
So far, discouragement of peers triggers when their misbehavior score exceeds 100 points. Most types of misbehavior increment the score by 100, triggering immediate discouragement, but some types do not. This PR makes all increments equal to either 100 (meaning any misbehavior will immediately cause disconnection and discouragement) or 0 (making the behavior effectively unconditionally allowed), and then removes the logic for score accumulation.
This simplifies the code a bit, but also makes protocol expectations clearer: if a peer misbehaves, they get disconnected. There is no good reason why certain types of protocol violations should be permitted 4 times (howmuch=20) or 9 times (howmuch=10), while many others are never allowed. Furthermore, the distinction between these looks arbitrary.
The specific types of misbehavior that are changed to 100 are:
block
which does not connect to our header tree (which necessarily must have been unsollicited). [used to be score 10]headers
with a non-continuous headers sequence. [used to be score 20]addr
oraddrv2
message [used to be score 20]inv
message [used to be score 20]headers
message [used to be score 20]The specific types of misbehavior that are changed to 0 are:
headers
message (which thus does not get treated as a BIP130 announcement) that does not connect to our block tree. [used to be score 10]I believe that none of these behaviors are unavoidable, except for the one marked (*) which can in theory happen still due to interaction between BIP130 and variations in system clocks (the max 2 hour in the future rule). This one has been removed entirely. In order to remove the impact of the bug it was designed to deal with, without relying on misbehavior, a separate improvement is included that makes
getheaders
-tracking more accurate.In another unrelated improvement, this also gets rid of the 8 header limit heuristic to determine whether an incoming non-connecting
headers
is a potential BIP130 announcement, as this rule is no longer needed to prevent spurious Misbehavior. Instead, any non-connectingheaders
is now treated as a potential announcement.