Skip to content

Conversation

sipa
Copy link
Member

@sipa sipa commented Mar 6, 2024

So far, discouragement of peers triggers when their misbehavior score exceeds 100 points. Most types of misbehavior increment the score by 100, triggering immediate discouragement, but some types do not. This PR makes all increments equal to either 100 (meaning any misbehavior will immediately cause disconnection and discouragement) or 0 (making the behavior effectively unconditionally allowed), and then removes the logic for score accumulation.

This simplifies the code a bit, but also makes protocol expectations clearer: if a peer misbehaves, they get disconnected. There is no good reason why certain types of protocol violations should be permitted 4 times (howmuch=20) or 9 times (howmuch=10), while many others are never allowed. Furthermore, the distinction between these looks arbitrary.

The specific types of misbehavior that are changed to 100 are:

  • Sending us a block which does not connect to our header tree (which necessarily must have been unsollicited). [used to be score 10]
  • Sending us a headers with a non-continuous headers sequence. [used to be score 20]
  • Sending us more than 1000 addresses in a single addr or addrv2 message [used to be score 20]
  • Sending us more than 50000 invs in a single inv message [used to be score 20]
  • Sending us more than 2000 headers in a single headers message [used to be score 20]

The specific types of misbehavior that are changed to 0 are:

  • Sending us 10 (*) separate BIP130 headers announcements that do not connect to our block tree [used to be score 20]
  • Sending us more than 8 headers in a single headers message (which thus does not get treated as a BIP130 announcement) that does not connect to our block tree. [used to be score 10]

I believe that none of these behaviors are unavoidable, except for the one marked (*) which can in theory happen still due to interaction between BIP130 and variations in system clocks (the max 2 hour in the future rule). This one has been removed entirely. In order to remove the impact of the bug it was designed to deal with, without relying on misbehavior, a separate improvement is included that makes getheaders-tracking more accurate.

In another unrelated improvement, this also gets rid of the 8 header limit heuristic to determine whether an incoming non-connecting headers is a potential BIP130 announcement, as this rule is no longer needed to prevent spurious Misbehavior. Instead, any non-connecting headers is now treated as a potential announcement.

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 6, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK sr-gi, mzumsande, glozow, achow101
Concept ACK instagibbs, 1440000bytes, brunoerg, sdaftuar, willcl-ark
Stale ACK Sjors

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30277 ([DO NOT MERGE] Erlay: bandwidth-efficient transaction relay protocol (Full implementation) by sr-gi)
  • #30233 (refactor: move m_is_inbound out of CNodeState by sr-gi)
  • #30116 (p2p: Fill reconciliation sets (Erlay) attempt 2 by sr-gi)
  • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)
  • #25832 (tracing: network connection tracepoints by 0xB10C)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@willcl-ark willcl-ark added the P2P label Mar 6, 2024
Copy link
Member

@instagibbs instagibbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concept ACK

going to run on a listening node with logging to test

@1440000bytes

This comment was marked as abuse.

1 similar comment
@brunoerg
Copy link
Contributor

brunoerg commented Mar 6, 2024

Concept ACK

@0xB10C
Copy link
Contributor

0xB10C commented Mar 7, 2024

I have been monitoring peer misbehavior with the misbehaving tracepoint from #25832. In the past weeks, I've seen the following misbehavior across 10 nodes:

  • header with invalid proof of work [100]: very frequent
  • mutated block [100]: these are frequent. I was running p2p: Don't process mutated blocks #29412 where we disconnect for each mutated block (compared to only disconnecting for mutated blocks we hadn't seen before)
  • invalid header received [10]: these are infrequent ("Sending us more than 8 headers in a single headers message")
  • headers message size = X [20]: only seen one occurrence ("Sending us more than 2000 headers in a single headers message that does not connect to our block tree")

I've looked at a few of these invalid header received misbehaviors and they all came from the same IP address with a (fake?) v26.0 Bitcoin Core user agent and always claiming their start height to be 153681 (in the version msg). Seems ok to disconnect these?

I'll keep an eye on further misbehavior.

@Sjors
Copy link
Member

Sjors commented Mar 7, 2024

due to interaction between BIP130 and variations in system clocks (the max 2 hour in the future rule)

Perhaps we can tolerate this if the block is less than 10 hours in the future and disconnect if it's more? It seems cheap enough to process slightly-too-far-in-the-future headers, and they are enormously expensive to produce.

(for other reviewers, look for MAX_NUM_UNCONNECTING_HEADERS_MSGS to see where this is handled)

The peer might also be telling us about a reorg longer than MAX_BLOCKS_TO_ANNOUNCE = 8. In UpdatedBlockTip we:

limit announcements in case of a huge reorganization.
Rely on the peer's synchronization mechanism in that case.

So we should not disconnect the first time around.

The HandleFewUnconnectingHeaders documentation should probably mention this.

Sjors@fad8c5b illustrates how we could limit this to just once (MAX_NUM_UNCONNECTING_HEADERS_MSGS = 2), since > 8 block reorgs should be rare. If we were to immedidately disconnect (MAX_NUM_UNCONNECTING_HEADERS_MSGS = 1 then step 5 of p2p_sendheaders.py fails). This doesn't account for the time difference issue. And in any case it can be done in a followup. And of course we wouldn't need an integer, but could instead call it m_chicken_little.

@Sjors
Copy link
Member

Sjors commented Mar 7, 2024

Sjors@00a7dcc is one approach of dealing with the time difference issue. If any of the headers are > MAX_FUTURE_BLOCK_TIME * 10 in the future we disconnect with time-too-new. We could perhaps increase the number to allow for even more incorrect clocks, but I'm not sure how useful that is.

Some of the comments in that commit may be spurious; I can't think of a useful attack where a peer (e.g. Bob) sends us headers, three hours in the future, with fake PoW. All Bob would achieve is that we ask another peer (Caroll) Bob for the missing headers, which Caroll either won't have, or she does we disconnect her he either sends, for which we disconnect. Or he doesn't send and tries again, at which point we disconnect him.

I also don't know if if it's even worth checking if a header is in the far future, but at least it makes for more useful log entries? (it's nice to know the difference between a peer with a broken clock and a peer that falsely announces a large reorg)

@sipa sipa force-pushed the 202403_nomisbehave branch from bdc5776 to dee6ca2 Compare March 7, 2024 16:38
@sipa
Copy link
Member Author

sipa commented Mar 7, 2024

I discussed this with @sdaftuar, @mzumsande, @sr-gi in person, and decided to instead just drop the Misbehavior score for unconnecting headers. The rationale is that this misbehavior was originally intended to prevent bandwidth wastage due to very broken (but likely non-malicious) nodes actually observed on the network that respond to GETHEADERS with a response unrelated to the request, triggering a request cycle. This has however largely been addressed by #25454, which limits the number of (not responded to) GETHEADERS requests in flight to one, with a 2 minute timeout. With that, the specific misbehavior is largely irrelevant (for inbound peers, it is now mostly harmless; for outbound peers, the eviction logic will eventually kick them out if they're not keeping up with headers at all).

Copy link
Member

@Sjors Sjors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at each of the limits in aee3b67 to confirm they're plenty-old enough to just disconnect anyone that breaks them.

Will look at the newly pushed commits later.

@Sjors
Copy link
Member

Sjors commented Mar 7, 2024

Reviewed dee6ca2

I'll run the log commit for a while.

I'm still a bit on the fence about a903bba.

With that, the specific misbehavior is largely irrelevant (for inbound peers, it is now mostly harmless; ...

Some context here for other reviewers, when you look at SelectNodeToEvict one of the criteria that prevents eviction is if they sent us novel blocks. This means that peers that keep sending us useless headers, will probably not enjoy this protection and eventually get replaced. Unless they also give use useful data, in which case they're ... useful - and we can just humor their headers shenaningans.

... for outbound peers, the eviction logic will eventually kick them out if they're not keeping up with headers at all).

IIUC outbound peer eviction is handled in PeerManagerImpl::ConsiderEviction, which loses its patience after CHAIN_SYNC_TIMEOUT + HEADERS_RESPONSE_TIME. That seems a bit late though, if all our outbound peers are slow? Even if that's rare, it seems better to have as many "fit" outbound peers as possible.

Also... It might still be useful, even if not needed for security, to know if a peer sent us disconnecting headers. Although the NET log still provides a log message received header %s: missing prev block %s, sending getheaders (%d) to end (peer=%d), we don't know the outcome.

Tracking if the peer did this before, e.g. with something like m_chicken_little, would make this easier. But not sure if it's worth the complexity. It seems all this would do is reveal that someone tried an attack that we believe to be useless.

@sipa
Copy link
Member Author

sipa commented Mar 7, 2024

@Sjors It's worth taking into account that despite its history as originally being a "DoS score", the discouragement logic isn't particularly good at preventing intentional abuse of resources; scoring systems are inherently easy to bypass: just do enough to keep the score acceptable, and be as wasteful otherwise (or even better, if many IP addresses are available, use several). The only real solution, in my opinion, for actual DoS protection is keeping track of how many resources (bandwidth, CPU, I/O, ...) we spend on behalf of every peer, and whenever those resources become a bottleneck for us, throttle the worst peers.

So the purpose for the discouragement logic as I see it is avoiding wasting resources on (unintentionally) broken peers. These are somewhat distinct from useless peers (which is what the outbound eviction logic is aimed to prevent). There is an infinite amount of potential broken behavior one can imagine, and clearly we cannot have code for all of it.

Another reason for the misbehavior detection is to help protocol implementers get a good idea for expectations around parts of the protocol that aren't formally specified (because it triggers disconnections, and possibly logging). And arguably, unconnecting headers isn't a protocol violation, at least in the context of BIP130 announcements, so disconnecting for something that's very rare but not impossible to trigger in what we'd consider a perfectly compliant protocol implementation would be confusing.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, a few doc nits and one question below.

Copy link
Member

@sr-gi sr-gi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK. I left some comments inline, plus some general ones that may be addressable with the change of functionality:

  • Looks like banmah.h also needs updating, given there is no such thing as how much a peer misbehaves anymore:

    (line 37) // 2. Discouragement. If a peer misbehaves enough (see Misbehaving() in
    // net_processing.cpp)
    
  • p2p_filter.py and p2p_invalid_messages.py are asserting logs looking for Misbehaving. Shouldn't it be better now to just check that the peer was disconnected in those cases? e.g:

    self.log.info('Check that too large filter is rejected')
    with self.nodes[0].assert_debug_log(['Misbehaving']):
        filter_peer.send_and_ping(msg_filterload(data=b'\xbb'*(MAX_BLOOM_FILTER_SIZE+1)))

@@ -1828,8 +1828,7 @@ bool PeerManagerImpl::MaybePunishNodeForBlock(NodeId nodeid, const BlockValidati
return true;
// Conflicting (but not necessarily invalid) data or different policy:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment accurate? I don't see how it applies

Copy link
Member Author

@sipa sipa Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, this seems to be a leftover comment that hasn't applied in years. This code used to handle both block and transaction validation failures, but transactions were moved elsewhere. It seems unrelated to this PR, so I won't touch it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sr-gi Do you want to follow-up on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at this to see if it would make sense to remove the comment, but looks like this is still consistent with the original approach, even after splitting the logic in two for txs and blocks:

ef54b48

To me, it seems that, at the very least, this comment would need to be moved one case down, given BlockValidationResult::BLOCK_MISSING_PREV triggers misbehavior, so the comment doesn't apply there.

@@ -1848,7 +1835,7 @@ bool PeerManagerImpl::MaybePunishNodeForTx(NodeId nodeid, const TxValidationStat
break;
// The node is providing invalid data:
case TxValidationResult::TX_CONSENSUS:
if (peer) Misbehaving(*peer, 100, "");
if (peer) Misbehaving(*peer, "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we don't even signal this as a consensus discrepancy? Is it logged somewhere else?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the network processing layer, which acts in response to validation failures (to the extent that it needs to punish peers for it). The actual validation failure should be logged through the validation code itself, see ConnectTip() there, which logs failures, and invokes the BlockChecked signal that is handled by net_processing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. The message passed to Misbehaving ranges in how explicit it is though (from the actual reason to the category, without really specifying what, e.g. "invalid compact block"). That's why I was expecting this to at least signal the reason is due to a consensus discrepancy

@sipa sipa force-pushed the 202403_nomisbehave branch from dee6ca2 to e7ccaa0 Compare March 12, 2024 19:09
@sipa
Copy link
Member Author

sipa commented Mar 12, 2024

Addressed most of the feedback I've gotten.

As pointed out by @mzumsande in #29575 (comment), the protection I imagined existed against buggy clients causing a getheaders/headers cycle didn't actually exist. I've instead added that as a separate improvement, becoming the new first commit.

@sr-gi I have not addressed the p2p_filter.py and p2p_invalid_messages.py using misbehavior logging. These tests use whitelisting to prevent disconnection, and I think reworking that isn't necessarily an improvement.

@Sjors
Copy link
Member

Sjors commented Mar 22, 2024

I ran a slightly outdated version of this PR with the log commit e451338 for the last two weeks on a decently connected node. Only a single MISBEHAVE log entry:

168782-2024-03-07T15:58:10.608477Z Saw new header hash=00000000000000000002c88981de747ae237d80b74a5deb1a17943a76b296e3a height=833594 peer=1501 peeraddr=x.x.x.x:35630
168948-2024-03-07T15:58:10.947732Z UpdateTip: new best=00000000000000000002c88981de747ae237d80b74a5deb1a17943a76b296e3a height=833594 version=0x2319e000 log2_work=94.778914 tx=973821560 date='2024-03-07T15:57:26Z' progress=1.000000 cache=16.8MiB(112927txo)
169198:2024-03-07T15:58:19.055670Z MISBEHAVE: invalid header

This invalid header was sent 9 seconds after receiving a new valid block.

I'll continue to run with the latest version...


ACK e7ccaa0

@DrahtBot DrahtBot requested review from instagibbs and sr-gi March 22, 2024 12:50
sipa added 4 commits May 30, 2024 08:34
This misbehavior was originally intended to prevent bandwidth wastage due to
actually observed very broken (but likely non-malicious) nodes that respond
to GETHEADERS with a response unrelated to the request, triggering a request
cycle.

This has however largely been addressed by the previous commit, which causes
non-connecting HEADERS that are received while a GETHEADERS has not been
responded to, to be ignored, as long as they do not time out (2 minutes).
With that, the specific misbehavior is largely irrelevant (for inbound peers,
it is now harmless; for outbound peers, the eviction logic will eventually
kick them out if they're not keeping up with headers at all).
With the Misbehavior score gone for non-connecting headers (see previous
commit), there is no need to only treat headers messages with up to 8
headers as potential BIP130 announcements. BIP130 does not specify such
a limit; it was purely a heuristic.
This removes the need to actually track misbehavior score (see further commit), because any
Misbehaving node will immediately hit the discouragement threshold.
@sipa sipa force-pushed the 202403_nomisbehave branch from e7ccaa0 to ae60d48 Compare May 30, 2024 12:39
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/25603357473

@sr-gi
Copy link
Member

sr-gi commented Jun 6, 2024

ACK ae60d48

Something orthogonal to this, but that I realized working on the same part of the codebase and I wonder if it could be added here:

Looks like the result of MaybePunishNodeForBlock is never consumed in the codebase, plus the docs regarding what to return seem not to follow what the code is doing (if peer is not found, Misbehaving is not called, but the method returns true).

Should we just refactor the method to simply return void?

@sipa
Copy link
Member Author

sipa commented Jun 6, 2024

@sr-gi Done, in a new commit, for both MaybePunishNodeForBlock and MaybePunishNodeForTx.

@sr-gi
Copy link
Member

sr-gi commented Jun 6, 2024

ACK 6eecba4

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK 6eecba4

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

light code review / concept ACK 6eecba4

@achow101
Copy link
Member

ACK 6eecba4

@achow101 achow101 merged commit a52837b into bitcoin:master Jun 20, 2024
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Apr 29, 2025
…instead of using the misbehavior score

Summary:
Use a dedicated `CNode::m_avalanche_message_fault_score` member variable to track repeated avaresponse message faults, and call `Misbehaving(*peer, 100)` when that score reaches a threshold instead of incrementing the misbehavior score by 2 directly.

In a case in which the only source of misbehavior is the avaresponse faults, the behavior is unchanged. If the misbehavior score is incremented  for other reasons, now we are a bit more tolerant of network faults.

This prepares the codebase for a following diff ([[bitcoin/bitcoin#29575 | core#29575]]) in which we will stop tracking the misbehavior score, and calling `Misbehaving` will immediately trigger discouragement.

Depends on D18021

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D18019
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request May 1, 2025
Summary:
Since bitcoin/bitcoin#25454 we keep track of the last
GETHEADERS request that was sent and wasn't responded to. So far, every incoming
HEADERS message is treated as a response to whatever GETHEADERS was last sent,
regardless of its contents.

This commit makes this tracking more accurate, by only treating HEADERS messages
which (1) are empty, (2) connect to our existing block header tree, or (3) are a
continuation of a low-work headers sync as responses that clear the "outstanding
GETHEADERS" state (m_last_getheaders_timestamp).

That means that HEADERS messages which do not satisfy any of the above criteria
will be ignored, not triggering a GETHEADERS, and potentially (for now, but see
later commit) increase misbehavior score.

This is a partial backport of [[bitcoin/bitcoin#29575 | core#29575]]
bitcoin/bitcoin@9f66ac7

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D18029
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request May 1, 2025
Summary:
This misbehavior was originally intended to prevent bandwidth wastage due to
actually observed very broken (but likely non-malicious) nodes that respond
to GETHEADERS with a response unrelated to the request, triggering a request
cycle.

This has however largely been addressed by the previous commit, which causes
non-connecting HEADERS that are received while a GETHEADERS has not been
responded to, to be ignored, as long as they do not time out (2 minutes).
With that, the specific misbehavior is largely irrelevant (for inbound peers,
it is now harmless; for outbound peers, the eviction logic will eventually
kick them out if they're not keeping up with headers at all).

This is a partial backport of [[bitcoin/bitcoin#29575 | core#29575]]
bitcoin/bitcoin@944c542
Depends on D18029

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D18030
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request May 1, 2025
Summary:
With the Misbehavior score gone for non-connecting headers (see previous
commit), there is no need to only treat headers messages with up to 8
headers as potential BIP130 announcements. BIP130 does not specify such
a limit; it was purely a heuristic.

This is a partial backport of [[bitcoin/bitcoin#29575 | core#29575]]
bitcoin/bitcoin@5120ab1

Depends on D18030

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D18031
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request May 1, 2025
Summary:
This removes the need to actually track misbehavior score (see further commit), because any
Misbehaving node will immediately hit the discouragement threshold.

This is a partial backport of [[bitcoin/bitcoin#29575 | core#29575]]
bitcoin/bitcoin@6457c31

Depends on D18031

Test Plan:
`ninja all check-all-extended`

run an ibd with  -assumevalid=0 -checkpoints=0`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D18032
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request May 1, 2025
Summary:
This is now all unused.
Make MaybePunishNodeFor{Block,Tx} return void

This concludes backport of [[bitcoin/bitcoin#29575 | core#29575]]
bitcoin/bitcoin@ae60d48
bitcoin/bitcoin@6eecba4

Depends on D18032

Test Plan:
`ninja all check-all-extended`

run an ibd with `-assumevalid=0 -checkpoints=0`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D18034
@bitcoin bitcoin locked and limited conversation to collaborators Jun 28, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.