-
Notifications
You must be signed in to change notification settings - Fork 37.7k
p2p: Sync chain more readily from inbound peers during IBD #24171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
See discussion in #21106, eg #21106 (comment) |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. Makes sense to sync from inbounds if the outbounds don't provide any blocks. Seems like this is already the behaviour when there are no outbounds at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
This PR, along with the previous condition, incorporates a new condition for selecting the peer for downloading block headers. This condition states that if there are no preferred Peers to download and no BlockinFlight, download the blocks from available peer even if that is an inbound peer.
I agree with the logic of this change described in the PR's description. I especially like the commenting of the code, which made it easier to reason with the code's logic.
I shall ACK this PR, as soon as this comment is addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept and review ACK 4fdd7fe, modulo #24171 (comment). The test fails as expected on master with AssertionError: Block sync timed out
.
4fdd7fe
to
c9fd2e0
Compare
Perhaps have a look at |
c9fd2e0
to
15c1687
Compare
@jonatack See also #24178, where I do change the behavior of I like the idea of having this very simple test from this PR in our repo, because I have stumbled upon this exact sequence of steps breaking in the past while writing tests, and I'd like that to never happen again! I do agree with you that there is likely room for further improving the Just wanted to add an additional meta-review note; I think we could bikeshed the test here or the code style indefinitely, but more important to me is that we are happy with the changed criteria in this PR. While I think my proposal here works overall, I do think there is plenty of room to criticize or defend the behavior I'm proposing, so I'd welcome feedback either way about the specifics of the change and the expected impact on the network. For instance, one line of thinking is that perhaps we should not prefer outbound peers over inbound peers during IBD at all, now that |
ACK 15c1687 |
I've read the code on Github and I agree with the nature of the changes. The burden should always be on data providers to limit externally-requested bandwidth usage per their own thresholds (rather than relying on peers to be "considerate" a priori). On the other hand, unnecessary stalls (and certainly the prospect of network-wide stalls) in IBD are something worth avoiding. So this PR's strategy of allowing requests to inbound peers for blocks when no blocks are otherwise in flight seems reasonable and even preferable given the possibility of stalling otherwise. I'll review and test locally in a bit. |
Concept ACK |
ACK 15c1687 I think it's a good change even if we opt for something bigger in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Downloading blocks from inbound peers when the outbound peers don't deliver sounds good to me.
However mapBlocksInFlight.empty()
might not be the best condition to determine if headers should be downloaded from an inbound peer. Just because there are no blocks in flight does not mean that the outbound peers won't deliver headers.
src/net_processing.cpp
Outdated
// the latest blocks is from an inbound peer, we have to be sure to | ||
// eventually download it (and not just wait indefinitely for an | ||
// outbound peer to have it). | ||
if (nPreferredDownload == 0 || mapBlocksInFlight.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iiuc the logic here is supposed to allow block/header downloads from inbound peers when the outbound peers don't deliver but this seems to ignore the preference for outbound peers during the initial header sync. Should pto
be an inbound peer on the first call to SendMessages
then we would request initial header sync from pto
since there are no blocks in flight at that time. However none of the outbound peers have done anything wrong (e.g. stall header download) that should make us prefer the inbound peer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If our first connection is an inbound peer, then our existing logic is likely to perform initial headers sync from that peer anyway (because we'll sync from an inbound peer if we have no outbounds). So inherently there's some risk if inbound and outbound connections are racing each other at startup that we might pick an inbound peer for initial sync.
On the other hand, if we have multiple peers (inbound and outbound), then once we've begun initial headers sync with one peer (and nSyncStarted
gets incremented above 0), then we won't initiate headers sync based on this test anymore -- instead the new test becomes whether our headers chain is close to caught up (see line 4626 below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I overestimated our preference for initial header sync from outbound peers and my concerns were to conservative considering the current syncing logic. My assumption was that we very strongly prefer outbound peers to avoid being fed an alternative chain but it seems that we already accept headers during IBD from any peer (inbound/outbound) whether we requested initial sync from them or not.
Thank you for your detailed response!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dergoegge Thanks for the review and flagging the question about headers sync. I believe the logic I'm proposing here works ok because the test of whether we actually try for headers sync with an inbound peer isn't only based on whether mapBlocksInFlight
is empty, but also if nSyncStarted == 0
(our guard to try to sync headers from only one peer at a time). So in order for us to pick an inbound peer for headers sync, it would have to be the case that we somehow have not yet picked an existing outbound peer for headers sync.
This could happen if (1) an outbound peer stalls headers sync and we disconnect them, or if (2) we happen to connect to an inbound peer before any outbound peers are established. I think case (1) is unlikely because if any peer sends us a block INV during the period that we're waiting on headers, we'll initiate headers sync with that peer (we send a getheaders
in response to INV messages), and the window for headers sync to complete should be something like 20 minutes.
Case (2) is a somewhat unavoidable issue, because when we start up we don't know if we'll have any outbound peers and so if we don't try to sync from an inbound peer eventually, we may never sync the chain. (And this is how the logic works today already.)
src/net_processing.cpp
Outdated
// the latest blocks is from an inbound peer, we have to be sure to | ||
// eventually download it (and not just wait indefinitely for an | ||
// outbound peer to have it). | ||
if (nPreferredDownload == 0 || mapBlocksInFlight.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If our first connection is an inbound peer, then our existing logic is likely to perform initial headers sync from that peer anyway (because we'll sync from an inbound peer if we have no outbounds). So inherently there's some risk if inbound and outbound connections are racing each other at startup that we might pick an inbound peer for initial sync.
On the other hand, if we have multiple peers (inbound and outbound), then once we've begun initial headers sync with one peer (and nSyncStarted
gets incremented above 0), then we won't initiate headers sync based on this test anymore -- instead the new test becomes whether our headers chain is close to caught up (see line 4626 below).
When in IBD, if the honest chain is only known by inbound peers, then we must eventually sync from them in order to learn it. This change allows us to perform initial headers sync and fetch blocks from inbound peers, if we have no blocks in flight. The restriction on having no blocks in flight means that we will naturally throttle our block downloads to any such inbound peers that we may be downloading from, until we leave IBD. This is a tradeoff between preferring outbound peers for most of our block download, versus making sure we always eventually will get blocks we need that are only known by inbound peers even during IBD, as otherwise we may be stuck in IBD indefinitely (which could have cascading failure on the network, if a large fraction of the network managed to get stuck in IBD).
15c1687
to
48262a0
Compare
ACK 48262a0 Seems like a straightforward improvement, as demonstrated by the test case. Seems to me like there could be better conditions, but better to deploy a simple improvement now than debate forever about the best possible logic and deploy nothing. |
ACK 48262a0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post-merge re-ACK 48262a0
Expanded the test to verify explicitly for myself that both of the sites using the sync_blocks_and_headers_from_peer
logic are hit in p2p_block_sync.py
, by checking that node1
sends both an initial getheaders
and a getdata (blocks)
message.
diff --git a/test/functional/p2p_block_sync.py b/test/functional/p2p_block_sync.py
index d821edc1b1..b76a02b295 100755
--- a/test/functional/p2p_block_sync.py
+++ b/test/functional/p2p_block_sync.py
@@ -28,8 +28,14 @@ class BlockSyncTest(BitcoinTestFramework):
def run_test(self):
self.log.info("Setup network: node0->node1->node2")
- self.log.info("Mining one block on node0 and verify all nodes sync")
- self.generate(self.nodes[0], 1)
+ self.log.info("Mine one block on node0, verify all nodes sync, and that node1 sends an initial getheaders and a getdata (blocks)")
+ with self.nodes[1].assert_debug_log(
+ expected_msgs=[
+ "[SendMessages] [net] initial getheaders (0) to peer=1 (startheight:0)",
+ "[SendMessages] [net] Requesting block ", " (1) peer=0",
+ ]
+ ):
+ self.generate(self.nodes[0], 1)
self.log.info("Success!")
When in IBD, if the honest chain is only known by inbound peers, then we must
eventually sync from them in order to learn it. This change allows us to
perform initial headers sync and fetch blocks from inbound peers, if we have no
blocks in flight.
The restriction on having no blocks in flight means that we will naturally
throttle our block downloads to any such inbound peers that we may be
downloading from, until we leave IBD. This is a tradeoff between preferring
outbound peers for most of our block download, versus making sure we always
eventually will get blocks we need that are only known by inbound peers even
during IBD, as otherwise we may be stuck in IBD indefinitely (which could have
cascading failure on the network, if a large fraction of the network managed to
get stuck in IBD).
Note that the test in the second commit fails on master, without the first commit.