Skip to content

Conversation

sdaftuar
Copy link
Member

When in IBD, if the honest chain is only known by inbound peers, then we must
eventually sync from them in order to learn it. This change allows us to
perform initial headers sync and fetch blocks from inbound peers, if we have no
blocks in flight.

The restriction on having no blocks in flight means that we will naturally
throttle our block downloads to any such inbound peers that we may be
downloading from, until we leave IBD. This is a tradeoff between preferring
outbound peers for most of our block download, versus making sure we always
eventually will get blocks we need that are only known by inbound peers even
during IBD, as otherwise we may be stuck in IBD indefinitely (which could have
cascading failure on the network, if a large fraction of the network managed to
get stuck in IBD).

Note that the test in the second commit fails on master, without the first commit.

@sdaftuar
Copy link
Member Author

See discussion in #21106, eg #21106 (comment)

@DrahtBot DrahtBot added the P2P label Jan 26, 2022
@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 26, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24970 (net processing: Move cleanSubVer, fPreferredDownload and nLocalHostNonce to Peer by jnewbery)
  • #24008 (assumeutxo: net_processing changes by jamesob)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK. Makes sense to sync from inbounds if the outbounds don't provide any blocks. Seems like this is already the behaviour when there are no outbounds at all.

Copy link
Contributor

@shaavan shaavan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

This PR, along with the previous condition, incorporates a new condition for selecting the peer for downloading block headers. This condition states that if there are no preferred Peers to download and no BlockinFlight, download the blocks from available peer even if that is an inbound peer.
I agree with the logic of this change described in the PR's description. I especially like the commenting of the code, which made it easier to reason with the code's logic.

I shall ACK this PR, as soon as this comment is addressed.

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept and review ACK 4fdd7fe, modulo #24171 (comment). The test fails as expected on master with AssertionError: Block sync timed out.

@sdaftuar sdaftuar force-pushed the 2022-01-download-from-inbound branch from 4fdd7fe to c9fd2e0 Compare January 27, 2022 14:21
@jonatack
Copy link
Member

Perhaps have a look at test/functional/feature_minchainwork.py to see if some of it needs updating with this change or if the test could be added there.

@sdaftuar sdaftuar force-pushed the 2022-01-download-from-inbound branch from c9fd2e0 to 15c1687 Compare January 27, 2022 14:26
@sdaftuar
Copy link
Member Author

sdaftuar commented Jan 27, 2022

@jonatack See also #24178, where I do change the behavior of nMinimumChainWork and update the test for it.

I like the idea of having this very simple test from this PR in our repo, because I have stumbled upon this exact sequence of steps breaking in the past while writing tests, and I'd like that to never happen again! I do agree with you that there is likely room for further improving the feature_minchainwork.py test to incorporate this kind of logic too, but it seems like a bigger rewrite to me.


Just wanted to add an additional meta-review note; I think we could bikeshed the test here or the code style indefinitely, but more important to me is that we are happy with the changed criteria in this PR. While I think my proposal here works overall, I do think there is plenty of room to criticize or defend the behavior I'm proposing, so I'd welcome feedback either way about the specifics of the change and the expected impact on the network.

For instance, one line of thinking is that perhaps we should not prefer outbound peers over inbound peers during IBD at all, now that -maxuploadtarget is a setting that users may use to prevent their node from serving historical blocks if they don't want to be doing that. I opted for a smaller behavior change in this patch, but I could see the argument for making a bigger one (and honestly that would simplify our reasoning about initial sync much more!).

@jonatack
Copy link
Member

ACK 15c1687

@jamesob
Copy link
Contributor

jamesob commented Jan 27, 2022

I've read the code on Github and I agree with the nature of the changes. The burden should always be on data providers to limit externally-requested bandwidth usage per their own thresholds (rather than relying on peers to be "considerate" a priori). On the other hand, unnecessary stalls (and certainly the prospect of network-wide stalls) in IBD are something worth avoiding. So this PR's strategy of allowing requests to inbound peers for blocks when no blocks are otherwise in flight seems reasonable and even preferable given the possibility of stalling otherwise.

I'll review and test locally in a bit.

@ghost
Copy link

ghost commented Feb 10, 2022

Concept ACK

@naumenkogs
Copy link
Member

ACK 15c1687

I think it's a good change even if we opt for something bigger in the future.
nit: We could have a one printed statement here saying this chain is less trusted because it comes from inbound peers only.

Copy link
Member

@dergoegge dergoegge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Downloading blocks from inbound peers when the outbound peers don't deliver sounds good to me.

However mapBlocksInFlight.empty() might not be the best condition to determine if headers should be downloaded from an inbound peer. Just because there are no blocks in flight does not mean that the outbound peers won't deliver headers.

// the latest blocks is from an inbound peer, we have to be sure to
// eventually download it (and not just wait indefinitely for an
// outbound peer to have it).
if (nPreferredDownload == 0 || mapBlocksInFlight.empty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc the logic here is supposed to allow block/header downloads from inbound peers when the outbound peers don't deliver but this seems to ignore the preference for outbound peers during the initial header sync. Should pto be an inbound peer on the first call to SendMessages then we would request initial header sync from pto since there are no blocks in flight at that time. However none of the outbound peers have done anything wrong (e.g. stall header download) that should make us prefer the inbound peer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If our first connection is an inbound peer, then our existing logic is likely to perform initial headers sync from that peer anyway (because we'll sync from an inbound peer if we have no outbounds). So inherently there's some risk if inbound and outbound connections are racing each other at startup that we might pick an inbound peer for initial sync.

On the other hand, if we have multiple peers (inbound and outbound), then once we've begun initial headers sync with one peer (and nSyncStarted gets incremented above 0), then we won't initiate headers sync based on this test anymore -- instead the new test becomes whether our headers chain is close to caught up (see line 4626 below).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I overestimated our preference for initial header sync from outbound peers and my concerns were to conservative considering the current syncing logic. My assumption was that we very strongly prefer outbound peers to avoid being fed an alternative chain but it seems that we already accept headers during IBD from any peer (inbound/outbound) whether we requested initial sync from them or not.

Thank you for your detailed response!

Copy link
Member Author

@sdaftuar sdaftuar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dergoegge Thanks for the review and flagging the question about headers sync. I believe the logic I'm proposing here works ok because the test of whether we actually try for headers sync with an inbound peer isn't only based on whether mapBlocksInFlight is empty, but also if nSyncStarted == 0 (our guard to try to sync headers from only one peer at a time). So in order for us to pick an inbound peer for headers sync, it would have to be the case that we somehow have not yet picked an existing outbound peer for headers sync.

This could happen if (1) an outbound peer stalls headers sync and we disconnect them, or if (2) we happen to connect to an inbound peer before any outbound peers are established. I think case (1) is unlikely because if any peer sends us a block INV during the period that we're waiting on headers, we'll initiate headers sync with that peer (we send a getheaders in response to INV messages), and the window for headers sync to complete should be something like 20 minutes.

Case (2) is a somewhat unavoidable issue, because when we start up we don't know if we'll have any outbound peers and so if we don't try to sync from an inbound peer eventually, we may never sync the chain. (And this is how the logic works today already.)

// the latest blocks is from an inbound peer, we have to be sure to
// eventually download it (and not just wait indefinitely for an
// outbound peer to have it).
if (nPreferredDownload == 0 || mapBlocksInFlight.empty()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If our first connection is an inbound peer, then our existing logic is likely to perform initial headers sync from that peer anyway (because we'll sync from an inbound peer if we have no outbounds). So inherently there's some risk if inbound and outbound connections are racing each other at startup that we might pick an inbound peer for initial sync.

On the other hand, if we have multiple peers (inbound and outbound), then once we've begun initial headers sync with one peer (and nSyncStarted gets incremented above 0), then we won't initiate headers sync based on this test anymore -- instead the new test becomes whether our headers chain is close to caught up (see line 4626 below).

When in IBD, if the honest chain is only known by inbound peers, then we must
eventually sync from them in order to learn it. This change allows us to
perform initial headers sync and fetch blocks from inbound peers, if we have no
blocks in flight.

The restriction on having no blocks in flight means that we will naturally
throttle our block downloads to any such inbound peers that we may be
downloading from, until we leave IBD. This is a tradeoff between preferring
outbound peers for most of our block download, versus making sure we always
eventually will get blocks we need that are only known by inbound peers even
during IBD, as otherwise we may be stuck in IBD indefinitely (which could have
cascading failure on the network, if a large fraction of the network managed to
get stuck in IBD).
@ajtowns
Copy link
Contributor

ajtowns commented Jun 1, 2022

ACK 48262a0

Seems like a straightforward improvement, as demonstrated by the test case. Seems to me like there could be better conditions, but better to deploy a simple improvement now than debate forever about the best possible logic and deploy nothing.

@sipa
Copy link
Member

sipa commented Jun 2, 2022

ACK 48262a0

@laanwj laanwj merged commit 00ce854 into bitcoin:master Jun 2, 2022
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jun 3, 2022
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-merge re-ACK 48262a0

Expanded the test to verify explicitly for myself that both of the sites using the sync_blocks_and_headers_from_peer logic are hit in p2p_block_sync.py, by checking that node1 sends both an initial getheaders and a getdata (blocks) message.

diff --git a/test/functional/p2p_block_sync.py b/test/functional/p2p_block_sync.py
index d821edc1b1..b76a02b295 100755
--- a/test/functional/p2p_block_sync.py
+++ b/test/functional/p2p_block_sync.py
@@ -28,8 +28,14 @@ class BlockSyncTest(BitcoinTestFramework):
 
     def run_test(self):
         self.log.info("Setup network: node0->node1->node2")
-        self.log.info("Mining one block on node0 and verify all nodes sync")
-        self.generate(self.nodes[0], 1)
+        self.log.info("Mine one block on node0, verify all nodes sync, and that node1 sends an initial getheaders and a getdata (blocks)")
+        with self.nodes[1].assert_debug_log(
+            expected_msgs=[
+                "[SendMessages] [net] initial getheaders (0) to peer=1 (startheight:0)",
+                "[SendMessages] [net] Requesting block ", " (1) peer=0",
+            ]
+        ):
+            self.generate(self.nodes[0], 1)
         self.log.info("Success!")

@bitcoin bitcoin locked and limited conversation to collaborators Jun 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.