Conversation

sdaftuar
Member

This fixes the download behavior to allow downloading blocks on a chain that is known only by inbound peers and not by outbound/preferred download peers, mentioned in issue #5138.

@sipa I tried implementing your suggestion (on IRC) of just guarding the getdata code block with a (preferredpeer || rand() %100 == 0). I think this works for what we were discussing (where an inbound peer announces a tip more than 1 block ahead on a chain no one else knows about), but it doesn't resolve @gavinandresen's test case in issue #5138 because we still wouldn't be sending getheaders when an inbound peer connects like we do for outbound peers. Instead, we'd have to wait for an inv that builds on the longer chain from the inbound peer before we send getheaders, and then with this code send the getdata.

One question: is it necessary to also check !pto->fOneShot with this change? I didn't think it would be possible for us to have headers from a one-shot peer, and therefore there would never be blocks we'd try to download from it, so I left the check out -- but perhaps it's preferable to include it just to be safe?

As an aside: I noticed that an additional way we could work around the regtest issue @gavinandresen raised, besides connecting nodes to each other in both directions, would be to whitelist localhost.

I've tested this code on a modified version of the test, where after connecting a node with a longer chain and observing that the tips didn't sync, I then generated one more block on that inbound peer and observed that the tips then did sync.

Finally, if we wanted to immediately download a longer chain known only by an inbound peer, I believe we could send an initial getheaders message to all inbound peers just as we do for outbound peers, with a similar small change to the code that sends the initial getheaders messages (while still preferring outbound peers for downloading blocks during initial block download). It wasn't clear to me whether such a behavior change would be desirable, though.

@laanwj laanwj added the P2P label Jan 15, 2015
@@ -4535,7 +4535,7 @@ bool SendMessages(CNode* pto, bool fSendTrickle)
// Message: getdata (blocks)
//
vector<CInv> vGetData;
-if (!pto->fDisconnect && !pto->fClient && fFetch && state.nBlocksInFlight < MAX_BLOCKS_IN_TRANSIT_PER_PEER) {
+if (!pto->fDisconnect && !pto->fClient && (fFetch || GetRandInt(100) == 0) && state.nBlocksInFlight < MAX_BLOCKS_IN_TRANSIT_PER_PEER) {
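
For context (not part of the hunk above): fFetch is computed earlier in SendMessages, roughly as the line below -- a paraphrase of the main.cpp of that era, so the exact expression may differ slightly.

bool fFetch = state.fPreferredDownload || (nPreferredDownload == 0 && !pto->fClient && !pto->fOneShot);
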
Member

Please use insecure_rand here. GetRandInt() returns cryptographic randomness and is thus quite expensive to call, and it's unnecessary here.

Member Author

Thanks -- fixed (commit squashed and amended).
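
Presumably the amended guard then looked something like the line below (a reconstruction for illustration only; the squashed intermediate commit isn't shown here, but the insecure_rand() guard is referenced later in this thread):

if (!pto->fDisconnect && !pto->fClient && (fFetch || insecure_rand() % 100 == 0) && state.nBlocksInFlight < MAX_BLOCKS_IN_TRANSIT_PER_PEER) {
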

@laanwj
Member

laanwj commented Mar 9, 2015

utACK

@gmaxwell
Contributor

Seems like a slightly kludgy approach, but better than not, and at least it's simple. Have we tested much with it set to also pull from inbound, just to make sure there are no issues with that?

WRT localhost, that's probably reasonable, but OTOH it might hide issues. I think of this restriction on inbound as temporary until we get more controls in place to manage resource usage; after that it can go away completely, I think.

@sdaftuar
Member Author

Yeah, that's pretty much how I feel too (kludgy but simple). I did some controlled regtest testing when I wrote this, but I haven't done any live network tests yet, which I agree would be worth doing, so I'll set something up and report back...

@sipa
Member

sipa commented Mar 24, 2015

utACK. It's kludgy, but fine as an interim solution.

It should even be perfectly fine to always enable fetching from any peer, except for the implicitly accepted behavior that not accepting incoming connections will reduce bandwidth usage. When headers-first is more widely deployed, we can get rid of that policy I think, and just add an actual bandwidth throttler.

@sdaftuar
Member Author

sdaftuar commented Apr 1, 2015

I tried testing initial block download with this code to see what kind of load it might put on an inbound peer (in the steady state, this code doesn't seem to trigger very often, as expected). It is apparently easier than I realized to run into a situation where a node starts up and acquires inbound peers quickly, before any outbound connections are established. In repeated testing last night, I restarted a long-running node (with -reindex and without the first blockfile, to trigger downloading the whole chain), and my node selected an inbound peer for its initial headers sync; with this code, that peer was then exclusively serving blocks for a while.

Of course that load would have come down after the headers were almost synced and parallel download started, but even at that point you might expect that this code could cause that inbound peer to be serving as much as 1/9 of the blocks to the node (that one inbound peer plus the default 8 outbound peers sharing the parallel fetch) -- probably less than that due to the insecure_rand() guard, but it would depend on download speeds from the outbound peers, I think.

And more generally, for a node doing a large download of the blockchain, even if it is downloading only from outbound peers for a while, receiving an inv for a new block from an inbound peer would cause it to retrieve headers from that inbound peer via the existing inv-processing logic, and this change would then cause that inbound peer to serve earlier blocks as well.

Perhaps this shouldn't be merged as is if we want to avoid using inbound peers for large-ish block download?

@sipa
Member

sipa commented Apr 1, 2015

@sdaftuar Any suggestions for a better strategy?

…peers

SendMessages will now call getheaders on both inbound and outbound peers,
once the headers chain is close to synced.  It will also try downloading
blocks from inbound peers once we're out of initial block download (so
inbound peers will participate in parallel block fetching for the last day
or two of blocks being downloaded).
@sdaftuar sdaftuar force-pushed the block-download-fix branch from 9bfa315 to 00dcaf4 on April 2, 2015 at 17:45
@sdaftuar
Member Author

sdaftuar commented Apr 2, 2015

I've pushed an alternate fix here; please review.

What about dropping the insecure_rand in favor of a call to IsInitialBlockDownload():

if (!pto->fDisconnect && !pto->fClient && (fFetch || !IsInitialBlockDownload()) && state.nBlocksInFlight < MAX_BLOCKS_IN_TRANSIT_PER_PEER) {

This would mean that we'd be willing to let inbound peers share the download burden on up to the last ~288 blocks (our tip is within 24 hours of our best known header, and our best known header is within 24 hours of the current time; in practice this is probably usually closer to 144 blocks). If we do this, I think we should also go ahead and send the initial getheaders messages to inbound peers as well, which I've done in this commit. This now fixes the problem Gavin reported here:
#5138 (comment)
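
For intuition on the ~288 / ~144 figures, here is a rough, self-contained paraphrase of the two conditions described above (illustrative only -- the real IsInitialBlockDownload() has additional checks and a latch, and the function name and parameters below are invented for the example):

#include <cstdint>
#include <ctime>

// Illustrative paraphrase of the IsInitialBlockDownload() conditions described above.
bool IsLikelyInitialBlockDownload(int nTipHeight, int nBestHeaderHeight, int64_t nBestHeaderTime)
{
    // Still syncing if the tip is more than ~24 hours of blocks (24 * 6 = 144)
    // behind the best known header...
    if (nTipHeight < nBestHeaderHeight - 24 * 6)
        return true;
    // ...or if the best known header is itself more than 24 hours old.
    if (nBestHeaderTime < time(nullptr) - 24 * 60 * 60)
        return true;
    return false;
}

// Once this returns false, the tip is at most ~144 blocks behind the best known
// header, and the best header is at most ~24 hours (~144 more blocks) behind the
// network tip -- hence "up to ~288 blocks, usually closer to 144" above.
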

Also, I don't think the insecure_rand was giving us much; I think the idea was that in the steady-state case it would cause us to prefer downloading blocks on a reorg chain from outbound peers rather than inbound peers, but that case is sufficiently rare and the load sufficiently low that I don't think optimizing for it is very important (especially because we're already willing to download the first new block announced by an inbound peer immediately, so why not the next block or two as well?).

I think this ought to work reasonably well when a far-behind node starts up, because it should have a number of peers by the time it gets close to catching up, meaning that the burden on inbound peers shouldn't be very large.

If a node were to come up and only be a day or two behind, though, then it's possible that a couple of inbound peers would be hit with a large fraction of those 288 blocks. If that is too much potential load, we could lighten it further by reducing MAX_BLOCKS_IN_TRANSIT_PER_PEER for inbound peers -- perhaps just allow 1 or 2 blocks in flight at a time from inbound peers here? I'm not sure how much optimizing is reasonable for this case; the existing logic already would put all the load on inbound peers if we don't have any outbound peers.
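
A hypothetical sketch of that lighter-load idea (not part of this PR; the inbound constant and helper name are invented for illustration):

// Existing per-peer cap (16 at the time); a hypothetical smaller cap for inbound peers.
static const int MAX_BLOCKS_IN_TRANSIT_PER_PEER = 16;
static const int MAX_BLOCKS_IN_TRANSIT_INBOUND = 2;   // illustrative value only

// Hypothetical helper: how many blocks we'd be willing to have in flight from a peer.
int MaxBlocksInTransit(bool fInbound)
{
    return fInbound ? MAX_BLOCKS_IN_TRANSIT_INBOUND : MAX_BLOCKS_IN_TRANSIT_PER_PEER;
}

// The getdata condition would then compare state.nBlocksInFlight against
// MaxBlocksInTransit(pto->fInbound) instead of the single constant.
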

@sdaftuar
Member Author

sdaftuar commented Apr 3, 2015

One behavior worth noting is that with this change, our maximum number of blocks in flight goes up (from 128, i.e. 8 outbound peers times MAX_BLOCKS_IN_TRANSIT_PER_PEER = 16, to 16 * the number of network peers). It's probably not generally going to exceed 288 in practice, since that's the most blocks that can still be outstanding by the time inbound peers start being used alongside the outbound peers for fetching. But if you have a stalling peer that isn't returning a block, then with the existing block download timeout logic (which depends on the number of blocks with validated headers in flight at the time of a block request), that stalling peer could have about 24 hours to deliver a block before being disconnected.

Still, there are easy workarounds (e.g. restarting a sync with listen=0 if you hit a bad peer before finishing a big sync with the network), and this change doesn't make the problem of an unresponsive peer any worse in the steady state, so I'm not sure how much optimization is worthwhile for this case. Thoughts?

@sdaftuar
Member Author

sdaftuar commented Apr 6, 2015

FYI I just opened #5976 to mitigate any issues with longer block download timeouts that the approach here might otherwise exacerbate.

@sipa
Member

sipa commented Apr 7, 2015

Concept ACK on the new approach. In the steady state, this shouldn't matter much, as almost all blocks there are fetched through the direct-fetching logic, rather than the asynchronous headers-based fetching.

@gavinandresen
Contributor

utACK.

@laanwj
Member

laanwj commented Apr 28, 2015

utACK

@laanwj laanwj merged commit 00dcaf4 into bitcoin:master Apr 28, 2015
laanwj added a commit that referenced this pull request Apr 28, 2015
00dcaf4 Change download logic to allow calling getheaders/getdata on inbound peers (Suhas Daftuar)
@sdaftuar sdaftuar mentioned this pull request Jun 3, 2015
laanwj added a commit that referenced this pull request Sep 16, 2019
…c_invalidateblock

fae961d test: Establish only one connection between nodes in rpc_invalidateblock (MarcoFalke)

Pull request description:

  Headers and block sync should eventually converge to the same result, regardless of whether the peers treat each other as "inbound" or "outbound".

  `connect_nodes_bi` has been introduced as a (temporary?) workaround for bug #5113 and #5138, which has long been fixed in #5157 and #5662.

  Thus remove the `connect_nodes_bi` workaround from the rpc_invalidateblock test.

  Conveniently, this also closes #16453. See #16444 (comment) for rationale

ACKs for top commit:
  laanwj:
    ACK fae961d

Tree-SHA512: b3614c66a205823df73f64d19cacfbec269beb5db52ff79004d746e17d7c0dfb43ab9785fdddc97e2a76fe76286c8c605b34df3dda4a2bf5be035f01169ae89a
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 16, 2019
…s in rpc_invalidateblock

maflcko pushed a commit that referenced this pull request Sep 18, 2019
fadfd84 test: Remove unused connect_nodes_bi (MarcoFalke)
fa3b9ee scripted-diff: test: Replace connect_nodes_bi with connect_nodes (MarcoFalke)
faaee1e test: Use connect_nodes when connecting nodes in the test_framework (MarcoFalke)
1111bb9 test: Reformat python imports to aid scripted diff (MarcoFalke)

Pull request description:

  By default all test nodes are connected in a chain. However, instead of just a single connection between each pair of nodes, we end up with up to four connections for a "middle" node (two outbound, two inbound, from each side).

  This is generally redundant (tx and block relay should succeed with just a single connection) and confusing. For example, test timeouts after a call to `sync_` may be racy and hard to reproduce. On top of that, the test `debug.log`s are hard to read because txs and block invs may be relayed on the same connection multiple times.

  Fix this by inlining `connect_nodes_bi` in the two tests that need it, and then replace it with a single `connect_nodes` in all other tests.

  Historic background:

  `connect_nodes_bi` has been introduced as a (temporary?) workaround for bug #5113 and #5138, which has long been fixed in #5157 and #5662.

ACKs for top commit:
  laanwj:
    ACK fadfd84
  jonasschnelli:
    utACK fadfd84 - more or less a cleanup PR.
  promag:
    Tested ACK fadfd84, ran extended tests.

Tree-SHA512: 2d027a8fd150749c071b64438a0a78ec922178628a7dbb89fd1212b0fa34febd451798c940101155d3617c0426c2c4865174147709894f1f1bb6cfa336aa7e24
random-zebra referenced this pull request in PIVX-Project/PIVX Mar 31, 2020
3e9081c [Tests] Fix tests duration and sorting in test_runner.py (random-zebra)
57dd5e7 [Tests][Refactor] Move connect_nodes_bi to p2p_time_offset (random-zebra)
9b36468 test: Replace connect_nodes_bi with connect_nodes (random-zebra)
c2a701e test: Use connect_nodes when connecting nodes in the test_framework (random-zebra)
5d3f2a1 test: Reformat python imports to aid scripted diff (random-zebra)

Pull request description:

  Mostly from upstream bitcoin#16898

  > By default all test nodes are connected in a chain. However, instead of just a single connection between each pair of nodes, we end up with up to four connections for a "middle" node (two outbound, two inbound, from each side).
  >
  > This is generally redundant (tx and block relay should succeed with just a single connection) and confusing. For example, test timeouts after a call to sync_ may be racy and hard to reproduce. On top of that, the test debug.logs are hard to read because txs and block invs may be relayed on the same connection multiple times.
  >
  > Fix this by inlining connect_nodes_bi in the two tests that need it, and then replace it with a single connect_nodes in all other tests.
  >
  > Historic background:
  >
  > connect_nodes_bi has been introduced as a (temporary?) workaround for bug dashpay#5113 and dashpay#5138, which has long been fixed in dashpay#5157 and dashpay#5662.

ACKs for top commit:
  furszy:
    Tested ACK 3e9081c .

Tree-SHA512: 4af3223c7e5a4b130855d9364b7ec3df5e141151fd1e108c55ef80dd523522780b6130e69cacd467906215c15da9a50496f12298246eb67876fcaa181463cae9
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021