
Conversation

@sdaftuar sdaftuar (Member) commented Jul 27, 2022

On startup, if our headers chain is more than a day behind current time, we'll pick one peer to sync headers with until our best headers chain is caught up (at that point, we'll try to sync headers with all peers).

However, if an INV for a block is received before our headers chain is caught up, we'll then start to sync headers from each peer announcing the block. This can result in doing a big headers sync with many (if not all) of our peers simultaneously, which wastes bandwidth.

This PR would reduce that overhead by picking (at most) one new peer to try syncing headers with whenever a new block is announced, prior to our headers chain being caught up.
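As a rough illustration of the selection rule described above, here is a minimal Python sketch (names such as `should_sync_headers_on_inv` are illustrative, not the actual identifiers in net_processing.cpp):

```python
class Peer:
    """Hypothetical stand-in for per-peer state."""
    def __init__(self):
        self.sync_started = False               # already our headers-sync peer?
        self.inv_triggered_sync_before = False  # already picked via a block INV?

class Node:
    """Hypothetical stand-in for node-wide state."""
    def __init__(self):
        # Hash of the last block INV that triggered an extra headers sync.
        self.last_inv_triggering_sync = None

    def should_sync_headers_on_inv(self, peer, announced_block):
        """Return True if a block INV from `peer` should trigger getheaders."""
        if peer.sync_started:
            return True   # already syncing headers with this peer; keep going
        if peer.inv_triggered_sync_before:
            return False  # this peer already used its one INV-triggered chance
        if announced_block == self.last_inv_triggering_sync:
            return False  # another peer was already picked for this block
        # Pick this peer as the (at most) one extra sync peer for this block.
        peer.inv_triggered_sync_before = True
        self.last_inv_triggering_sync = announced_block
        return True
```

With several peers all announcing the same new block, only the first one processed triggers an extra headers sync; a later, different block can add one more peer.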

Post-merge edit: See #25720 (comment) for more context about this change.

@fanquake fanquake added the P2P label Jul 27, 2022
@sdaftuar sdaftuar force-pushed the 2022-07-reduce-headers-sync-bandwidth branch from 0ae123f to 134d8d1 Compare July 27, 2022 15:54
@sdaftuar sdaftuar force-pushed the 2022-07-reduce-headers-sync-bandwidth branch from 134d8d1 to 4a33763 Compare July 27, 2022 21:23
@maflcko maflcko (Member) left a comment

left a few questions

// as new blocks are found, we are willing to add one new peer per
// block to sync with as well, to sync quicker in the case where
// our initial peer is unresponsive (but less bandwidth than we'd
// use if we turned on sync with all peers).
Member

If we assume that the initial headers sync logic is robust, why add the additional complexity here? Wouldn't it be better to just skip the getheaders message? Or if the logic isn't assumed to be robust, lower the timeout or add a new peer on a random timeout?

@sdaftuar sdaftuar (Member, Author) commented Aug 3, 2022

I'm not sure I completely understand what you're asking, but the topic is more complicated than just an assumption around whether the initial headers sync logic is robust:

  • Ideally, we would only download the full headers chain from a single peer when we are starting up, because it saves bandwidth to do so.
  • It's possible that the peer we pick for initial headers sync could be (a) slow, (b) on a chain that is not the main chain, (c) adversarial, or some other terrible combination of those factors. So we cannot just have our logic rely on the initial peer to serve us the honest chain in a reasonable amount of time.
  • We currently have two behaviors that help protect us from choosing a bad initial peer. The main protection is that when a block INV is received, we send a getheaders to all peers that announce the block, resulting in us getting the main chain with high probability. However, this wastes bandwidth if many peers serve us an INV at the same time, which is probably the common case in the scenario where our initial peer is slow.
  • The second protection we have is that after about 20 minutes, we'll evict our initial headers-sync peer if our tip's timestamp isn't within a day of the current time. This could kick in if we have a bad initial peer and no blocks are found for a while.

I think we could do a variety of things to improve the current situation on master; adding (say) one additional headers sync peer on some kind of timer (maybe every 5 or 10 minutes) could make sense. Choosing a random peer among the set of peers announcing a block is probably better peer selection than choosing a random peer (or random outbound peer) on a timer: if a peer sends an INV, there's more reason to believe it is responsive and will be helpful in getting us the chain. Probably some combination of both would be even better.

However, the complexity I ran into when thinking about other strategies for initial sync has to do with the eviction logic. Right now, I think it's mostly good that we evict our (single) initial headers-sync peer if we can't get a recent chain tip within 20 minutes. However, triggering that logic on all our peers at the same time seems over the top to me, because there are edge-case scenarios (such as no blocks having been found on the network for a day, or the honest chain being some kind of billion-block timewarp chain that takes more than 20 minutes to download) where such logic could be badly behaved for the network: we could end up with no peers, or fall out of consensus.

I think what I'm proposing in this patch is a narrow change that exactly addresses the bandwidth problem, and maximizes the chance we find a good peer quickly, without making our behavior in those edge-case scenarios any worse. Nevertheless, a bigger overhaul of this logic that carefully considers these things could certainly be an improvement and make this whole thing easier to think about.

Contributor

@sdaftuar can you move this comment to the top of the PR? I think this context helps quite a bit (and I was going to ask for this in a comment on the PR directly).

@sdaftuar sdaftuar force-pushed the 2022-07-reduce-headers-sync-bandwidth branch from 4a33763 to 17f2822 Compare August 3, 2022 13:48
@dergoegge dergoegge (Member) left a comment

Concept ACK


def run_test(self):
    self.log.info("Adding a peer to node0")
    peer1 = self.nodes[0].add_p2p_connection(P2PInterface())
Member

Doesn't need to happen in this PR but in the future this test could also cover our preference for outbound peers for the initial headers sync.

@LarryRuane (Contributor)

Related: #8306

@pablomartin4btc pablomartin4btc (Member) left a comment

Code review ACK.

Tested ACK (unit & functional tests).

@hernanmarino hernanmarino (Contributor) left a comment

Tested ACK

// our initial peer is unresponsive (but less bandwidth than we'd
// use if we turned on sync with all peers).
CNodeState& state{*Assert(State(pfrom.GetId()))};
if (state.fSyncStarted || (!peer->m_inv_triggered_getheaders_before_sync && *best_block != m_last_block_inv_triggering_headers_sync)) {
Contributor

If there are two blocks found at the same time/height, this could still trigger a getheaders to most peers, no? (Someone sends block A, then someone else sends block B, then someone else sends A, etc) Might it be better to set m_last_extra_headers_sync = SteadyClock::now(); and check that it's been at least a minute before adding an extra one?

Member Author

That's true, but I think your suggestion is less robust overall than what I have proposed here: sending an INV to a node is free (there is no proof-of-work attached to a block hash), so an adversary could take advantage of your proposed strategy by continually connecting, sending an INV, and disconnecting, to prevent a node from trying to sync with any of its honest peers that are announcing main-chain blocks.

On the other hand, I just checked one of my long-running, well-connected nodes, and it's seen about 10 stale blocks in the last 50000. So that seems like pretty good odds that a node starting up is unlikely to run into this?
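To make the stale-block scenario above concrete, here is a small self-contained sketch of the quoted condition (`inv_triggers_sync` is an illustrative name, not the real C++ function): with fresh peers alternately announcing two competing blocks, every announcement passes the block-hash check, so only the per-peer flag limits how many peers are added.

```python
def inv_triggers_sync(sync_started, triggered_before, announced, last_triggering):
    """Mirror of the quoted condition: sync if already started with this peer,
    or if this peer hasn't yet been picked via an INV and the announced block
    differs from the last block that triggered a sync."""
    return sync_started or (not triggered_before and announced != last_triggering)

# Five fresh peers alternately announcing competing blocks A and B:
last = None
extra_sync_peers = 0
for announced in ["A", "B", "A", "B", "A"]:
    if inv_triggers_sync(False, False, announced, last):
        extra_sync_peers += 1
        last = announced

print(extra_sync_peers)  # 5: each announcement differs from the previous one
```

In the single-block common case, only the first announcement triggers a sync (the hash check fails for the repeats), which is the bandwidth saving this PR targets.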

@sdaftuar sdaftuar force-pushed the 2022-07-reduce-headers-sync-bandwidth branch from 17f2822 to 61931d7 Compare August 12, 2022 14:46
@LarryRuane LarryRuane (Contributor) left a comment

I tested this patch pretty carefully; it's working as intended. It's not a huge improvement (I know it's not expected to be), but I started IBD on mainnet and there definitely are fewer redundant getheaders. Here's a summary of what I observed by enabling the "p2p" debug category:

The node begins by getting headers from a single peer (that's the same as without this PR). When the first block announcement arrives (via inv), which usually comes (redundantly) from all of our peers, the node begins requesting headers from only one of those peers (instead of all peers that sent the same inv), which is the intended effect of this PR. A moment later, the headers reply from this peer triggers another getheaders request. This is also occurring with the original peer, so now there are two redundant "threads" (not OS threads, but threads in the sense that each headers reply launches another getheaders request to that peer).

This means that each header is received twice from here onward. If and when a second block announcement appears, a third getheaders "thread" begins, so now we're getting each header 3 times. But, again, this is much better than getting each header N times (if N is the number of our peers), which is what happens without this PR.

As others have said, this initial headers sync process can be further improved, but it's a tricky area so caution is advised! This is a step in the right direction and it seems safe to me.

Note that this isn't a huge problem in practice because, depending on network speed, the node can often download headers all the way out to the current tip before even one new block arrives, since the full header sync typically takes only a few minutes. And if a new block does arrive during this time, it's not a big problem to be receiving these redundant headers; each is only just over 80 bytes in serialized form.

I'm going to hold off on ack for now because there are a couple of changes I'd like to see in the test.

self.log.info("Check that exactly 1 of {peer2, peer3} received a getheaders in response")
count = 0
peer_receiving_headers = None
for p in [peer2, peer3]:
Contributor

Suggested change:
  for p in [peer2, peer3]:
+     p.sync_with_ping()

Member Author

Not sure if I'm missing something; I believe this sync_with_ping() is unnecessary because we use send_and_ping() in announce_random_block(). Does that sound right?

Contributor

You're right, ignore my suggestion.

if peer2 == peer_receiving_headers:
    expected_peer = peer3

expected_peer.wait_for_getheaders()
Contributor

Suggested change:
  expected_peer.wait_for_getheaders()
+ peer_receiving_headers.sync_with_ping()
+ with p2p_lock:
+     assert "getheaders" not in peer_receiving_headers.last_message

Member Author

So I intentionally did not include this test (that a peer that we've started sync with doesn't receive a subsequent getheaders in response to an INV) because I felt like that was overspecifying what I think of as reasonable behavior. I think the most important property is that we add a new peer with each new block, but whether we continue trying to sync with existing peers is (in my view) up for debate.

For instance, if the test were rewritten a bit so that the peer receiving the initial getheaders in response to an INV had a big long headers chain to serve the node, then you'd expect there to be a getheaders in flight as headers sync proceeds. Granted that is not the situation in this test right now, but I feel like it's helpful for future test maintenance to not overspecify behavior if it doesn't really matter for what we're trying to achieve.

If our headers chain is behind on startup, then if a block is found we'll try
to catch up from all peers announcing the block, in addition to our initial
headers-sync peer. This commit changes behavior so that in this situation,
we'll choose at most one peer announcing a block to additionally sync headers
from.
@sdaftuar sdaftuar force-pushed the 2022-07-reduce-headers-sync-bandwidth branch from 61931d7 to e1a14c0 Compare August 12, 2022 21:05
@sdaftuar sdaftuar force-pushed the 2022-07-reduce-headers-sync-bandwidth branch from e1a14c0 to f6a9166 Compare August 12, 2022 21:13
@LarryRuane (Contributor)

ACK f6a9166

@ajtowns ajtowns (Contributor) commented Aug 15, 2022

ACK f6a9166

@mzumsande mzumsande (Contributor) commented Aug 15, 2022

ACK f6a9166
I reviewed the code and tested this on testnet during a period where blocks were mined multiple times in a minute - each time that happened, one more peer was added to the headers sync.

I think that this would be good to get in, especially with #25717 doubling the amount of headers download data.
While testing #25717 on a slow connection on testnet, I would sometimes run into timeouts: with 2,300,000 headers (~180MB) being downloaded twice (two phases) from 10 outbound peers, it might be necessary to download up to ~3.6GB within 20 minutes, which can be a problem on slower connections.
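As a sanity check on these figures (using the 81-byte serialized header size: an 80-byte block header plus a 1-byte empty transaction count in the headers message):

```python
headers = 2_300_000
header_bytes = 81  # 80-byte block header + 1-byte empty tx count

one_pass_mb = headers * header_bytes / 1e6
print(round(one_pass_mb))  # 186, consistent with the ~180MB figure above

# Worst case with #25717: two download phases from each of 10 outbound peers.
worst_case_gb = one_pass_mb * 2 * 10 / 1e3
print(round(worst_case_gb, 1))  # 3.7, consistent with the ~3.6GB figure above
```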

While I think that this PR is a good improvement of the existing logic, longer-term I would probably prefer a deterministic mechanism over one tied to a random process: sometimes multiple blocks are found immediately after the start of headers sync, sometimes none for more than 30 minutes, and I don't really see why this randomness should determine the number of peers for our headers download, instead of something like simply adding one additional peer every 10 minutes.

@fanquake fanquake requested a review from dergoegge August 15, 2022 09:22
@dergoegge (Member)

Code review ACK f6a9166

I think I agree with Martin, that in the long-term a deterministic mechanism would be preferable but for now this is a good improvement (especially w.r.t. #25717).

@fanquake fanquake added this to the 24.0 milestone Aug 15, 2022
@achow101 (Member)

ACK f6a9166

@achow101 achow101 merged commit 22d96d7 into bitcoin:master Aug 15, 2022
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Aug 16, 2022
@bitcoin bitcoin locked and limited conversation to collaborators Aug 16, 2023