Skip to content

Conversation

furszy
Copy link
Member

@furszy furszy commented Jul 21, 2023

Even when the node believes it has IBD completed, need to avoid
requesting historical blocks from network-limited peers.
Otherwise, the limited peer will disconnect right away.

The simplest scenario could be a node that gets synced, drops
connections, and stays inactive for a while. Then, once it re-connects
(IBD stays completed), the node tries to fetch all the missing blocks
from any peer, getting disconnected by the limited ones.

Note:
Can verify the behavior by cherry-picking the test commit alone on
master. It will fail there.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 21, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK vasild, mzumsande, pinheadmz, achow101
Stale ACK andrewtoth

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

No conflicts as of last run.

@DrahtBot DrahtBot added the P2P label Jul 21, 2023
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe getting disconnected could be beneficial in this rare situation? If our tip is that much behind but we're not in IBD, we can't help them and they can't help us, so our priority should be to get the missing blocks asap, and getting disconnected by useless peers seems like it's not such a bad thing towards that goal.

What would be bad is if all our current peers were limited, and we wouldn't try to request the needed blocks from anyone, but also wouldn't try to exchange them for better peers, so that we would be stuck.

Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe getting disconnected could be beneficial in this rare situation? If our tip is that much behind but we're not in IBD, we can't help them and they can't help us, so our priority should be to get the missing blocks asap, and getting disconnected by useless peers seems like it's not such a bad thing towards that goal.

Agree but, while the outcome might be the desired one, the approach to achieve it isn't the best. It seems unwise to have the node sending a message through the wire, knowing it will fail, just to force the other peer to disconnect.
The node should be smarter than that and not rely on external factors to take decisions.

Moreover, in cases where the other peer chooses to ignore the message without disconnecting (maybe due to using different software than bitcoin-core), the node will end up waiting for a response that will never arrive, wasting time and resources, which could have been easily avoided.

And also, while the node has more connection slots available, I don't see why should disconnect a peer that is behaving properly.

Instead, I think that we should try to keep processes separated. The eviction process should be in charge of disconnecting peers that aren't/will not provide any meaningful data. And the block downloading logic should only be responsible of deciding which blocks request to each peer and which ones not.

@naumenkogs
Copy link
Member

What would be bad is if all our current peers were limited, and we wouldn't try to request the needed blocks from anyone, but also wouldn't try to exchange them for better peers, so that we would be stuck.

I agree with this part. I think the solution should be expanded to give a chance in such cases, and it should be merged within the same PR.

@luke-jr
Copy link
Member

luke-jr commented Jul 25, 2023

Agree with @mzumsande and @naumenkogs. Maybe not the same PR if it's substantially different, but the new eviction logic should be merged before this fix.

Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be bad is if all our current peers were limited, and we wouldn't try to request the needed blocks from anyone, but also wouldn't try to exchange them for better peers, so that we would be stuck.

Right now, this will hardly happen. The reason is CheckForStaleTipAndEvictPeers. The logic is as follows:

When the node doesn't update the tip for a period longer than 30 minutes (TipMayBeStale()), it triggers the extra outbound connection process. Which instructs the net layer to bypass the maximum outbound connection restriction and create another full relay outbound connection to sync up the missing blocks.

Then, considering the worst-case scenario: when the extra outbound connection is also a limited peer. In that case, 45 seconds after connecting to the extra peer (scheduler task timeout), CheckForStaleTipAndEvictPeers will call EvictExtraOutboundPeers, which will evict the peer that least recently announced a block header.
Then, in case of no chain movement; the process starts again, adding an extra outbound connection, and so on, until it finds a peer that can provide the missing blocks.

Note:
While this sooner or later corrects the stalling situation, it's not ideal to continue establishing connections to limited peers, on the extra outbound connection process, when the peer is stalled. So, to address this issue, have created #28170. Which basically disallows connections to limited peers when the node is behind enough to know that limited peers will not provide any of the required historical blocks (although they will still provide valid headers).

Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebased. Conflicts solved.

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK c1612ea

@furszy furszy force-pushed the 2023_net_limited_peers branch 2 times, most recently from 9eba981 to 534996a Compare November 2, 2023 22:33
Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated per feedback, thanks for the review!

Fixed the two blocks buffer extension (now is reduction). Small diff.

@furszy furszy force-pushed the 2023_net_limited_peers branch from 534996a to adfc9c7 Compare November 11, 2023 00:33
Copy link
Member Author

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated per feedback, thanks for the review @vasild!

The node will now request blocks within the limited peers threshold when the block download window contemplates them and not only when the previous blocks arrived or are in-flight. Added test coverage exercising this behavior and verifying that the node can get synced even when the blocks arrived out of order.

@furszy furszy force-pushed the 2023_net_limited_peers branch from adfc9c7 to 930e531 Compare November 11, 2023 02:09
@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth February 15, 2024 18:49
@@ -1451,6 +1451,7 @@ void PeerManagerImpl::FindNextBlocks(std::vector<const CBlockIndex*>& vBlocks, c
{
std::vector<const CBlockIndex*> vToFetch;
int nMaxHeight = std::min<int>(state->pindexBestKnownBlock->nHeight, nWindowEnd + 1);
bool is_limited_peer = IsLimitedPeer(peer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const bool is_limited_peer{IsLimitedPeer};

@@ -1475,6 +1476,11 @@ void PeerManagerImpl::FindNextBlocks(std::vector<const CBlockIndex*>& vBlocks, c
return;
}

// Don't request blocks that go further than what limited peers can provide
if (is_limited_peer && (state->pindexBestKnownBlock->nHeight - pindex->nHeight >= static_cast<int>(NODE_NETWORK_LIMITED_MIN_BLOCKS) - 2 /* two blocks buffer for possible races */)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what @naumenkogs is referring to here is that if this condition is true then pindex will hit this condition on the first pindex in the loop and every pindex up until this condition is no longer true. This can be more efficient by just bumping pindex to the first pindex that is past the network limited blocks threshold.

So, we can either change the loop from range based to regular for-loop and advance the index, or move this check out of this loop and into the loop on line 1464 and break early if the condition is met.

@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth February 21, 2024 23:40
Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK c5b5843

@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth February 26, 2024 23:12
Copy link
Member

@pinheadmz pinheadmz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c5b5843

Built and ran all functional and unit tests on x86 Ubuntu. Confirmed test fails on master, passes with the patch in the branch. Reviewed code and existing discussion. Regarding the decision to keep the pruning peer connected at all, I agree with the author that the affected code is only responsible for requesting available blocks from a peer and it is reasonable to take the limited flag into account. I like the clean up in the first commit as well, it's easier to read the loop with continue.

For sanity I added checks in the test to confirm blocks from the pruned node were "found" by the full node but had 'confirmations': -1, which makes sense because there is a gap between the full node's chain tip and the blocks available from the pruning peer.

Show Signature
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK c5b5843d8f10d96f76ee6b95f2b1b1b4ce755f75
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEE5hdzzW4BBA4vG9eM5+KYS2KJyToFAmXs7UsACgkQ5+KYS2KJ
yTpnUg//b9HSUh/s1VJRWSoX23pcmFJtmeA4ndd5dGM+Q2RbyBZenI4ZwVlkunxy
l5B+U/gpD7cL/6iZsV6Eph/ouX9Om8NKQcxMn0TXzvOBYcByCZB++r5oNwfeKohu
aUJWRce9MYDj4GtdbW7p5DwcsKaX6nC8oOnxfv9ivP473F7BDjnYZpX7F0nKsrx7
b5vKMWCY7fXxmHpe0y0FnWVAJ10hvIQFD6I0EMO1mQpq14U3NUcG+Axfm6kCrJRE
EbeAKtxz/tITNNiN5AZg2NdgnmOR1MaOatkOuCxWgMer+N2EOWZgeW+/42Iy0Isz
hOnv9BQi4MwewY/I5ruM7Yx2oghUEpIkIhBX8/JPeA9iD2pnVIcXi6Xkaawd31C1
Xl7OMpX1mOVtPjNPvmYxpbFjze6BhGpaa3h7UJFSBzqhFm0gpe70f+hza4HbXUQj
MihsnKASJPmCDLFY/kGddad/E+yJ5QNxSHhLqEugkWPGPnIa4CHyd91uU5qQwwOv
H1sN9hAlNYguDZs79jPY32yIXgpufQyUWQA1+emQpDQun68jHjTxTdgUFEbvlIDU
Nl+fxsY9Fqj4D3HpdnnwIUcX1W68itMhPPBQUDu3R1abdUHMc/ntE4gwr6BP3LcL
AFv+d4aflQrEG9RntgmkyUSnYw6VIj9INlQMS6ql4r70Kh/6Bhk=
=C8tL
-----END PGP SIGNATURE-----

pinheadmz's public key is on keybase

@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth March 9, 2024 23:23
@achow101
Copy link
Member

ACK c5b5843

@DrahtBot DrahtBot requested review from andrewtoth and removed request for andrewtoth March 11, 2024 12:09
@achow101 achow101 merged commit 4a90374 into bitcoin:master Mar 11, 2024
@furszy furszy deleted the 2023_net_limited_peers branch March 11, 2024 12:18
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Jul 30, 2025
27f260a net: remove now unused global 'g_initial_block_download_completed' (furszy)
aff7d92 test: add coverage for peerman adaptive connections service flags (furszy)
6ed5360 net: peer manager, dynamically adjust desirable services flag (furszy)
9f36e59 net: move state dependent peer services flags (furszy)
f9ac96b net: decouple state independent service flags from desirable ones (furszy)
97df4e3 net: store best block tip time inside PeerManager (furszy)

Pull request description:

  Derived from bitcoin#28120 discussion.

  By relocating the peer desirable services flags into the peer manager, we
  allow the connections acceptance process to handle post-IBD potential
  stalling scenarios.

  The peer manager will be able to dynamically adjust the services flags
  based on the node's proximity to the tip (back and forth). Allowing the node
  to recover from the following post-IBD scenario:
  Suppose the node has successfully synced the chain, but later experienced
  dropped connections and remained inactive for a duration longer than the limited
  peers threshold (the timeframe within which limited peers can provide blocks). In
  such cases, upon reconnecting to the network, the node might only establish
  connections with limited peers, filling up all available outbound slots. Resulting
  in an inability to synchronize the chain (because limited peers will not provide
  blocks older than the `NODE_NETWORK_LIMITED_MIN_BLOCKS` threshold).

ACKs for top commit:
  achow101:
    ACK 27f260a
  vasild:
    ACK 27f260a
  naumenkogs:
    ACK 27f260a
  mzumsande:
    Light Code Review ACK 27f260a
  andrewtoth:
    ACK 27f260a

Tree-SHA512: 07befb9bcd0b60a4e7c45e4429c02e7b6c66244f0910f4b2ad97c9b98258b6f46c914660a717b5ed4ef4814d0dbfae6e18e6559fe9bec7d0fbc2034109200953
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Jul 31, 2025
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Jul 31, 2025
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Jul 31, 2025
This adds the refactoring changes from commit 7312772
that were missing from the original backport. The refactoring makes the code more
readable by using early continues instead of nested if-else statements.
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Aug 8, 2025
27f260a net: remove now unused global 'g_initial_block_download_completed' (furszy)
aff7d92 test: add coverage for peerman adaptive connections service flags (furszy)
6ed5360 net: peer manager, dynamically adjust desirable services flag (furszy)
9f36e59 net: move state dependent peer services flags (furszy)
f9ac96b net: decouple state independent service flags from desirable ones (furszy)
97df4e3 net: store best block tip time inside PeerManager (furszy)

Pull request description:

  Derived from bitcoin#28120 discussion.

  By relocating the peer desirable services flags into the peer manager, we
  allow the connections acceptance process to handle post-IBD potential
  stalling scenarios.

  The peer manager will be able to dynamically adjust the services flags
  based on the node's proximity to the tip (back and forth). Allowing the node
  to recover from the following post-IBD scenario:
  Suppose the node has successfully synced the chain, but later experienced
  dropped connections and remained inactive for a duration longer than the limited
  peers threshold (the timeframe within which limited peers can provide blocks). In
  such cases, upon reconnecting to the network, the node might only establish
  connections with limited peers, filling up all available outbound slots. Resulting
  in an inability to synchronize the chain (because limited peers will not provide
  blocks older than the `NODE_NETWORK_LIMITED_MIN_BLOCKS` threshold).

ACKs for top commit:
  achow101:
    ACK 27f260a
  vasild:
    ACK 27f260a
  naumenkogs:
    ACK 27f260a
  mzumsande:
    Light Code Review ACK 27f260a
  andrewtoth:
    ACK 27f260a

Tree-SHA512: 07befb9bcd0b60a4e7c45e4429c02e7b6c66244f0910f4b2ad97c9b98258b6f46c914660a717b5ed4ef4814d0dbfae6e18e6559fe9bec7d0fbc2034109200953
DashCoreAutoGuix pushed a commit to DashCoreAutoGuix/dash that referenced this pull request Aug 9, 2025
…peers threshold

c5b5843 test: avoid requesting blocks beyond limited peer threshold (furszy)
2f6a055 p2p: sync from limited peer, only request blocks below threshold (furszy)
7312772 refactor: Make FindNextBlocks friendlier (furszy)

Pull request description:

  Even when the node believes it has IBD completed, need to avoid
  requesting historical blocks from network-limited peers.
  Otherwise, the limited peer will disconnect right away.

  The simplest scenario could be a node that gets synced, drops
  connections, and stays inactive for a while. Then, once it re-connects
  (IBD stays completed), the node tries to fetch all the missing blocks
  from any peer, getting disconnected by the limited ones.

  Note:
  Can verify the behavior by cherry-picking the test commit alone on
  master. It will fail there.

ACKs for top commit:
  achow101:
    ACK c5b5843
  vasild:
    ACK c5b5843
  mzumsande:
    Code Review ACK c5b5843
  pinheadmz:
    ACK c5b5843

Tree-SHA512: 9e550698bc6e63cc587b2b988a87d0ab555a8fa188c91c3f33287f8201d77c28b373331845356ad86f17bb21c15950b6466bc1cafd0ce8139d70364cb71c2ad2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants