Skip to content

Conversation

instagibbs
Copy link
Member

@instagibbs instagibbs commented May 11, 2023

This is an attempt at mitigating #25258 , which is a revival of #10984, which is a revival of #9447.

This PR attempts to mitigate a single case, where high bandwidth peers can bail us out of a flakey
peer not completing blocks for us. We allow up to 2 additional getblocktxns requests per unique block.
This would hopefully allow the chance for an honest high bandwidth peer to hand us the transactions
even if the first in flight peer stalls out.

In contrast to previous effort:

  1. it will not help if subsequent peers send block headers only, so only high-bandwidth peers this time. See: https://github.com/bitcoin/bitcoin/pull/10984/files#diff-6875de769e90cec84d2e8a9c1b962cdbcda44d870d42e4215827e599e11e90e3R1411
  2. MAX_GETBLOCKTXN_TXN_AFTER_FIRST_IN_FLIGHT is removed, in favor of aiding recovery during turbulent mempools
  3. We require one of the 3 block fetching slots to be an outbound peer. This can be the original offering peer, or subsequent compact blocks given by high bandwidth peers.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 11, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK sdaftuar, mzumsande
Concept ACK jamesob, theuni
Stale ACK dergoegge, ajtowns

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #27596 (assumeutxo (2) by jamesob)
  • #27052 (test: rpc: add last block announcement time to getpeerinfo result by LarryRuane)
  • #26969 (net, refactor: net_processing, add ProcessCompactBlockTxns by brunoerg)
  • #26812 (test: add end-to-end tests for CConnman and PeerManager by vasild)
  • #26621 (refactor: Continue moving application data from CNode to Peer by dergoegge)
  • #24008 (assumeutxo: net_processing changes by jamesob)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@instagibbs instagibbs force-pushed the 2023-05-parallel-block-downloads branch from 1071c38 to 6736955 Compare May 11, 2023 14:10
@glozow glozow added the P2P label May 11, 2023
@glozow glozow requested a review from sdaftuar May 11, 2023 14:23
jamesob added a commit to chaincodelabs/bmon that referenced this pull request May 11, 2023
Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems okay at first glance. It would be nice to have a "blockrequest.cpp" module that collects all this logic together. Perhaps we want to keep this change simple so it's a smaller backport, and refactor later, though.

/** Greg: Setting to 3 to match number of high bandwidth peers... ~ensures one is outbound */
static const unsigned int MAX_CMPCTBLOCKS_INFLIGHT_PER_BLOCK = 3;
/** Maximum number of missing transactions for a non-first getblocktxn request */
static const unsigned int MAX_GETBLOCKTXN_TXN_AFTER_FIRST_IN_FLIGHT = 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of the last 612 compact blocks my node's reconstructed that required requesting transactions, 277 needed 10 or fewer (45%), 441 needed 50 or fewer (72%), 488 needed 100 or fewer (80%), 548 needed 500 or fewer (90%). I'd probably prefer 50 or 100 than 10, I guess?

(In simpler times, 1086/1162 (93%) need 10 or fewer, 1140/1162 (98%) needed 50 or fewer, 1144/1162 (98.5%) needed 100 or fewer. Again, I'm not including blocks that requested 0 txs in those denominators)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we just removed the limit, since it's not much of a protection, especially if the missing 10 txns are *nscriptions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the limit's fine; when you start up with an empty mempool, you'll be requesting most of the transactions in the block because your mempool's empty, and there's no need to do that three times in parallel. Could perhaps have the limit work the other way: if you've already got N txs and need K txs, and 0 < K <= N/4 + 10; then if the block just has ten 100kvB txs, you won't request them multiple times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo, it doesn't seem worthwhile to have the limit:

  • the good case (small K) should be far more common than the bad one (large K)
  • the limit makes relay of blocks with large K less reliable w.r.t to stalling (reduces to what we have now)
  • the limit adds code complexity and a maintenance burden. 10 is not a good value according to your data and if Allow 2 simultaneous (compact-)block downloads #10984 had been merged then we would need to re-evaluate now. Who's to say this won't change again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the issue is during warmup, that's a pretty small window, and typically won't actually do parallel fetching, 3 seems rare even when warmed up with limit of 10. I'll gather some more stats on that front, seeing how many parallel blocks are initialized.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "warmup" should really only be an issue after we've seen new blocks from 3+ peers since we don't persist hb peers across restarts and we have to wait for MaybeSetPeerAsAnnouncingHeaderAndIDs to be called on multiple unique peers. I now suspect the warmup period handles itself? My logs show 4+ blocks per node with only one partial block.

@instagibbs
Copy link
Member Author

instagibbs commented May 12, 2023

I'll clean up history later when we're closer to ACKs

@Sjors
Copy link
Member

Sjors commented May 12, 2023

In contrast to previous effort, it will not help if subsequent peers send block headers only

Can you elaborate on this? Are you saying that falling back to full block download is pointless?

@instagibbs
Copy link
Member Author

@Sjors Basically previous efforts would allow the peer to send a header, not a compact block, and we would fire off an additional getdata for the compact block. See this block of code: https://github.com/bitcoin/bitcoin/pull/10984/files#diff-6875de769e90cec84d2e8a9c1b962cdbcda44d870d42e4215827e599e11e90e3R1411

IMO it's unlikely to make an impact, as high-bandwidth peers are already demonstrably fast, and allows us to control better who takes up the additional slots.

@Sjors
Copy link
Member

Sjors commented May 12, 2023

So that earlier attempt would solicit a compact block from a (BIP 152) low-bandwidth mode peer, in response to it announcing a header. That's similar to what we already do in HeadersDirectFetchBlocks, but allowing more than 1 in transit?

Whereas in this PR we focus on the high-bandwidth mode peers, which are allowed to send us a cmpctblock message straight away (rather than waiting for us to ask). Right now we can only have one cmpctblock -> getblocktxn -> blocktxn "session" in progress. This PR makes that 3.

This approach seems fine, but the other attempts were never merged, which is why I am confused by the "it will not help" phrase. Are you saying these earlier attempts failed in some experiment, are they useless in general or is this new approach just better (and less complex?)?

@instagibbs
Copy link
Member Author

This PR makes that 3.

Makes it 3 parallel block downloads, first can be of any kind, the 2nd and 3rd must be compact block announcements from high badwidth peers.

I am confused by the "it will not help" phrase

This PR does not take advantage of low-bandwidth compact block peers message patterns, in other words.

@instagibbs
Copy link
Member Author

instagibbs commented May 15, 2023

I was getting some sybil-stalls above the missing txn <=10 limit, so for now uncapped it.

I also added in logic to ensure that at least one of the 3 slots is taken by an outbound peer. This should be more robust against sybils that are able to take up all inbound hb slots. e.g.,:

  1. uses fresh connection to announce, then stall
  2. uses both inbound hb peers to take up 2nd and 3rd slots, but never gives txns
  3. fresh peer times out, gets disconnected
  4. repeat 1-3

Added more extensive tests.

@instagibbs instagibbs force-pushed the 2023-05-parallel-block-downloads branch from bbe260f to a9f9145 Compare May 15, 2023 15:18
Sjors pushed a commit to Sjors/bitcoin that referenced this pull request May 27, 2023
Sjors pushed a commit to Sjors/bitcoin that referenced this pull request May 27, 2023
Github-Pull: bitcoin#27626
Rebased-From: a905954
Sjors pushed a commit to Sjors/bitcoin that referenced this pull request May 27, 2023
Sjors pushed a commit to Sjors/bitcoin that referenced this pull request May 27, 2023
…ight

This is a change in behavior so that if for some reason we request a block from a peer, we don't allow an unsolicited CMPCT_BLOCK announcement for that same block to cause a request for a full block from the uninvited peer (as some type of request is already outstanding from the original peer)

Github-Pull: bitcoin#27626
Rebased-From: 13f9b20
Sjors pushed a commit to Sjors/bitcoin that referenced this pull request May 27, 2023
A single outbound slot is required, so if the first two slots
are taken by inbound in-flights, the node will reject additional
unless they are coming from outbound.

This means in the case where a fast sybil peer is attempting to
stall out a node, a single high bandwidth outbound peer can
mitigate the attack.

Github-Pull: bitcoin#27626
Rebased-From: 03423f8
Sjors pushed a commit to Sjors/bitcoin that referenced this pull request May 27, 2023
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 2, 2023
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 2, 2023
Github-Pull: bitcoin#27626
Rebased-From: a905954
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 2, 2023
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 2, 2023
…ight

This is a change in behavior so that if for some reason we request a block from a peer, we don't allow an unsolicited CMPCT_BLOCK announcement for that same block to cause a request for a full block from the uninvited peer (as some type of request is already outstanding from the original peer)

Github-Pull: bitcoin#27626
Rebased-From: 13f9b20
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 2, 2023
A single outbound slot is required, so if the first two slots
are taken by inbound in-flights, the node will reject additional
unless they are coming from outbound.

This means in the case where a fast sybil peer is attempting to
stall out a node, a single high bandwidth outbound peer can
mitigate the attack.

Github-Pull: bitcoin#27626
Rebased-From: 03423f8
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 2, 2023
@pinheadmz
Copy link
Member

@instagibbs does this close #21803 ?

fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 16, 2023
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 16, 2023
Github-Pull: bitcoin#27626
Rebased-From: a905954
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 16, 2023
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 16, 2023
…ight

This is a change in behavior so that if for some reason we request a block from a peer, we don't allow an unsolicited CMPCT_BLOCK announcement for that same block to cause a request for a full block from the uninvited peer (as some type of request is already outstanding from the original peer)

Github-Pull: bitcoin#27626
Rebased-From: 13f9b20
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 16, 2023
A single outbound slot is required, so if the first two slots
are taken by inbound in-flights, the node will reject additional
unless they are coming from outbound.

This means in the case where a fast sybil peer is attempting to
stall out a node, a single high bandwidth outbound peer can
mitigate the attack.

Github-Pull: bitcoin#27626
Rebased-From: 03423f8
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Jun 16, 2023
fanquake added a commit that referenced this pull request Jul 4, 2023
b8ad322 Unconditionally return when compact block status == READ_STATUS_FAILED (Greg Sanders)
cdd3de0 Add tests for parallel compact block downloads (Greg Sanders)
e66a5cb Support up to 3 parallel compact block txn fetchings (Greg Sanders)
d1a93f5 Only request full blocks from the peer we thought had the block in-flight (Greg Sanders)
38e3af9 Convert mapBlocksInFlight to a multimap (Greg Sanders)
a45159b Remove nBlocksInFlight (Greg Sanders)
722361e alias BlockDownloadMap for mapBlocksInFlight (Greg Sanders)

Pull request description:

  Backports:
  * #27626
  * #27743

ACKs for top commit:
  instagibbs:
    utACK b8ad322
  ajtowns:
    ACK b8ad322 ; confirmed patches are clean cherry-picks from master, and already tested patches prior to 25.0 release

Tree-SHA512: 438901496a5ed927662e62f936e3d1e7ffb727cb235869854983e8e29a68e144eb3bff307d9fc3ae785fb276b67a216b1cce397689252ca49c5d761efc1380ac
@bitcoin bitcoin locked and limited conversation to collaborators Jun 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallel compact block download