
Improve range sync with PeerDAS #6258

@dapplion

Description


Roadmap

  1. Downscore peers for invalid data (Fix PeerDAS sync scoring #7352)
  • Track who sent what in batches. We need to change PeerId to a PeerGroup, which allows tracking who sent each column (see the sketch after this list)
  • Make beacon processor errors more descriptive, so that sync can know which column caused the RpcBlock to be invalid
  2. Downscore peers for custody failures
  • Send requests to peers that are part of the SyncingChain peerset, instead of the global pool. This will cause SyncingChains to error frequently with NoPeer errors
    • Modify syncing batches to allow them to stay in the AwaitingDownload state when they have no peers
    • Remove good_peers_on_sampling_subnets
    • Extras:
      • Consider adding a fallback mechanism where we fetch from the global peer set, but only from peers that are synced up to the requested batch; in that case, don't penalize custody failures
      • Assume all finalized peers are in the same chain and improve SyncingChain grouping
      • Implement StatusV2
  • Change the by_range sync download algorithm to fetch blocks first, then columns. Use the blocks as the source of truth to match against columns and to penalize custody failures.
    • V1: Assume the block peer is honest and believe the blocks are canonical
    • V2: Send blocks to the processor to verify the proposer index and proposer signature (significantly increases the cost of an attack). Note: this will require much more complex logic, similar to lookup sync. Note 2: batch syncing will no longer download and process in parallel, because processing becomes sequential and we need to process blocks before completing a batch download.
  3. Request individual columns when a peer fails to serve the columns
  4. Reconstruct if we can't download all columns that we need but we have >= 50%
  5. Improve peer selection logic: which peer should we select next for column requests? E.g. if a peer has a custody failure, should we never request from it again, or only prevent requests to it for some time?
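A minimal, self-contained sketch of the peer-tracking idea in item 1, under assumed names (PeerGroup, record_column, peer_to_penalize are illustrative, not Lighthouse identifiers): a batch records which peer served the blocks and which peer served each column, so an invalid column can be attributed to exactly one peer.

```rust
use std::collections::HashMap;

// Stand-ins for libp2p's PeerId and the data column index type.
type PeerId = u64;
type ColumnIndex = u64;

/// Hypothetical per-batch attribution: who served the blocks, and who served
/// each custody column.
#[derive(Default, Debug)]
struct PeerGroup {
    block_peer: Option<PeerId>,
    column_peers: HashMap<ColumnIndex, PeerId>,
}

impl PeerGroup {
    fn record_column(&mut self, column: ColumnIndex, peer: PeerId) {
        self.column_peers.insert(column, peer);
    }

    /// If processing reports `column` as invalid, this is the only peer that
    /// should be downscored.
    fn peer_to_penalize(&self, column: ColumnIndex) -> Option<PeerId> {
        self.column_peers.get(&column).copied()
    }
}

fn main() {
    let mut group = PeerGroup { block_peer: Some(1), ..Default::default() };
    group.record_column(7, 2);
    group.record_column(21, 3);
    // The beacon processor flags column 21 as invalid: downscore peer 3 only.
    assert_eq!(group.peer_to_penalize(21), Some(3));
    assert_eq!(group.block_peer, Some(1));
}
```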

Extras

  1. Refactor verify_kzg_for_rpc_blocks out of the da_checker, since it does not use the cache
  2. Change RpcBlock to hold a Vec of DataColumnSidecars so we don't need a spec reference (see the sketch below)
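A hedged sketch of what Extras item 2 could look like, with stand-in types rather than Lighthouse's real definitions: if the column variant holds a plain Vec, building it from RPC responses no longer needs a spec reference to bound the list length.

```rust
// Stand-in types; not Lighthouse's real definitions.
struct SignedBeaconBlock;
struct BlobSidecar;
struct DataColumnSidecar;

/// Hypothetical shape of the change: the column variant holds a plain `Vec`,
/// so constructing it requires no spec reference to size a bounded list.
enum RpcBlock {
    Block(SignedBeaconBlock),
    BlockAndBlobs(SignedBeaconBlock, Vec<BlobSidecar>),
    BlockAndCustodyColumns(SignedBeaconBlock, Vec<DataColumnSidecar>),
}
```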

Why retrying individual requests is useful

Currently range sync and backfill sync fetch blocks and blobs from the network with this sequence:

  • Out of the pool of peers in the chain (peers that agree on some fork-choice state), select ONE peer
  • Immediately issue blocks_by_range and blobs_by_range to the SAME ONE peer
  • If any of those requests error, fail BOTH requests and retry with another peer

This strategy is not optimal but good enough for now. However, with PeerDAS, the worst-case number of requests per batch increases from 2 (blocks + blobs) to 2 + DATA_COLUMN_SIDECAR_SUBNET_COUNT / CUSTODY_REQUIREMENT = 2 + 32 = 34 (if not connected to any larger nodes).

If we extend the current paradigm, a single failure on a columns_by_range request will trigger a retry of all 34 requests. Not optimal 😅

A solution is to make the "components_by_range" request able to retry each individual request. This is what block lookup requests do, where each component (block, blobs, custody) has its own state and retry count.
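A minimal sketch of this per-component retry idea under assumed names (ComponentsByRangeRequest, ComponentState, and MAX_RETRIES are illustrative, not Lighthouse's actual types): each component of the range request tracks its own state and retry count, so a failed column request is retried on its own while the block and blob requests are left untouched.

```rust
use std::collections::HashMap;

type ColumnIndex = u64;

const MAX_RETRIES: u8 = 3;

#[derive(Debug)]
enum ComponentState {
    Downloading { retries: u8 },
    Downloaded,
    Failed,
}

/// Hypothetical per-batch request where blocks, blobs and each column keep
/// independent state, mirroring how block lookups track components.
#[derive(Debug)]
struct ComponentsByRangeRequest {
    blocks: ComponentState,
    blobs: ComponentState,
    columns: HashMap<ColumnIndex, ComponentState>,
}

impl ComponentsByRangeRequest {
    /// A single column request failed: bump only that component's retry
    /// count; blocks, blobs and the other columns are not re-requested.
    fn on_column_failure(&mut self, column: ColumnIndex) {
        if let Some(state) = self.columns.get_mut(&column) {
            let next = match *state {
                ComponentState::Downloading { retries } if retries < MAX_RETRIES => {
                    ComponentState::Downloading { retries: retries + 1 }
                }
                _ => ComponentState::Failed,
            };
            *state = next;
        }
    }

    fn is_complete(&self) -> bool {
        matches!(self.blocks, ComponentState::Downloaded)
            && matches!(self.blobs, ComponentState::Downloaded)
            && self
                .columns
                .values()
                .all(|s| matches!(s, ComponentState::Downloaded))
    }
}

fn main() {
    let mut req = ComponentsByRangeRequest {
        blocks: ComponentState::Downloaded,
        blobs: ComponentState::Downloaded,
        columns: HashMap::from([
            (7, ComponentState::Downloaded),
            (21, ComponentState::Downloading { retries: 0 }),
        ]),
    };
    // Only column 21 is retried; nothing else is re-requested.
    req.on_column_failure(21);
    assert!(!req.is_complete());
}
```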

Labels

das (Data Availability Sampling), hardening, major-task (A significant amount of work or conceptual task), syncing, v8.0.0 (Q4 2025 Fusaka Mainnet Release)
