Skip to content

Supernode unable to progress range sync due to unknown RPC error #6989

@jimmygchen

Description

@jimmygchen

I've been trying to sync a fresh supernode to peerdas-devnet-4, and finalized sync seems to frequently get stuck.

I've observed RPC requests sent and completed but eventually the batch failed with RPC Error - but i don't see any RPC error for this batch during this period.

See some partial logs below

  • 15 requests were sent for this batch epoch = 41, range request id = 35
  • 15 requests were completed
  • and a Batch failed. RPC Error followed immediately, with no RPC error logs found in this period.

I'm suspecting we have an RPC error log suspressed somewhere, it's not obvious from the log what the rpc error is, and the node just keep retrying to sync the same batch and unable to make progress.

Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 36/35/RangeSync/41/1, peer: 16Uiu2HAmDdyTRG4dESR8v9nnAbdwuqWXHDQ8hLwhgmSDV1mooiw9, epoch: 41, slots: 32, method: BlocksByRange, service: sync, module: network::sync::network_context:788
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 37/35/RangeSync/41/1, peer: 16Uiu2HAmH5PwwsfXLCGqSi2xFgPu7Zy9xrzXKTqc5ZwxkkdhYbVy, columns: [105, 37, 35, 36, 76, 54, 96, 26], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 38/35/RangeSync/41/1, peer: 16Uiu2HAm9PijSZpm5QUphXRoBtkhUZPkGJ4Rgxk4Bny91oZPYZLG, columns: [100, 3, 13, 126, 63, 7, 103, 85, 41], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 39/35/RangeSync/41/1, peer: 16Uiu2HAm35d5GWdmRH2XtjM4PMi6MXyCSwXgZi5KwkGNijbSmARB, columns: [89, 8, 79, 110, 75, 15, 109, 58, 1], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 40/35/RangeSync/41/1, peer: 16Uiu2HAmDdyTRG4dESR8v9nnAbdwuqWXHDQ8hLwhgmSDV1mooiw9, columns: [93, 45, 46, 60, 47, 115], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 41/35/RangeSync/41/1, peer: 16Uiu2HAkzUzciBZ9mW18RAwhGowdz6SRbt36gS1NZupVXUkGENW7, columns: [18, 43, 57, 30, 120, 39, 95, 2, 56, 98, 127], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 42/35/RangeSync/41/1, peer: 16Uiu2HAm4SC4chiQADwP7M8xu2nigb7GsabhKfULHZ2gMAcw6fmD, columns: [24, 4, 71, 102, 44, 21, 118, 87, 11, 53], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 43/35/RangeSync/41/1, peer: 16Uiu2HAmNSdQ4yXgyBUpQ5yGRE32L4t4kkQahfaHCxcmZHbtT7md, columns: [99, 49, 6, 80, 40, 114, 61], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 44/35/RangeSync/41/1, peer: 16Uiu2HAmAAZ5wP6fvpe1b9tWNmgA2Wn8MsrNYVhkw5WohcAoaHKR, columns: [25, 121, 104, 108, 111, 48, 92, 14], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 45/35/RangeSync/41/1, peer: 16Uiu2HAmJyaVGkRGR9ACqompkoge8T2x4KFH4KbzDkh7zz6uN2JX, columns: [125, 77, 17, 74, 23, 123, 72, 84, 59, 124, 107], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 46/35/RangeSync/41/1, peer: 16Uiu2HAmChnSSEeAB9w5Gw2zqF23ufsrPAUqZWWo3mUgJhWJWRP4, columns: [112, 62, 91, 32, 42], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 47/35/RangeSync/41/1, peer: 16Uiu2HAmR4AjDzeAMa8ud5E4L3tu92D3GfBJ3z92MbqUEwHiFgnN, columns: [69, 70, 116, 83, 119, 94, 31, 0, 9, 68, 64, 65, 90], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 48/35/RangeSync/41/1, peer: 16Uiu2HAkwLdYWXA39aykj6uLy6knHo7V6CaMVU7Xq2smc6Ysw5eh, columns: [10, 16, 19, 27, 117, 113, 12, 50, 97, 29], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 49/35/RangeSync/41/1, peer: 16Uiu2HAmNBbHKNxDDgZ1wjRRaodUavr9oiQ7awRcmT2m42VATZpn, columns: [88, 67, 101, 55, 122, 82, 5, 81, 20, 51], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:17.144 DEBG Sync RPC request sent, id: 50/35/RangeSync/41/1, peer: 16Uiu2HAmLCfZdW9qfBvmvVVCKSPDKNeyAZTgx3eZSxF11Ao85JCb, columns: [33, 38, 86, 66, 73, 52, 22, 78, 106, 34, 28], epoch: 41, slots: 32, method: DataColumnsByRange, service: sync, module: network::sync::network_context:870
Feb 12 04:38:19.875 DEBG Sync RPC request completed, count: 209, method: DataColumnsByRange, id: 45/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:23.952 DEBG Sync RPC request completed, count: 190, method: DataColumnsByRange, id: 48/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:24.485 DEBG Sync RPC request completed, count: 209, method: DataColumnsByRange, id: 41/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:25.326 DEBG Sync RPC request completed, count: 190, method: DataColumnsByRange, id: 42/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:25.433 DEBG Sync RPC request completed, count: 143, method: DataColumnsByRange, id: 39/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:26.803 DEBG Sync RPC request completed, count: 95, method: DataColumnsByRange, id: 46/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:27.350 DEBG Sync RPC request completed, count: 209, method: DataColumnsByRange, id: 50/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:28.968 DEBG Sync RPC request completed, count: 133, method: DataColumnsByRange, id: 43/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:29.719 DEBG Sync RPC request completed, count: 152, method: DataColumnsByRange, id: 44/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:31.173 DEBG Sync RPC request completed, count: 152, method: DataColumnsByRange, id: 37/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:31.272 DEBG Sync RPC request completed, count: 171, method: DataColumnsByRange, id: 38/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:32.228 DEBG Sync RPC request completed, count: 190, method: DataColumnsByRange, id: 49/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:35.236 DEBG Sync RPC request completed, count: 247, method: DataColumnsByRange, id: 47/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:38.363 DEBG Sync RPC request completed, count: 27, method: BlocksByRange, id: 36/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:38.363 DEBG Sync RPC request completed, count: 114, method: DataColumnsByRange, id: 40/35/RangeSync/41/1, service: sync, module: network::sync::network_context:1128
Feb 12 04:38:38.396 DEBG Batch failed. RPC Error, batch_state: [D,D,p,p,p,], request_id: 35, peer_id: 16Uiu2HAmDdyTRG4dESR8v9nnAbdwuqWXHDQ8hLwhgmSDV1mooiw9, batch_state: Downloading(16Uiu2HAmDdyTRG4dESR8v9nnAbdwuqWXHDQ8hLwhgmSDV1mooiw9, 35), batch_epoch: 41, chain: 1, s
ervice: range_sync, service: sync, module: network::sync::range_sync::chain:867

Full logs attached, I restarted the node a few times, so you may want to search for the timestamps from the partial logs above:
beacon.log.zip

The kurtosis config to sync peerdas-devnet-4:

participants:
  - cl_type: lighthouse
    cl_image: sigp/lighthouse:latest-unstable
    cl_extra_params:
      - --subscribe-all-data-column-subnets
  - cl_type: lighthouse
    cl_image: sigp/lighthouse:latest-unstable
network_params:
  network: peerdas-devnet-4
snooper_enabled: false
additional_services:
  - prometheus_grafana

Metadata

Metadata

Assignees

No one assigned

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions