Improve block processing performance during re-org #2805

@michaelsproul

Description

Consider the following re-org that frustrates Lighthouse's attempts to process blocks quickly:

Let n be a slot on an epoch boundary (n % 32 == 0).

  1. Immediately prior to slot n, the preemptive state advance occurs as normal.
  2. The block from slot n arrives super late (12s+), consuming the advanced state.
  3. The block from slot n + 1 arrives on time, but builds upon the parent at slot n - 1. It's going to be super slow to process because its parent state is missing from the cache, meaning:
    a) We need to load the full state for slot n - 1 from disk (a few hundred ms).
    b) We need to transition that state through an epoch boundary (200ms).
    c) We need to store the state for slot n on disk. It is different from the slot n state with block n applied, and presently we store every epoch boundary state.
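
The sequence above can be sketched with a toy model. This is not Lighthouse code: `ToyAdvanceCache`, `pre_state_steps`, and the choice of `n = 64` are hypothetical names and values made up for illustration, and the model assumes the cache holds at most one pre-advanced state per parent, which is consumed when used.

```python
SLOTS_PER_EPOCH = 32


class ToyAdvanceCache:
    """Hypothetical one-entry-per-parent cache of pre-advanced states."""

    def __init__(self):
        self._advanced_to = {}  # parent slot -> slot the state was advanced to

    def advance(self, parent_slot, target_slot):
        self._advanced_to[parent_slot] = target_slot

    def take(self, parent_slot):
        # Taking the state consumes it, as in step 2 of the scenario.
        return self._advanced_to.pop(parent_slot, None)


def pre_state_steps(cache, parent_slot, block_slot):
    """List the expensive steps needed to build the pre-state for a block."""
    steps = []
    state_slot = cache.take(parent_slot)
    if state_slot is None:
        # Cache miss: fall back to the parent state on disk.
        steps.append(f"load state for slot {parent_slot} from disk")
        state_slot = parent_slot
    for slot in range(state_slot + 1, block_slot + 1):
        if slot % SLOTS_PER_EPOCH == 0:
            steps.append(f"epoch transition into slot {slot}")
            steps.append(f"store epoch boundary state for slot {slot}")
    return steps


n = 64  # arbitrary epoch boundary slot, chosen for illustration
cache = ToyAdvanceCache()
cache.advance(parent_slot=n - 1, target_slot=n)  # step 1: preemptive advance

# Step 2: the late block at slot n consumes the advanced state.
late_block_steps = pre_state_steps(cache, parent_slot=n - 1, block_slot=n)
# Step 3: the block at slot n + 1 builds on n - 1 and misses the cache.
reorg_block_steps = pre_state_steps(cache, parent_slot=n - 1, block_slot=n + 1)

print(late_block_steps)
print(reorg_block_steps)
```

In this model the late block needs no expensive steps, while the re-orging block incurs all three costs (a), (b) and (c).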

Example

Here's an instance of this behaviour that I observed at slot n=2485472 on mainnet, resulting in block processing taking 2.5s instead of the usual 80ms (median) or 456ms (99th percentile) (metrics from sigp/lighthouse-metrics#31).

Nov 11 16:55:03.815 WARN Beacon chain re-org                     reorg_distance: 1, new_slot: 2485473, new_head: 0xb17f…a572, new_head_parent: 0x0f98…9b22, previous_slot: 2485472, previous_head: 0x1c4d…c94b, service: beacon
Nov 11 16:55:03.818 DEBG Delayed head block                      set_as_head_delay: Some(222.219889ms), imported_delay: Some(2.545278411s), observed_delay: Some(2.051036927s), block_delay: 4.818535227s, slot: 2485473, proposer_index: 52065, block_root: 0xb17fe52ce55315713a9e3eb28858a1a53039daf9e1f6406aa2c8d0d8ae11a572, service: beacon

Even though the block arrived on time, taking 2.5s to process it meant that any attestations at this slot would have been missed (had validators been running on this node).
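
As a rough check of the numbers in the log, the sketch below adds up the reported delays and compares them with the attestation deadline. It assumes the three delays are consecutive intervals measured from the start of the slot (the fact that they sum exactly to `block_delay` is consistent with that reading) and that attestations are due one third of the way into a 12s slot. The variable names are illustrative only.

```python
from decimal import Decimal

SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12
# Assumption: attestations are due 1/3 of the way into the slot (4s).
ATTESTATION_DEADLINE = Decimal(SECONDS_PER_SLOT) / 3

# Slot n from the example sits on an epoch boundary.
n = 2485472
is_epoch_boundary = n % SLOTS_PER_EPOCH == 0
epoch = n // SLOTS_PER_EPOCH

# Values copied from the "Delayed head block" log line, in seconds.
observed_delay = Decimal("2.051036927")
imported_delay = Decimal("2.545278411")
set_as_head_delay = Decimal("0.222219889")
block_delay = Decimal("4.818535227")

# The three delays add up to the reported total.
total = observed_delay + imported_delay + set_as_head_delay

# Time into the slot at which the block finished importing.
imported_at = observed_delay + imported_delay

# Same arrival time, but with the usual processing times quoted above.
imported_at_median = observed_delay + Decimal("0.080")
imported_at_p99 = observed_delay + Decimal("0.456")

print(is_epoch_boundary, epoch)
print(total == block_delay)
print(imported_at > ATTESTATION_DEADLINE)
print(imported_at_median < ATTESTATION_DEADLINE)
print(imported_at_p99 < ATTESTATION_DEADLINE)
```

Under these assumptions the block was observed about 2.05s into the slot, which leaves room for even a 99th-percentile import, but the 2.5s import pushed completion to roughly 4.6s, past the 4s deadline.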

Additional Info

It should be noted that this behaviour should be quite rare, due to the infrequency of re-orgs and late blocks on mainnet (at worst ~4% of blocks are late, with very few being 12s+ late). However, if proposer boosting is adopted, we may see more re-orgs of this type, where a proposer intentionally orphans the previous block despite it having been published.

Metadata

    Labels

    A1
    major-task: A significant amount of work or conceptual task.
    optimization: Something to make Lighthouse run more efficiently.