Status: Closed
Labels: optimization (Something to make Lighthouse run more efficiently), v7.0.0 (New release c. Q1 2025), v7.0.0-beta.clean (Clean release post Holesky rescue)
Description
Short term plan:
- Move banned block checks higher in block verification to prevent repeated state lookups (before every instance of `load_parent` in `block_verification.rs`).
- Encourage use of `--state-cache-size 4` to avoid bad state cache pruning logic that is keeping 128x 180MB epoch boundary states around (~24GB of states).
- (DONE) Remove block root lookups from status processing. We are getting killed looking up old states to compute the block root. We need a more aggressive version of this PR: Optimise status processing #5481.
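The first point (reject known-bad blocks before any `load_parent` call) can be sketched as follows. This is a minimal illustration, not Lighthouse's code: `Verifier`, `BlockRoot`, `VerifyError` and `load_parent_state` are hypothetical stand-ins, and the only idea taken from this issue is that a cheap set lookup should run before the expensive parent-state load. It also rejects children of an invalid block, since a descendant of an invalid block cannot be valid.

```rust
use std::collections::HashSet;

/// Hypothetical 32-byte block root (stand-in for Lighthouse's root type).
pub type BlockRoot = [u8; 32];

#[derive(Debug, PartialEq)]
pub enum VerifyError {
    /// The block, or its parent, is already known to be invalid.
    KnownInvalid(BlockRoot),
}

/// Hypothetical verifier that remembers roots which previously failed verification.
pub struct Verifier {
    invalid_roots: HashSet<BlockRoot>,
    /// Counts how often the expensive parent-state load actually ran.
    pub state_loads: usize,
}

impl Verifier {
    pub fn new() -> Self {
        Self { invalid_roots: HashSet::new(), state_loads: 0 }
    }

    pub fn mark_invalid(&mut self, root: BlockRoot) {
        self.invalid_roots.insert(root);
    }

    /// Hypothetical stand-in for `load_parent`: the state lookup we want to avoid.
    fn load_parent_state(&mut self, _parent: BlockRoot) {
        self.state_loads += 1;
    }

    pub fn verify(&mut self, root: BlockRoot, parent: BlockRoot) -> Result<(), VerifyError> {
        // Cheap set lookups first, so junk is rejected before touching any state.
        if self.invalid_roots.contains(&root) {
            return Err(VerifyError::KnownInvalid(root));
        }
        if self.invalid_roots.contains(&parent) {
            // A child of an invalid block is invalid too; remember it for next time.
            self.invalid_roots.insert(root);
            return Err(VerifyError::KnownInvalid(parent));
        }
        // Only blocks that pass the cheap checks pay for the parent state load.
        self.load_parent_state(parent);
        Ok(())
    }
}
```

With this ordering, any number of repeat copies of a banned block costs one hash lookup each and zero state loads.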
Point (1) is intended to fix an OOM that affects nodes that are already in sync but are forced to process junk blocks.
Point (2) fixes OOMs during head sync caused by lots of epoch boundary states being retained.
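A back-of-the-envelope check of the memory figures behind point (2), assuming the cache is full at the 128 states mentioned above and that each retained epoch boundary state pins its own ~180MB (the helper name is hypothetical):

```rust
/// Approximate memory unique to one epoch boundary state, in MB (the 180MB figure above).
const STATE_SIZE_MB: u64 = 180;

/// Worst-case cache memory in MB if `cached_states` such states are retained,
/// assuming their ~180MB each is not shared with any other cached state.
fn worst_case_cache_mb(cached_states: u64) -> u64 {
    cached_states * STATE_SIZE_MB
}
```

`worst_case_cache_mb(128)` gives 23,040 MB (about 23 GB, in line with the ~24GB estimate since 180MB is a lower bound), while capping the cache at 4 states gives 720 MB.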
To investigate later:
- Why are epoch boundary state diffs so large (180MB+), given that we should be basing them off each other while syncing sequential blocks? Answer: `balances` and `inactivity_scores`.
- Is an earlier invalid block check sufficient to prevent OOM while synced? Are there other states or valid side chains that are forcing us to load states and use too much memory?
- Why is sync sending us so many copies of the invalid block? Is there parallelism that is causing the OOM near the head?
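Why `balances` and `inactivity_scores` dominate can be illustrated with a toy element-wise diff. This is not Lighthouse's actual diff or structural-sharing representation; it only shows that a diff's size scales with the number of changed entries, so a per-validator list in which most entries change between two states yields a diff nearly as large as the list itself.

```rust
/// Toy element-wise diff: (index, new value) for every entry that changed.
/// Assumes both slices have the same length (zip stops at the shorter one).
/// NOT Lighthouse's real diff format; it only shows how diff size scales.
fn diff(old: &[u64], new: &[u64]) -> Vec<(usize, u64)> {
    old.iter()
        .zip(new.iter())
        .enumerate()
        .filter(|(_, (a, b))| a != b)
        .map(|(i, (_, b))| (i, *b))
        .collect()
}
```

If every balance changes by even a single Gwei, the diff holds one entry per validator; if only one entry changes, the diff has length one.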
Future plans (long-term fixes):
- Implement the PromiseCache concept used for attestation committees for beacon states. This is quite subtle to get right; a version was previously attempted but abandoned (Unify and lower state caches #5313). Tracking issue: Improve & unify parallel de-duplication caches #5112.
- Implement size-based pruning for the state cache. This is possible with my WIP changes from State cache memory size WIP #6532. However, that code is quite immature and the pruning itself is expensive (1.5s-4s or more), so we cannot ship it quickly. There is also some subtlety around deciding which states to prune based on size (we could use a heuristic similar to the existing `cull` method, applied to the 20% largest states).
- Re-think the pruning logic in `cull` so that it doesn't hang on to so many useless epoch boundary states.
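The PromiseCache idea (concurrent requests for the same key share one computation instead of each loading its own copy of a state) can be sketched with `std::sync::OnceLock` (Rust 1.70+). This is a simplified illustration, not Lighthouse's PromiseCache: the type and method names are hypothetical, and it deliberately ignores eviction and failed computations, which is part of what makes the real version subtle.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical de-duplicating cache: one "promise" slot per key.
pub struct PromiseCache<K, V> {
    entries: Mutex<HashMap<K, Arc<OnceLock<Arc<V>>>>>,
}

impl<K: Eq + Hash + Clone, V> PromiseCache<K, V> {
    pub fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the value for `key`, running `compute` at most once per key.
    /// Concurrent callers for the same key block until the first one finishes.
    pub fn get_or_compute<F: FnOnce() -> V>(&self, key: &K, compute: F) -> Arc<V> {
        // Grab (or create) the slot while holding the map lock only briefly...
        let slot = {
            let mut map = self.entries.lock().unwrap();
            map.entry(key.clone())
                .or_insert_with(|| Arc::new(OnceLock::new()))
                .clone()
        };
        // ...so the expensive computation runs without blocking other keys.
        // OnceLock guarantees only one initializer runs; other callers wait for it.
        slot.get_or_init(|| Arc::new(compute())).clone()
    }
}
```

Because the map lock is released before `compute` runs, loads for different states proceed in parallel while duplicate loads of the same state collapse into one.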
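Size-based pruning with a bias towards large states could look roughly like this. It is a hypothetical sketch, not the logic of `cull` or of #6532: it assumes each state's size is already known (measuring it is the expensive part mentioned above), and its eviction rule (take the largest 20% of entries, then evict the least recently used among them) is an illustrative assumption.

```rust
use std::collections::HashMap;

/// Hypothetical cache entry: estimated size plus a logical last-use time.
struct Entry {
    size_bytes: u64,
    last_used: u64,
}

/// Hypothetical byte-budgeted state cache keyed by a numeric state ID.
pub struct SizedCache {
    entries: HashMap<u64, Entry>,
    budget_bytes: u64,
    clock: u64,
}

impl SizedCache {
    pub fn new(budget_bytes: u64) -> Self {
        Self { entries: HashMap::new(), budget_bytes, clock: 0 }
    }

    pub fn total_bytes(&self) -> u64 {
        self.entries.values().map(|e| e.size_bytes).sum()
    }

    pub fn contains(&self, id: u64) -> bool {
        self.entries.contains_key(&id)
    }

    pub fn insert(&mut self, id: u64, size_bytes: u64) {
        self.clock += 1;
        // Replacing an entry frees its old size first.
        self.entries.remove(&id);
        // Make room before inserting, so the new state is never its own eviction candidate.
        while !self.entries.is_empty() && self.total_bytes() + size_bytes > self.budget_bytes {
            self.evict_one();
        }
        self.entries.insert(id, Entry { size_bytes, last_used: self.clock });
    }

    fn evict_one(&mut self) {
        if self.entries.is_empty() {
            return;
        }
        let mut candidates: Vec<(u64, u64, u64)> = self
            .entries
            .iter()
            .map(|(id, e)| (*id, e.size_bytes, e.last_used))
            .collect();
        // Largest states first.
        candidates.sort_by(|a, b| b.1.cmp(&a.1));
        // Only the largest 20% (at least one) are candidates...
        let n = (candidates.len() / 5).max(1);
        // ...and the least recently used of those is evicted.
        if let Some(&(victim, _, _)) = candidates[..n].iter().min_by_key(|c| c.2) {
            self.entries.remove(&victim);
        }
    }
}
```

With a 1000-byte budget holding one 500-byte state and four 100-byte states, inserting a 200-byte state evicts the 500-byte one even though it is the oldest entry only by coincidence: size, not just age, drives the choice.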