High read IO due to blob pruning #7100

@michaelsproul

Description

Some users are reporting increased CPU iowait and read IO while running v7.0.0-beta. I think it's probably because we're reading all of the blobs every epoch 😅. We changed our blob pruning algorithm as a result of:

Graphs from an affected machine below:

CPU iowait (yellow):

[graph: CPU iowait]

Read IO spiking to 250 MB/s every epoch:

[graph: increased read IO]

Given that I think we want to continue with abolishing `oldest_blob_slot` and allowing partial blob storage, I don't think we should revert to the previous algorithm. A quick fix is to reduce the frequency at which blob pruning runs to around once per day (`--epochs-per-blob-prune 256`). This would be an easy change to include in v7.0.0.
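The "around once per day" figure checks out, assuming mainnet parameters (32 slots per epoch, 12-second slots); the invocation at the bottom is only a sketch, with all other flags omitted:

```shell
# 256 epochs * 32 slots/epoch * 12 s/slot = 98304 s ~= 27.3 hours between prune runs
echo $((256 * 32 * 12))

# Example invocation (other flags omitted):
#   lighthouse bn --epochs-per-blob-prune 256
```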

There might be another pruning algorithm we could use that is a hybrid of the previous one and the new one: iterate block roots backwards from the start of the data availability period, deleting blobs until we reach a block whose blobs are already pruned. That way we would only need to read the blobs that are about to be deleted. However, this has two downsides:

  • We will stop pruning if there is a gap in blob availability in the database, and such gaps could occur on healthy nodes if we support partial blob storage. Counter: maybe switching a node from partial archival blob storage to fully archived historical blobs could require a more thorough manual prune?
  • We need a backwards block iterator over the freezer block roots. Currently we only have a forwards iterator. This is not insurmountable, but it needs to be written carefully given our recent issues with state loads and historical iterators.
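A minimal sketch of the hybrid idea, with the first downside visible in the early-exit: walk slots backwards from the data availability boundary and stop at the first already-pruned entry. `BlobStore` and `prune_blobs_backwards` are illustrative names over a toy map, not Lighthouse's actual store API:

```rust
use std::collections::BTreeMap;

// Toy stand-in for the freezer DB: slot -> "blobs present?".
// In reality this would be keyed by block root via a backwards iterator.
struct BlobStore {
    blobs: BTreeMap<u64, bool>,
}

// Delete blobs for slots below the DA boundary, newest first, stopping at
// the first slot that is already pruned. Returns the number of deletions.
fn prune_blobs_backwards(store: &mut BlobStore, da_boundary_slot: u64) -> u64 {
    let mut deleted = 0;
    let slots: Vec<u64> = store
        .blobs
        .range(..da_boundary_slot)
        .rev()
        .map(|(slot, _)| *slot)
        .collect();
    for slot in slots {
        if store.blobs.get(&slot) == Some(&true) {
            store.blobs.insert(slot, false);
            deleted += 1;
        } else {
            // Already pruned: assume everything older is pruned too and stop.
            // This is the downside noted above: a gap in blob availability
            // (e.g. under partial blob storage) halts pruning early.
            break;
        }
    }
    deleted
}

fn main() {
    // Slots 0-2 already pruned, slots 3-9 still have blobs.
    let mut store = BlobStore {
        blobs: (0..10).map(|s| (s, s >= 3)).collect(),
    };
    // DA boundary at slot 8: deletes slots 7,6,5,4,3, then stops at slot 2.
    let deleted = prune_blobs_backwards(&mut store, 8);
    println!("deleted blobs for {} slots", deleted);
}
```

Only the blobs actually being deleted are touched, which avoids the every-epoch full read that v7.0.0-beta's algorithm performs.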


Labels: database, optimization (Something to make Lighthouse run more efficiently.)
