Description
Some users report increased CPU iowait and read IO when running v7.0.0-beta. I think it's probably because we're reading all the blobs every epoch 😅. We changed our blob pruning algorithm as a result of:
Graphs from an affected machine below:
CPU iowait (yellow):
Read IO spiking to 250MB/s every epoch:
Given that I think we want to continue with abolishing `oldest_blob_slot` and allowing partial blob storage, I don't think we should revert to the previous algorithm. A quick fix is to change the frequency at which blob pruning runs to around once per day (`--epochs-per-blob-prune 256`). This would be an easy change to include in v7.0.0.
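For reference, a quick check of what 256 epochs means in wall-clock time. This is only an arithmetic sketch: it assumes mainnet timing (32 slots per epoch, 12 seconds per slot), and `prune_interval` is a hypothetical helper, not a function in the codebase.

```rust
use std::time::Duration;

// Mainnet timing parameters (assumed; other networks may differ).
const SLOTS_PER_EPOCH: u64 = 32;
const SECONDS_PER_SLOT: u64 = 12;

/// Hypothetical helper: wall-clock time between two blob pruning runs
/// for a given `--epochs-per-blob-prune` value.
fn prune_interval(epochs_per_blob_prune: u64) -> Duration {
    Duration::from_secs(epochs_per_blob_prune * SLOTS_PER_EPOCH * SECONDS_PER_SLOT)
}
```

With these assumptions, 256 epochs is 98,304 seconds, or roughly 27.3 hours, so "around once per day" holds, compared with a full read of the blobs every 384 seconds when pruning runs every epoch.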
There might be another pruning algorithm we could use that is a hybrid of the previous one and the new one. Maybe we could iterate block roots backwards from the start of the data availability period, deleting blobs until we reach a block that is already pruned? This way we would only need to read the blobs to be pruned. However it has two downsides:
- We will stop pruning if there is a gap in blob availability in the database, and this could happen on healthy nodes if we support partial blob storage. Counter: maybe switching from a node with partial archival blob storage to archived historical blobs could require a more thorough manual prune?
- We need a backwards block iterator on the freezer block roots. Currently we only have a forwards iterator. This is not insurmountable, but needs to be written carefully given our recent issues with state loads and historical iterators.
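To make the hybrid idea concrete, here is a rough sketch against an in-memory model. Everything in it is hypothetical: `Store`, its fields, and `prune_blobs_backwards` are illustrative stand-ins, not the real database API, and a `BTreeMap` plays the role of the backwards iterator over freezer block roots that we don't have yet. It also treats "no blobs stored for this root" as "already pruned", which a real implementation would need to distinguish from blocks that never had blobs.

```rust
use std::collections::{BTreeMap, HashMap};

type Slot = u64;
type Root = [u8; 32];

/// Hypothetical in-memory stand-in for the database.
struct Store {
    /// Block roots by slot (models the freezer block roots).
    block_roots: BTreeMap<Slot, Root>,
    /// Blob data keyed by block root (contents are irrelevant here).
    blobs: HashMap<Root, Vec<u8>>,
}

impl Store {
    /// Walk block roots backwards from `boundary_slot` (exclusive), which
    /// models the start of the data availability period. Delete blobs until
    /// a block with no stored blobs is reached, then stop.
    /// Returns the number of blocks whose blobs were deleted.
    fn prune_blobs_backwards(&mut self, boundary_slot: Slot) -> usize {
        let mut pruned = 0;
        for (_slot, root) in self.block_roots.range(..boundary_slot).rev() {
            // Assumption of this sketch: a missing entry means "already
            // pruned", so everything older is assumed to be pruned too.
            if self.blobs.remove(root).is_none() {
                break;
            }
            pruned += 1;
        }
        pruned
    }
}
```

The early `break` is what keeps the work proportional to the blobs being pruned, and it is also the source of the first downside above: any gap in blob availability ends the walk, leaving older blobs behind it untouched.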