Skip to content

Light client optimistic updates are requeued too eagerly #6994

@michaelsproul

Description

@michaelsproul

When processing light client optimistic updates we are returning a UnknownBlockParentRoot error for cases where we do know the parent block:

if canonical_root != head_block.message().parent_root() {
return Err(Error::UnknownBlockParentRoot(canonical_root));
}

The effect of this is that we will requeue messages unnecessarily (instead of ignoring them immediately):

LightClientOptimisticUpdateError::UnknownBlockParentRoot(parent_root) => {
metrics::inc_counter(
&metrics::BEACON_PROCESSOR_REPROCESSING_QUEUE_SENT_OPTIMISTIC_UPDATES,
);
debug!(
self.log,
"Optimistic update for unknown block";
"peer_id" => %peer_id,
"parent_root" => ?parent_root
);
if let Some(sender) = reprocess_tx {
let processor = self.clone();
let msg = ReprocessQueueMessage::UnknownLightClientOptimisticUpdate(
QueuedLightClientUpdate {
parent_root,
process_fn: Box::new(move || {
processor.process_gossip_optimistic_update(
message_id,
peer_id,
light_client_optimistic_update,
None, // Do not reprocess this message again.
seen_timestamp,
)
}),
},
);

The reason the check is wrong is that it is only checking the update's canonical_root against the head's parent root. This is correct in the case where e.g. our head is 1 slot behind the update, so we have update at slot 31, head at slot 31, and slot_31_root != slot_30_root. That's fine. The cases that are dodgy are if we have old light client updates, like:

  • update slot is 31, head slot is 33. In this case the parent of the head is the slot 32 block, which is not the slot 31 block, so we return UnknownBlockParentRoot and requeue the message. Requeueing does not help, because when we re-process this message a slot later, the head will either be the same or even newer, and the check will still fail.

The reprocess queue requeues messages a maximum of once, so the impact of this issue is quite minimal: we will requeue some messages unnecessarily and keep them around for up to 12s. This is also only likely to occur if we are lagging on processing updates a little.

We can detect the frequency of this occuring by looking for the log:

Not sending light client update because it had been reprocessed

There were only 33 instances of this per node prior to the fix made here:

But we should continue to monitor, and eventually implement a fix. Perhaps checking the slot of the update to work out whether it is too old and can just be ignored.

Metadata

Metadata

Assignees

Labels

light-clientoptimizationSomething to make Lighthouse run more efficiently.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions