
Same block getting imported multiple times due to race condition #6439

@jimmygchen

Description


TL;DR: This is actually an existing race condition and can be triggered today. It can happen when blobs arrive from RPC and gossip at around the same time, and both trigger a block import. There's no impact other than extra work performed: on the second import, the state cache is expected to miss, because the first import popped the state, so this results in a state reconstruction, which takes ~80 ms.
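To make the race concrete, here is a minimal, hedged sketch of the check-then-act pattern involved. All names (`DaChecker`, `on_component`, `simulate`) are illustrative simplifications, not Lighthouse's actual API:

```rust
use std::collections::HashMap;

// Simplified model: PendingComponents stays in the cache until import
// finishes, so every late-arriving component still sees "complete".
struct DaChecker {
    // block_root -> "all components present?"
    pending: HashMap<u64, bool>,
}

impl DaChecker {
    // Each arriving component (blob or data column) re-checks completeness.
    fn on_component(&self, block_root: u64) -> bool {
        *self.pending.get(&block_root).unwrap_or(&false)
    }
}

// Count how many imports fire for `late` components arriving after the
// block has already been made available.
fn simulate(late: u32) -> u32 {
    let mut da = DaChecker { pending: HashMap::new() };
    da.pending.insert(0xb10c, true); // block became available
    (0..late).filter(|_| da.on_component(0xb10c)).count() as u32
}

fn main() {
    // Four gossip columns arriving within 2 ms -> four imports of one block.
    println!("imports = {}", simulate(4));
}
```

Each of the four late components performs the same completeness check and each one independently concludes "import this block".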

PeerDAS occurrence

On peer-das-devnet-2, I'm seeing the same block getting written to the database multiple times (4 times within 2 ms!). This occurred on a supernode.

I think this is because we keep the PendingComponents in the cache until block import completes, so each gossip component that makes it through to the DA checker after the block is available triggers another import:

// Remove block components from da_checker AFTER completing block import. Then we can assert
// the following invariant:
// > A valid unfinalized block is either in fork-choice or da_checker.
//
// If we remove the block when it becomes available, there's some time window during
// `import_block` where the block is nowhere. Consumers of the da_checker can handle the
// extended time a block may exist in the da_checker.
//
// If `import_block` errors (only errors with internal errors), the pending components will
// be pruned on data_availability_checker maintenance as finality advances.
self.data_availability_checker
.remove_pending_components(block_root);

With PeerDAS supernodes, once we receive 64 columns, the remaining columns can come from reconstruction, and the block can be made available after this. However, we can still receive those columns from gossip (as the reconstructed columns haven't been seen on gossip yet), and each of these gossip columns could trigger a block import, because the PendingComponents entry in the cache is "complete".
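A back-of-the-envelope count under the parameters stated above (a supernode custodies all 128 columns; reconstruction is possible once 64 have arrived); `late_gossip_columns` is an illustrative helper, not real code:

```rust
// Every column not yet seen on gossip when reconstruction completes can
// still arrive later, and each one finds a "complete" PendingComponents.
fn late_gossip_columns(total: u32, seen_when_reconstructed: u32) -> u32 {
    total.saturating_sub(seen_when_reconstructed)
}

fn main() {
    // Reconstruction fires at the 64th column: up to 64 more may follow,
    // each a potential duplicate-import trigger.
    println!("up to {} late columns", late_gossip_columns(128, 64));
}
```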

The change to keep PendingComponents in the DA cache was intentional and was made in the following PR to address an issue with sync lookup:
#5845

I'm not sure if there's a better way, but if we do need to keep the block in the DA checker during import, perhaps we can add a check before processing a gossip block/blob/data_column: if the block has already been made available, return early instead of processing it and re-importing.
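A hedged sketch of that suggested guard; `imported_or_importing` and the outcome strings are hypothetical placeholders, not Lighthouse's real types or API:

```rust
use std::collections::HashSet;

// If the block has already been made available, return early instead of
// running availability checks again and triggering a duplicate import.
fn process_gossip_component(
    imported_or_importing: &HashSet<u64>,
    block_root: u64,
) -> &'static str {
    if imported_or_importing.contains(&block_root) {
        return "ignored";
    }
    "processed"
}

fn main() {
    let mut seen = HashSet::new();
    assert_eq!(process_gossip_component(&seen, 0xb10c), "processed");
    seen.insert(0xb10c); // block made available; import already in flight
    assert_eq!(process_gossip_component(&seen, 0xb10c), "ignored");
    println!("ok");
}
```

This keeps the invariant from the comment above (a valid unfinalized block is either in fork choice or the da_checker) intact, since nothing is removed early; late components are simply dropped.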

Labels: bug