Skip to content

Conversation

michaelsproul
Copy link
Member

Proposed Changes

Optimise fetch_blobs significantly, by fetching blobs from the EL prior to consensus and execution verification of the block.

We had noticed that we weren't getting many hits with fetch blobs, and this was because blobs were almost always arriving on gossip prior to us requesting them. Only a few times an hour would the fetch_blobs logic actually fire.

With this change I'm seeing much more frequent hits, without a substantial increase in publication bandwidth. In the last 30 mins running on mainnet there have been 116 hits, and 156 individual blobs published (out of 395 fetched).

Data here: https://docs.google.com/spreadsheets/d/1ZJIYbOPwNGa_veqUC0ywsOdzFYvh4aqJLMYqFoisA_E/edit?usp=sharing

This does imply that we're publishing around 35% of all blobs! But this will likely come down as more nodes chip in to publishing.

@michaelsproul michaelsproul added ready-for-review The code is ready for review optimization Something to make Lighthouse run more efficiently. labels Nov 21, 2024
@dapplion
Copy link
Collaborator

Any security considerations on triggering this logic before validating the block? The most damage a proposer can do is waste bandwidth on a bad proposal. This does not seem like a big issue and can be done anyway regardless of fetch_blobs.

Else the experimental results look great

@michaelsproul
Copy link
Member Author

We're doing this after gossip validation of the block, so we know that the proposer's signature is valid and they are a legitimate proposer for the slot.

Unless the proposer slashes themselves, the blob versioned hashes in the block header are the "true" (valid) versioned hashes for this slot. Alternatively the block could be completely invalid (but not slashable), in which case we will reject it upon completion of block processing.

As part of fetch_blobs we run gossip blob validation:

// Gossip verify blobs before publishing. This prevents blobs with invalid KZG proofs from
// the EL making it into the data availability checker. We do not immediately add these
// blobs to the observed blobs/columns cache because we want to allow blobs/columns to arrive on gossip
// and be accepted (and propagated) while we are waiting to publish. Just before publishing
// we will observe the blobs/columns and only proceed with publishing if they are not yet seen.
let blobs_to_import_and_publish = fixed_blob_sidecar_list
.iter()
.filter_map(|opt_blob| {
let blob = opt_blob.as_ref()?;
match GossipVerifiedBlob::<T, DoNotObserve>::new(blob.clone(), blob.index, &chain) {
Ok(verified) => Some(Ok(verified)),
// Ignore already seen blobs.
Err(GossipBlobError::RepeatBlob { .. }) => None,
Err(e) => Some(Err(e)),
}
})
.collect::<Result<Vec<_>, _>>()
.map_err(FetchEngineBlobError::GossipBlob)?;

So if they are malformed (e.g. bad KZG proof), they will be rejected at this point.

TL;DR on the whole I think it's security-equivalent to processing blobs on gossip:

  • We have a valid signature from the proposer committing to the blobs.
  • We verify the blobs themselves before importing them.

@michaelsproul michaelsproul added v6.0.0 New major release for hierarchical state diffs waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Nov 21, 2024
self.executor.spawn(
async move {
self_clone
.fetch_engine_blobs_and_publish(block_clone, block_root, publish_blobs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is running as a task, it's no longer bound by the beacon processor queue. Could someone spam gossip blocks and cause a lot of fetch blobs work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would need to be beacon blocks with valid signatures, and this is a linear factor, so it can't really blow up much beyond the number of threads allocated to the beacon processor. E.g. if we have 16 threads in the BP, we might end up with 32 running tasks max, which are mostly I/O bound and should be handled just fine by Tokio.

We do this in a few other places, like when we check the payload with the EL:

// Spawn the payload verification future as a new task, but don't wait for it to complete.
// The `payload_verification_future` will be awaited later to ensure verification completed
// successfully.
let payload_verification_handle = chain
.task_executor
.spawn_handle(
payload_verification_future,
"execution_payload_verification",
)
.ok_or(BeaconChainError::RuntimeShutdown)?;

@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Nov 22, 2024
michaelsproul added a commit that referenced this pull request Nov 25, 2024
Squashed commit of the following:

commit 5f563ef
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Fri Nov 22 12:33:10 2024 +1100

    Run fetch blobs in parallel with block import

commit 3cfe9df
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Thu Nov 21 10:46:34 2024 +1100

    Fetch blobs from EL prior to block verification
@michaelsproul michaelsproul mentioned this pull request Nov 25, 2024
michaelsproul added a commit that referenced this pull request Nov 25, 2024
Squashed commit of the following:

commit 5f563ef
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Fri Nov 22 12:33:10 2024 +1100

    Run fetch blobs in parallel with block import

commit 3cfe9df
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Thu Nov 21 10:46:34 2024 +1100

    Fetch blobs from EL prior to block verification
@michaelsproul michaelsproul added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Nov 27, 2024
@michaelsproul
Copy link
Member Author

@mergify queue

Copy link

mergify bot commented Nov 27, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

mergify bot added a commit that referenced this pull request Nov 27, 2024
@michaelsproul
Copy link
Member Author

@mergify requeue

Copy link

mergify bot commented Nov 27, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

Copy link

mergify bot commented Nov 27, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

mergify bot added a commit that referenced this pull request Nov 27, 2024
michaelsproul added a commit that referenced this pull request Nov 27, 2024
Squashed commit of the following:

commit 5f563ef
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Fri Nov 22 12:33:10 2024 +1100

    Run fetch blobs in parallel with block import

commit 3cfe9df
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Thu Nov 21 10:46:34 2024 +1100

    Fetch blobs from EL prior to block verification
michaelsproul added a commit that referenced this pull request Nov 28, 2024
Squashed commit of the following:

commit 5f563ef
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Fri Nov 22 12:33:10 2024 +1100

    Run fetch blobs in parallel with block import

commit 3cfe9df
Author: Michael Sproul <michael@sigmaprime.io>
Date:   Thu Nov 21 10:46:34 2024 +1100

    Fetch blobs from EL prior to block verification
@michaelsproul
Copy link
Member Author

@mergify queue

Copy link

mergify bot commented Nov 28, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@michaelsproul
Copy link
Member Author

@mergify requeue

Copy link

mergify bot commented Nov 28, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

Copy link

mergify bot commented Nov 28, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@michaelsproul
Copy link
Member Author

@mergify requeue

Copy link

mergify bot commented Nov 28, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

Copy link

mergify bot commented Nov 28, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@michaelsproul
Copy link
Member Author

@mergify requeue

Copy link

mergify bot commented Nov 28, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

Copy link

mergify bot commented Nov 28, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@michaelsproul
Copy link
Member Author

@mergify requeue

Copy link

mergify bot commented Nov 28, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

Copy link

mergify bot commented Nov 28, 2024

queue

🛑 The pull request has been removed from the queue default

The merge conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@michaelsproul
Copy link
Member Author

@mergify requeue

Copy link

mergify bot commented Nov 29, 2024

requeue

❌ This pull request head commit has not been previously disembarked from queue.

@michaelsproul
Copy link
Member Author

@mergify queue

Copy link

mergify bot commented Nov 29, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 1c8161f

@mergify mergify bot merged commit 1c8161f into sigp:unstable Nov 29, 2024
29 checks passed
@michaelsproul
Copy link
Member Author

THANK FUCK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization Something to make Lighthouse run more efficiently. ready-for-merge This PR is ready to merge. v6.0.0 New major release for hierarchical state diffs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants