[Merged by Bors] - Add configurable block replayer #2863

michaelsproul · 2021-12-14T06:52:24Z

Issue Addressed

Successor to #2431

Proposed Changes

Add a BlockReplayer struct to abstract over the intricacies of calling per_slot_processing and per_block_processing while avoiding unnecessary tree hashing.
Add a variant of the forwards state root iterator that does not require an end_state.
Use the BlockReplayer when reconstructing states in the database. Use the efficient forwards iterator for frozen states.
Refactor the iterators to remove Arc<HotColdDB> (this seems to be neater than making everything an Arc<HotColdDB> as I did in [WIP] Accelerate loading of frozen states #2431).

Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of slots-per-restore-point). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup).

Additional Info

Required by #2628

consensus/state_processing/src/block_replayer.rs

michaelsproul · 2021-12-15T03:54:13Z

Blocked on #2865

paulhauner

I love the BlockReplayer, it's such a good idea! 🎉

I've left some comments, but they probably amount to fearful Morty saying "I-I dunno Rick..." when Rick has shit totally sorted.

paulhauner · 2021-12-15T07:45:16Z

beacon_node/beacon_chain/src/beacon_chain.rs

+        start_slot: Slot,
+        end_slot: Slot,
+    ) -> Result<impl Iterator<Item = Result<(Hash256, Slot), Error>> + '_, Error> {
+        self.with_head(move |head| {


I notice this holds a read-lock on the head until the iteration completes. I don't have an idea on how long freezer roots iteration takes, I'm wondering what your thoughts are on hogging the head lock?

Ah, good catch.

It's ok in the case where the iteration starts in the frozen root section of the DB, because we clone the head and return the iterator (which is lazy) here:

lighthouse/beacon_node/store/src/forwards_iter.rs

Lines 285 to 294 in ec830f1

let continuation_data =

if end_slot.map_or(false, |end_slot| end_slot < latest_restore_point_slot) {

Box::new(None)

} else {

Box::new(Some(get_state()))

};

PreFinalization {

iter,

continuation_data,

}

However in the case where we start after the last_restore_point_slot we eagerly iterate backwards from the head while holding the lock, which is not ideal:

lighthouse/beacon_node/store/src/forwards_iter.rs

Lines 296 to 304 in ec830f1

let (end_state, end_state_root) = get_state();

PostFinalization {

iter: SimpleForwardsStateRootsIterator::new(

store,

start_slot,

end_state,

end_state_root,

)?,

}

I'll add a variant to the hybrid iterator that represents the "between state" of just holding the state before we start iterating backwards. Then we can preserve the property that the iterator is always quick to construct

I've implemented this in b892bb7 and ended up consolidating the block root and state root iterators while I was there.

paulhauner · 2021-12-15T07:54:40Z

consensus/state_processing/src/block_replayer.rs

+            state,
+            spec,
+            state_root_strategy: StateRootStrategy::Accurate,
+            block_sig_strategy: BlockSignatureStrategy::NoVerification,


I wonder if we should default to signature verification enabled? I can see that most use-cases would involve no verification, but perhaps there's long-term safety in verification by default.

Done! The defaults are now verify-everything, and we can opt in to turning things off as necessary

consensus/state_processing/src/block_replayer.rs

paulhauner · 2021-12-15T08:02:59Z

consensus/state_processing/src/block_replayer.rs

+    ) -> Result<Self, Error> {
+        for (i, block) in blocks.iter().enumerate() {
+            if block.slot() <= self.state.slot() {
+                continue;


I would have expected this to raise an error, indicating that the blocks supplied are not valid. It's not a big deal at all, I'm just interested if it's expected to supply out-of-range blocks in some scenario.

This was copied over from the old block replay logic in the database, where we load one extra block at the start so that we can read its state root

We could try limiting the number of lead-up blocks to just 1, and error otherwise, but I think it's also acceptable to keep it as-is?

I've added a check for i == 0 so that we'll error if more than 1 lead-up block is supplied

consensus/state_processing/src/block_replayer.rs

paulhauner · 2021-12-15T08:05:54Z

consensus/state_processing/src/block_replayer.rs

+                }
+
+                let state_root = self.get_state_root(self.state.slot(), &blocks, i)?;
+                per_slot_processing(&mut self.state, state_root, self.spec)


Perhaps capturing the Option<EpochProcessingSummary> would be useful for the sort of things @macladson is doing.

Ah yes! Very good idea! I'll store them in a HashMap<Epoch, EpochProcessingSummary> on the BlockReplayer.

Might also add a config option that avoids storing them.

I ended up feeding them to the post_slot_hook, which can take care of storing them if it desires (or better yet: aggregate them as it receives them)

paulhauner · 2021-12-15T08:07:22Z

consensus/state_processing/src/block_replayer.rs

+                block,
+                None,
+                self.block_sig_strategy,
+                VerifyBlockRoot::False,


Is skipping block root verification just to save cycles on the tree hashing? I'm wondering if it's worth at least checking it on the first block. It's generally without single-digit milliseconds in my experience, perhaps the guarantee on consistency is worth it?

This is mostly for the inconsistent state root case, but I figured I'd leave it off in general for speed (1ms repeated over 2048 slots is still ~2 seconds). I like the idea of checking it for the first block though, so I'll change it to:

let verify_block_root = if i == 0 && self.state_root_strategy == StateRootStrategy::Accurate { VerifyBlockRoot::True } else { VerifyBlockRoot::False };

Block root verification is now on by default, and one can opt in to the "check just the first" behaviour by calling minimal_block_root_verification

beacon_node/store/src/hot_cold_store.rs

michaelsproul · 2021-12-17T01:18:16Z

Sorry @paulhauner I've pushed a bunch of new stuff:

The latest commit (1ac420b) adds a lazy forwards block root iterator for Mac to use, and changes the cloning to avoid the tree hash cache.
This commit makes the iterators properly lazy: b892bb7
This commit sets the defaults to safe values: b892bb7

I can't link you a nice unified diff because there's a merge commit smack bang in the middle. If that's annoying let me know and I can rebase

paulhauner

Very nice! Merge at will!

michaelsproul · 2021-12-21T06:20:10Z

Gracias amigo!

bors r+

## Issue Addressed Successor to #2431 ## Proposed Changes * Add a `BlockReplayer` struct to abstract over the intricacies of calling `per_slot_processing` and `per_block_processing` while avoiding unnecessary tree hashing. * Add a variant of the forwards state root iterator that does not require an `end_state`. * Use the `BlockReplayer` when reconstructing states in the database. Use the efficient forwards iterator for frozen states. * Refactor the iterators to remove `Arc<HotColdDB>` (this seems to be neater than making _everything_ an `Arc<HotColdDB>` as I did in #2431). Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of `slots-per-restore-point`). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup). ## Additional Info Required by #2628

bors · 2021-12-21T06:30:39Z

Build failed (retrying...):

clippy

## Issue Addressed Successor to #2431 ## Proposed Changes * Add a `BlockReplayer` struct to abstract over the intricacies of calling `per_slot_processing` and `per_block_processing` while avoiding unnecessary tree hashing. * Add a variant of the forwards state root iterator that does not require an `end_state`. * Use the `BlockReplayer` when reconstructing states in the database. Use the efficient forwards iterator for frozen states. * Refactor the iterators to remove `Arc<HotColdDB>` (this seems to be neater than making _everything_ an `Arc<HotColdDB>` as I did in #2431). Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of `slots-per-restore-point`). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup). ## Additional Info Required by #2628

bors · 2021-12-21T08:23:09Z

Pull request successfully merged into unstable.

Build succeeded:

## Issue Addressed N/A ## Proposed Changes Add a HTTP API which can be used to compute the block packing data for all blocks over a discrete range of epochs. ## Usage ### Request ``` curl "http:localhost:5052/lighthouse/analysis/block_packing_efficiency?start_epoch=57730&end_epoch=57732" ``` ### Response ``` [ { "slot": "1847360", "block_hash": "0xa7dc230659802df2f99ea3798faede2e75942bb5735d56e6bfdc2df335dcd61f", "proposer_info": { "validator_index": 1686, "graffiti": "" }, "available_attestations": 7096, "included_attestations": 6459, "prior_skip_slots": 0 }, ... ] ``` ## Additional Info This is notably different to the existing lcli code: - Uses `BlockReplayer` #2863 and as such runs significantly faster than the previous method. - Corrects the off-by-one #2878 - Removes the `offline` validators component. This was only a "best guess" and simply was used as a way to determine an estimate of the "true" packing efficiency and was generally not helpful in terms of direct comparisons between different packing methods. As such it has been removed from the API and any future estimates of "offline" validators would be better suited in a separate/more targeted API or as part of 'beacon watch': #2873 - Includes `prior_skip_slots`.

michaelsproul added the work-in-progress PR is a work-in-progress label Dec 14, 2021

michaelsproul commented Dec 14, 2021

View reviewed changes

consensus/state_processing/src/block_replayer.rs Show resolved Hide resolved

michaelsproul mentioned this pull request Dec 14, 2021

Reduce frequency of full state storage #2833

Closed

Implement BlockReplayer

d499a4f

michaelsproul force-pushed the block-replayer branch from 7038671 to d499a4f Compare December 14, 2021 23:16

michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Dec 14, 2021

michaelsproul requested a review from macladson December 14, 2021 23:16

Add skip status to post_slot_hook

ec830f1

michaelsproul added the blocked label Dec 15, 2021

paulhauner reviewed Dec 15, 2021

View reviewed changes

paulhauner added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Dec 15, 2021

michaelsproul added 4 commits December 15, 2021 23:07

Add test for block replayer hooks

c89f449

Consolidate forwards iterators and make them lazier

b892bb7

Merge remote-tracking branch 'origin/unstable' into block-replayer

6ef47c1

Address review comments: safe defaults

6c0f567

michaelsproul added ready-for-review The code is ready for review and removed blocked waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Dec 16, 2021

michaelsproul mentioned this pull request Dec 16, 2021

[Merged by Bors] - Implement API for block rewards #2628

Closed

Add lazy block root iterator, avoid full clone

1ac420b

paulhauner approved these changes Dec 21, 2021

View reviewed changes

paulhauner added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Dec 21, 2021

bors bot changed the title ~~Add configurable block replayer~~ [Merged by Bors] - Add configurable block replayer Dec 21, 2021

bors bot closed this Dec 21, 2021

michaelsproul deleted the block-replayer branch December 21, 2021 09:18

macladson mentioned this pull request Dec 23, 2021

[Merged by Bors] - Add API to compute block packing efficiency data #2879

Closed

	let continuation_data =
	if end_slot.map_or(false, \|end_slot\| end_slot < latest_restore_point_slot) {
	Box::new(None)
	} else {
	Box::new(Some(get_state()))
	};
	PreFinalization {
	iter,
	continuation_data,
	}

	let (end_state, end_state_root) = get_state();
	PostFinalization {
	iter: SimpleForwardsStateRootsIterator::new(
	store,
	start_slot,
	end_state,
	end_state_root,
	)?,
	}

[Merged by Bors] - Add configurable block replayer #2863

[Merged by Bors] - Add configurable block replayer #2863

Uh oh!

Conversation

michaelsproul commented Dec 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue Addressed

Proposed Changes

Additional Info

Uh oh!

Uh oh!

michaelsproul commented Dec 15, 2021

Uh oh!

paulhauner left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michaelsproul Dec 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

michaelsproul commented Dec 17, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paulhauner left a comment

Choose a reason for hiding this comment

Uh oh!

michaelsproul commented Dec 21, 2021

Uh oh!

bors bot commented Dec 21, 2021

Uh oh!

bors bot commented Dec 21, 2021

Uh oh!

Uh oh!

michaelsproul commented Dec 14, 2021 •

edited

Loading

michaelsproul Dec 15, 2021 •

edited

Loading

michaelsproul commented Dec 17, 2021 •

edited

Loading