Skip to content

Conversation

michaelsproul
Copy link
Member

@michaelsproul michaelsproul commented Dec 14, 2021

Issue Addressed

Successor to #2431

Proposed Changes

  • Add a BlockReplayer struct to abstract over the intricacies of calling per_slot_processing and per_block_processing while avoiding unnecessary tree hashing.
  • Add a variant of the forwards state root iterator that does not require an end_state.
  • Use the BlockReplayer when reconstructing states in the database. Use the efficient forwards iterator for frozen states.
  • Refactor the iterators to remove Arc<HotColdDB> (this seems to be neater than making everything an Arc<HotColdDB> as I did in [WIP] Accelerate loading of frozen states #2431).

Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of slots-per-restore-point). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup).

Additional Info

Required by #2628

@michaelsproul michaelsproul added the work-in-progress PR is a work-in-progress label Dec 14, 2021
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed work-in-progress PR is a work-in-progress labels Dec 14, 2021
@michaelsproul
Copy link
Member Author

Blocked on #2865

Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the BlockReplayer, it's such a good idea! 🎉

I've left some comments, but they probably amount to fearful Morty saying "I-I dunno Rick..." when Rick has shit totally sorted.

start_slot: Slot,
end_slot: Slot,
) -> Result<impl Iterator<Item = Result<(Hash256, Slot), Error>> + '_, Error> {
self.with_head(move |head| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice this holds a read-lock on the head until the iteration completes. I don't have an idea on how long freezer roots iteration takes, I'm wondering what your thoughts are on hogging the head lock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch.

It's ok in the case where the iteration starts in the frozen root section of the DB, because we clone the head and return the iterator (which is lazy) here:

let continuation_data =
if end_slot.map_or(false, |end_slot| end_slot < latest_restore_point_slot) {
Box::new(None)
} else {
Box::new(Some(get_state()))
};
PreFinalization {
iter,
continuation_data,
}

However in the case where we start after the last_restore_point_slot we eagerly iterate backwards from the head while holding the lock, which is not ideal:

let (end_state, end_state_root) = get_state();
PostFinalization {
iter: SimpleForwardsStateRootsIterator::new(
store,
start_slot,
end_state,
end_state_root,
)?,
}

I'll add a variant to the hybrid iterator that represents the "between state" of just holding the state before we start iterating backwards. Then we can preserve the property that the iterator is always quick to construct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this in b892bb7 and ended up consolidating the block root and state root iterators while I was there.

state,
spec,
state_root_strategy: StateRootStrategy::Accurate,
block_sig_strategy: BlockSignatureStrategy::NoVerification,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should default to signature verification enabled? I can see that most use-cases would involve no verification, but perhaps there's long-term safety in verification by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! The defaults are now verify-everything, and we can opt in to turning things off as necessary

) -> Result<Self, Error> {
for (i, block) in blocks.iter().enumerate() {
if block.slot() <= self.state.slot() {
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected this to raise an error, indicating that the blocks supplied are not valid. It's not a big deal at all, I'm just interested if it's expected to supply out-of-range blocks in some scenario.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied over from the old block replay logic in the database, where we load one extra block at the start so that we can read its state root

We could try limiting the number of lead-up blocks to just 1, and error otherwise, but I think it's also acceptable to keep it as-is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a check for i == 0 so that we'll error if more than 1 lead-up block is supplied

}

let state_root = self.get_state_root(self.state.slot(), &blocks, i)?;
per_slot_processing(&mut self.state, state_root, self.spec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps capturing the Option<EpochProcessingSummary> would be useful for the sort of things @macladson is doing.

Copy link
Member Author

@michaelsproul michaelsproul Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! Very good idea! I'll store them in a HashMap<Epoch, EpochProcessingSummary> on the BlockReplayer.

Might also add a config option that avoids storing them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up feeding them to the post_slot_hook, which can take care of storing them if it desires (or better yet: aggregate them as it receives them)

block,
None,
self.block_sig_strategy,
VerifyBlockRoot::False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is skipping block root verification just to save cycles on the tree hashing? I'm wondering if it's worth at least checking it on the first block. It's generally without single-digit milliseconds in my experience, perhaps the guarantee on consistency is worth it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly for the inconsistent state root case, but I figured I'd leave it off in general for speed (1ms repeated over 2048 slots is still ~2 seconds). I like the idea of checking it for the first block though, so I'll change it to:

let verify_block_root = if i == 0 && self.state_root_strategy == StateRootStrategy::Accurate {
    VerifyBlockRoot::True
} else {
    VerifyBlockRoot::False
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Block root verification is now on by default, and one can opt in to the "check just the first" behaviour by calling minimal_block_root_verification

@paulhauner paulhauner added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels Dec 15, 2021
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed blocked waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels Dec 16, 2021
@michaelsproul
Copy link
Member Author

michaelsproul commented Dec 17, 2021

Sorry @paulhauner I've pushed a bunch of new stuff:

  • The latest commit (1ac420b) adds a lazy forwards block root iterator for Mac to use, and changes the cloning to avoid the tree hash cache.
  • This commit makes the iterators properly lazy: b892bb7
  • This commit sets the defaults to safe values: b892bb7

I can't link you a nice unified diff because there's a merge commit smack bang in the middle. If that's annoying let me know and I can rebase

Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Merge at will!

@paulhauner paulhauner added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Dec 21, 2021
@michaelsproul
Copy link
Member Author

Gracias amigo!

bors r+

bors bot pushed a commit that referenced this pull request Dec 21, 2021
## Issue Addressed

Successor to #2431

## Proposed Changes

* Add a `BlockReplayer` struct to abstract over the intricacies of calling `per_slot_processing` and `per_block_processing` while avoiding unnecessary tree hashing.
* Add a variant of the forwards state root iterator that does not require an `end_state`.
* Use the `BlockReplayer` when reconstructing states in the database. Use the efficient forwards iterator for frozen states.
* Refactor the iterators to remove `Arc<HotColdDB>` (this seems to be neater than making _everything_ an `Arc<HotColdDB>` as I did in #2431).

Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of `slots-per-restore-point`). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup).

## Additional Info

Required by #2628
@bors
Copy link

bors bot commented Dec 21, 2021

Build failed (retrying...):

bors bot pushed a commit that referenced this pull request Dec 21, 2021
## Issue Addressed

Successor to #2431

## Proposed Changes

* Add a `BlockReplayer` struct to abstract over the intricacies of calling `per_slot_processing` and `per_block_processing` while avoiding unnecessary tree hashing.
* Add a variant of the forwards state root iterator that does not require an `end_state`.
* Use the `BlockReplayer` when reconstructing states in the database. Use the efficient forwards iterator for frozen states.
* Refactor the iterators to remove `Arc<HotColdDB>` (this seems to be neater than making _everything_ an `Arc<HotColdDB>` as I did in #2431).

Supplying the state roots allow us to avoid building a tree hash cache at all when reconstructing historic states, which saves around 1 second flat (regardless of `slots-per-restore-point`). This is a small percentage of worst-case state load times with 200K validators and SPRP=2048 (~15s vs ~16s) but a significant speed-up for more frequent restore points: state loads with SPRP=32 should be now consistently <500ms instead of 1.5s (a ~3x speedup).

## Additional Info

Required by #2628
@bors
Copy link

bors bot commented Dec 21, 2021

@bors bors bot changed the title Add configurable block replayer [Merged by Bors] - Add configurable block replayer Dec 21, 2021
@bors bors bot closed this Dec 21, 2021
@michaelsproul michaelsproul deleted the block-replayer branch December 21, 2021 09:18
bors bot pushed a commit that referenced this pull request Feb 18, 2022
## Issue Addressed
N/A

## Proposed Changes
Add a HTTP API which can be used to compute the block packing data for all blocks over a discrete range of epochs.

## Usage
### Request
```
curl "http:localhost:5052/lighthouse/analysis/block_packing_efficiency?start_epoch=57730&end_epoch=57732"
```
### Response
```
[
  {
    "slot": "1847360",
    "block_hash": "0xa7dc230659802df2f99ea3798faede2e75942bb5735d56e6bfdc2df335dcd61f",
    "proposer_info": {
      "validator_index": 1686,
      "graffiti": ""
    },
    "available_attestations": 7096,
    "included_attestations": 6459,
    "prior_skip_slots": 0
  },
  ...
]
```
## Additional Info

This is notably different to the existing lcli code:
- Uses `BlockReplayer` #2863 and as such runs significantly faster than the previous method.
- Corrects the off-by-one #2878
- Removes the `offline` validators component. This was only a "best guess" and simply was used as a way to determine an estimate of the "true" packing efficiency and was generally not helpful in terms of direct comparisons between different packing methods. As such it has been removed from the API and any future estimates of "offline" validators would be better suited in a separate/more targeted API or as part of 'beacon watch': #2873 
- Includes `prior_skip_slots`.
bors bot pushed a commit that referenced this pull request Feb 21, 2022
## Issue Addressed
N/A

## Proposed Changes
Add a HTTP API which can be used to compute the block packing data for all blocks over a discrete range of epochs.

## Usage
### Request
```
curl "http:localhost:5052/lighthouse/analysis/block_packing_efficiency?start_epoch=57730&end_epoch=57732"
```
### Response
```
[
  {
    "slot": "1847360",
    "block_hash": "0xa7dc230659802df2f99ea3798faede2e75942bb5735d56e6bfdc2df335dcd61f",
    "proposer_info": {
      "validator_index": 1686,
      "graffiti": ""
    },
    "available_attestations": 7096,
    "included_attestations": 6459,
    "prior_skip_slots": 0
  },
  ...
]
```
## Additional Info

This is notably different to the existing lcli code:
- Uses `BlockReplayer` #2863 and as such runs significantly faster than the previous method.
- Corrects the off-by-one #2878
- Removes the `offline` validators component. This was only a "best guess" and simply was used as a way to determine an estimate of the "true" packing efficiency and was generally not helpful in terms of direct comparisons between different packing methods. As such it has been removed from the API and any future estimates of "offline" validators would be better suited in a separate/more targeted API or as part of 'beacon watch': #2873 
- Includes `prior_skip_slots`.
bors bot pushed a commit that referenced this pull request Feb 21, 2022
## Issue Addressed
N/A

## Proposed Changes
Add a HTTP API which can be used to compute the block packing data for all blocks over a discrete range of epochs.

## Usage
### Request
```
curl "http:localhost:5052/lighthouse/analysis/block_packing_efficiency?start_epoch=57730&end_epoch=57732"
```
### Response
```
[
  {
    "slot": "1847360",
    "block_hash": "0xa7dc230659802df2f99ea3798faede2e75942bb5735d56e6bfdc2df335dcd61f",
    "proposer_info": {
      "validator_index": 1686,
      "graffiti": ""
    },
    "available_attestations": 7096,
    "included_attestations": 6459,
    "prior_skip_slots": 0
  },
  ...
]
```
## Additional Info

This is notably different to the existing lcli code:
- Uses `BlockReplayer` #2863 and as such runs significantly faster than the previous method.
- Corrects the off-by-one #2878
- Removes the `offline` validators component. This was only a "best guess" and simply was used as a way to determine an estimate of the "true" packing efficiency and was generally not helpful in terms of direct comparisons between different packing methods. As such it has been removed from the API and any future estimates of "offline" validators would be better suited in a separate/more targeted API or as part of 'beacon watch': #2873 
- Includes `prior_skip_slots`.
bors bot pushed a commit that referenced this pull request Feb 21, 2022
## Issue Addressed
N/A

## Proposed Changes
Add a HTTP API which can be used to compute the block packing data for all blocks over a discrete range of epochs.

## Usage
### Request
```
curl "http:localhost:5052/lighthouse/analysis/block_packing_efficiency?start_epoch=57730&end_epoch=57732"
```
### Response
```
[
  {
    "slot": "1847360",
    "block_hash": "0xa7dc230659802df2f99ea3798faede2e75942bb5735d56e6bfdc2df335dcd61f",
    "proposer_info": {
      "validator_index": 1686,
      "graffiti": ""
    },
    "available_attestations": 7096,
    "included_attestations": 6459,
    "prior_skip_slots": 0
  },
  ...
]
```
## Additional Info

This is notably different to the existing lcli code:
- Uses `BlockReplayer` #2863 and as such runs significantly faster than the previous method.
- Corrects the off-by-one #2878
- Removes the `offline` validators component. This was only a "best guess" and simply was used as a way to determine an estimate of the "true" packing efficiency and was generally not helpful in terms of direct comparisons between different packing methods. As such it has been removed from the API and any future estimates of "offline" validators would be better suited in a separate/more targeted API or as part of 'beacon watch': #2873 
- Includes `prior_skip_slots`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants