Drop head tracker for summaries DAG #6744
Conversation
@michaelsproul If we stop updating the head_tracker in the persisted head, how can we downgrade safely? If you run an old version of Lighthouse it will read the persisted head, which contains an empty head tracker.
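One plausible shape for a safe downgrade path (the thread later confirms a manual DB downgrade exists): rebuild a non-empty head tracker from fork choice's current heads before handing the database back to an old binary. A minimal sketch only; `SszHeadTracker`, `rebuild_head_tracker`, and the field names are illustrative assumptions, not Lighthouse's actual migration code.

```rust
use std::collections::HashMap;

// Simplified stand-ins for Lighthouse's types.
type Hash256 = [u8; 32];
type Slot = u64;

/// The persisted head as an older schema expects it: parallel lists of
/// potential head block roots and their slots. Hypothetical layout.
struct SszHeadTracker {
    roots: Vec<Hash256>,
    slots: Vec<Slot>,
}

/// On downgrade, reconstruct the head tracker from fork choice's view of
/// viable heads, so an old binary reading the persisted head sees valid data.
fn rebuild_head_tracker(fork_choice_heads: &HashMap<Hash256, Slot>) -> SszHeadTracker {
    let (roots, slots): (Vec<Hash256>, Vec<Slot>) = fork_choice_heads
        .iter()
        .map(|(root, slot)| (*root, *slot))
        .unzip();
    SszHeadTracker { roots, slots }
}
```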
Test
This is part of an assertion in the test calling … This state is not pruned, but is the …
Michael: figure out which of the hot_state_exists cases triggers, plus what state root and slot
This pull request has merge conflicts. Could you please resolve them @dapplion? 🙏
Force-pushed from c79ddb3 to 8c9a1b2
@michaelsproul I have …
Done some testing using a binary built from commit 12fa5a8.
The execution client (Nethermind) is running normally; perhaps it's something to do with the DB?
The last line of the backtrace shows:
(I have uploaded the full backtrace error in the link shared with you previously.) When trying to start again with the PR binary, it failed:
If we change back to the V6 binary, it produced this error:
We can see that it is starting checkpoint sync again (even though the database is there and I don't use the flag …). It sort of "ignores" the corrupted database and starts a fresh checkpoint sync on its own.
Last line of backtrace:
I've resolved conflicts with …
Merging this down so we can simplify the conga line.
This has been running well on our infra.
Just going to do one more rollout and ensure logs are OK.
Yep this is looking solid. Merging! 🎉
Having merged the drop-headtracker PR we now have a DB schema change in `unstable` compared to `release-v7.0.0`:

- #6744

There is a DB downgrade available, however this needs to be applied manually and it's usually a bit of a hassle. This PR bumps the version on `unstable` to `v7.1.0-beta.0` _without_ actually cutting a `v7.1.0-beta.0` release, so that we can tell at a glance which schema version a node is using.
This PR implements #5978 (tree-states) but on the hot DB. It allows Lighthouse to massively reduce its disk footprint during non-finality, and overall I/O in all cases. Closes #6580. Conga into #6744.

### TODOs

- [x] Fix OOM in CI #7176
- [x] Optimise store_hot_state to avoid storing a duplicate state if the summary already exists (should be safe from races now that pruning is cleaner)
- [x] Misspelled: get_ancenstor_state_root
- [x] get_ancestor_state_root should use state summaries (see the sketch after this list)
- [x] Prevent split from changing during ancestor calc
- [x] Use same hierarchy for hot and cold

### TODO Good optimizations for future PRs

- [ ] On the migration, if the latest hot snapshot is aligned with the cold snapshot, migrate the diffs instead of the full states.

```
align slot    time
10485760      Nov-26-2024
12582912      Sep-14-2025
14680064      Jul-02-2026
```

### TODO Maybe good to have

- [ ] Rename anchor_slot https://github.com/sigp/lighthouse/compare/tree-states-hot-rebase-oom...dapplion:lighthouse:tree-states-hot-anchor-slot-rename?expand=1
- [ ] Make anchor fields non-public so they must be mutated through a method, to prevent unwanted changes of the anchor_slot

### NOT TODO

- [ ] Use fork choice and a new method [`descendants_of_checkpoint`](ca2388e#diff-046fbdb517ca16b80e4464c2c824cf001a74a0a94ac0065e635768ac391062a8) to filter only the state summaries that descend from the finalized checkpoint
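A minimal sketch of how an ancestor state-root lookup via state summaries might work, per the TODO above: walk the summary chain backwards without loading any full states. The types and field names (`HotStateSummary`, `previous_state_root`) are illustrative assumptions, not the PR's actual code.

```rust
use std::collections::HashMap;

type Hash256 = [u8; 32];
type Slot = u64;

/// Illustrative summary record: each hot state summary points at the state
/// root of its parent slot, forming a chain that can be walked backwards.
struct HotStateSummary {
    slot: Slot,
    previous_state_root: Hash256,
}

/// Walk the summary chain from `state_root` back to the summary at
/// `target_slot`, touching only summaries, never full states.
fn get_ancestor_state_root(
    summaries: &HashMap<Hash256, HotStateSummary>,
    mut state_root: Hash256,
    target_slot: Slot,
) -> Option<Hash256> {
    loop {
        let summary = summaries.get(&state_root)?;
        if summary.slot == target_slot {
            return Some(state_root);
        }
        if summary.slot < target_slot {
            // Walked past the target slot: no ancestor exists there.
            return None;
        }
        state_root = summary.previous_state_root;
    }
}
```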
## Issue Addressed
The head tracker is a persisted piece of state that must be kept in sync with fork choice. It has been a source of pruning issues in the past, so we want to remove it.
When implementing tree-states in the hot DB we have to change the pruning routine (more details below), so we want to make those changes first, in isolation.
Closes #1785
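For context, the head tracker is essentially a map from potential head block roots (blocks with no known children) to their slots, which must mirror fork choice's view. A rough model, with simplified types standing in for Lighthouse's actual ones:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

type Hash256 = [u8; 32];
type Slot = u64;

/// Simplified model of the head tracker: every block with no known children
/// is a potential head. This map is persisted to disk and must stay
/// consistent with fork choice, which is exactly what makes it fragile.
struct HeadTracker(RwLock<HashMap<Hash256, Slot>>);

impl HeadTracker {
    /// Register a newly imported block: it becomes a potential head and its
    /// parent stops being one.
    fn register_block(&self, root: Hash256, parent_root: Hash256, slot: Slot) {
        let mut map = self.0.write().unwrap();
        map.remove(&parent_root);
        map.insert(root, slot);
    }
}
```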
## Proposed Changes
### Current DB migration routine
In summary, it computes what it has to delete and keeps the rest, then migrates data to the freezer. If the abandoned-forks routine has a bug, it can break the freezer migration.
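A rough sketch of that order of operations, under stated assumptions: all names here are placeholders, not the actual Lighthouse store API.

```rust
// Illustrative stand-ins; the real types live in Lighthouse's store crate.
type BlockRoot = [u8; 32];

struct Checkpoint {
    root: BlockRoot,
    epoch: u64,
}

struct HotDb; // placeholder for the hot database handle

impl HotDb {
    fn delete_fork(&mut self, _head: BlockRoot) { /* delete its blocks and states */ }
    fn migrate_to_freezer(&mut self, _finalized: &Checkpoint) { /* move finalized data */ }
}

/// Current routine: prune abandoned forks first (driven by the head
/// tracker), then migrate finalized data to the freezer. If step 1 has a
/// bug, step 2 inherits a corrupted hot DB.
fn run_migration(
    db: &mut HotDb,
    head_tracker_heads: &[BlockRoot],
    is_descendant_of_finalized: impl Fn(BlockRoot) -> bool,
    finalized: &Checkpoint,
) {
    // 1. Any tracked head that does not descend from the finalized
    //    checkpoint is an abandoned fork: delete its chain.
    for &head in head_tracker_heads {
        if !is_descendant_of_finalized(head) {
            db.delete_fork(head);
        }
    }
    // 2. Only then migrate finalized data into the freezer.
    db.migrate_to_freezer(finalized);
}
```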
### Proposed migration routine (this PR)
This strategy is more sound, as it just checks what's in the hot DB, computes what it has to keep, and deletes the rest. Because it does not rely on extra pieces of data, we can drop the head tracker and the pruning checkpoint. Since the DB migration now happens first, as long as the computation of the set to keep is correct, we won't have pruning issues.
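A minimal sketch of the keep-set computation, assuming each hot state summary records its parent state root (forming the summaries DAG the PR title refers to); the types and names are illustrative, not the PR's actual code.

```rust
use std::collections::{HashMap, HashSet};

type Hash256 = [u8; 32];

/// Illustrative summary record forming the DAG: each hot state summary
/// knows its parent state root.
struct StateSummary {
    previous_state_root: Hash256,
}

/// Proposed routine, sketched: scan every summary actually present in the
/// hot DB, keep exactly those that descend from the finalized state, and
/// delete the rest. No head tracker or pruning checkpoint is consulted.
fn compute_deletions(
    summaries: &HashMap<Hash256, StateSummary>,
    finalized_state_root: Hash256,
) -> Vec<Hash256> {
    let mut keep = HashSet::from([finalized_state_root]);
    for (&root, _) in summaries {
        // Walk this summary's ancestry; if it reaches a kept root, keep the
        // whole path, otherwise the path is unreachable from finalization.
        let mut path = vec![];
        let mut current = root;
        while !keep.contains(&current) {
            path.push(current);
            match summaries.get(&current) {
                Some(s) => current = s.previous_state_root,
                None => {
                    // Fell off the DAG without reaching finalization: prune.
                    path.clear();
                    break;
                }
            }
        }
        keep.extend(path);
    }
    // Everything present in the hot DB but not in the keep set is deleted.
    summaries.keys().copied().filter(|r| !keep.contains(r)).collect()
}
```

The key property is that the input is only what is physically present in the hot DB, so a correct keep-set computation is sufficient for correct pruning.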