Bonsai archive feature #7475
Conversation
This PR is stale because it has been open for 30 days with no activity.
This PR was closed because it has been inactive for 14 days since being marked as stale.
Just added some small comments.
I would prefer to have only BONSAI with an archive flag, but maybe we can do that in another PR.
For me it's OK; I will wait for feedback from @garyschulte.
```java
    .getComposedWorldStateStorage()
    .startTransaction();
genesisStateTX.put(
    TRIE_BRANCH_STORAGE, WORLD_BLOCK_NUMBER_KEY, Bytes.ofUnsignedLong(0).toArrayUnsafe());
```
Is there a risk that this value is initialized to zero for a node that was already fully synced before this PR and then restarted with the new code?
Maybe instead of checking if this specific field is empty, we should check if the node is already synced? That way, a previously synced node would just update the value during the next block import, avoiding incorrect initialization.
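A hypothetical sketch of that suggestion (`isFreshDatabase()` is an illustrative guard, not an existing API):

```java
// Only seed the genesis archive context on a brand-new database; an
// already-synced node would keep its stored value and refresh it on the
// next block import. isFreshDatabase() is a hypothetical "no world state
// persisted yet" check.
if (worldStateStorage.isFreshDatabase()) {
  genesisStateTX.put(
      TRIE_BRANCH_STORAGE, WORLD_BLOCK_NUMBER_KEY, Bytes.ofUnsignedLong(0).toArrayUnsafe());
}
```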
I'm not sure the code currently supports starting an already-synced node with a new data storage format. If the node previously started with `--data-storage-format=BONSAI`, then starting it with `X_BONSAI_ARCHIVE` would fail to start. So I believe this is safe for any node that is using the new storage format.
I'm talking about a node started with Bonsai and restarted with Bonsai plus this PR. It seems we are calling `ensureGenesisArchiveContext` in Bonsai mode too, no?
I see what you mean. I don't think we'd be calling `writeAccountsTo()` on genesis state for an already-synced Bonsai node, would we?
```java
.map(MutableWorldState::disableTrie)
.flatMap(
    worldState ->
        rollMutableArchiveStateToBlockHash( // This is a tiny action for archive
```
👌
```java
accountFound =
    storage
        .getNearestBefore(ACCOUNT_INFO_STATE_ARCHIVE, keyNearest)
        .filter(found -> accountHash.commonPrefixLength(found.key()) >= accountHash.size())
```
Maybe we can compute `accountHash.commonPrefixLength(found.key())` only once.
You mean generate `accountHash.commonPrefixLength(found.key())` before calling `.getNearestBefore`, so the code becomes something like:

```java
int comPrefLength = accountHash.commonPrefixLength(found.key());
...
.filter(found -> comPrefLength >= accountHash.size())
```
Just realised that's not possible, since `found` doesn't exist until we're streaming the results. So I'm not quite sure what change you're suggesting?
```java
.filter(
    found ->
        !Arrays.areEqual(
            DELETED_ACCOUNT_VALUE, found.value().orElse(DELETED_ACCOUNT_VALUE)))
```
Is it important to keep the deleted value if we have `ACCOUNT_INFO_STATE_ARCHIVE`?
> Is it important to keep the deleted value if we have `ACCOUNT_INFO_STATE_ARCHIVE`?

No, I think that's a good point. I've updated the logic in a new commit so the deleted-key filter is applied regardless of which DB table the entry is retrieved from.
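A minimal sketch of that updated flow, reusing the names from the diffs above (the fragment assumes `nearestAccount`, `storage`, `keyNearest`, and `accountHash` are in scope, and `Arrays` is `org.bouncycastle.util.Arrays`):

```java
// Look in the primary segment first, fall back to the archive segment, and
// only then drop tombstones, so a DELETED_ACCOUNT_VALUE entry is filtered
// out regardless of which DB table it came from.
final var accountFound =
    nearestAccount
        .or(
            () ->
                storage
                    .getNearestBefore(ACCOUNT_INFO_STATE_ARCHIVE, keyNearest)
                    .filter(
                        found ->
                            accountHash.commonPrefixLength(found.key()) >= accountHash.size()))
        .filter(
            found ->
                !Arrays.areEqual(
                    DELETED_ACCOUNT_VALUE, found.value().orElse(DELETED_ACCOUNT_VALUE)));
```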
Yeah, I think there was some discussion about that early on in the archive work, and IIRC Gary favoured a specific data storage format rather than a config option on BONSAI.
Just waiting for Gary's approval as well before merging.
@garyschulte what do you think?
Looks really great. I left comments and questions regarding feature/function choices, but all of the changes seem to be compartmentalized to the new experimental archive storage mode, so we can address here or in a future PR IMO.
For the record, I think we need a separate storage format for archive, as you have it in this PR. The archive segments, metadata, and behavior are significantly different from BONSAI. If we don't differentiate between them as separate formats, the potential to corrupt a very-expensive-to-sync archive is just one accidental config change away.
I have my misgivings about communicating the block context via a worldstate storage key (rather than treating it as an ephemeral parameter), BUT it seems the implications of that behavior are limited to X_BONSAI_ARCHIVE. We can discover how this works in operational testing.
FWIW, I have a couple test nodes using this PR - a full sync bonsai node that is right around 12 mil, and an archive node started at the same time that is around 6 mil. There are probably opportunities for performance improvement in future PRs.
Thanks for keeping this PR updated and waiting for prague 🙏
```diff
@@ -118,6 +118,7 @@ public void run() {
       switch (dataStorageFormat) {
         case FOREST -> 1;
         case BONSAI -> 2;
+        case X_BONSAI_ARCHIVE -> 3;
```
This is probably OK, but we should never have a bonsai archive node with the prior metadata format, since the metadata change predates the bonsai archive feature. It might make sense to just throw here.
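A sketch of that suggestion, assuming this switch only runs when reading the legacy (pre-archive) metadata format (`formatVersion` is an illustrative name):

```java
final int formatVersion =
    switch (dataStorageFormat) {
      case FOREST -> 1;
      case BONSAI -> 2;
      // Fail fast: the metadata change predates the bonsai archive feature,
      // so an archive node reaching this path indicates a corrupt or
      // hand-edited data directory.
      case X_BONSAI_ARCHIVE ->
          throw new IllegalStateException(
              "Bonsai archive nodes cannot exist with the prior metadata format");
    };
```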
```java
} else if (dataStorageConfiguration.getDataStorageFormat()
    == DataStorageFormat.X_BONSAI_ARCHIVE) {
  LOG.info("setting FlatDbStrategy to ARCHIVE");
  transaction.put(
      TRIE_BRANCH_STORAGE, FLAT_DB_MODE, FlatDbMode.ARCHIVE.getVersion().toArrayUnsafe());
```
Do we actually want to do this? If we are transitioning from partial to archive, we implicitly do not have a real archive node. Perhaps there could be some value in running an archive node "since" a block height, but I would imagine an archive sync strategy would be a better fit than a snap sync strategy that transitions to archive when complete 🤔
Yeah, the archive + proofs PR #8669 does some refactoring in this area, because I also wasn't totally happy with the way partial/full/archive were handled here. So for now I'm going to elect to leave it unchanged in this PR, because I think the follow-on PR much improves things.
```java
// If there isn't a match look in the archive DB segment
if (nearestAccount.isEmpty()) {
  accountFound =
      storage
          .getNearestBefore(ACCOUNT_INFO_STATE_ARCHIVE, keyNearest)
          .filter(found -> accountHash.commonPrefixLength(found.key()) >= accountHash.size());
```
If we are looking in the context of MAX_BLOCK_SUFFIX, we shouldn't have to defer to the archive, right? Current/unchanged account values should not be moved to the archive unless something has gone wrong with archiving. If we can skip this step we should see substantially faster "zero read" cases for non-historical queries.
Yes, agreed. I think `calculateArchiveKeyWithMaxSuffix()` needs renaming, as it may use `MAX_BLOCK_SUFFIX` but only if an archive context block isn't set. So I think what's needed is some additional logic to check whether we ended up defaulting to `MAX_BLOCK_SUFFIX`, and then skip the archive lookup (plus the possible function rename); a sketch follows. If it's OK, I'd like to start getting other people looking at the code and tracking archive bugs etc. as standalone issues, so for this and the other comment I'm going to raise an issue rather than fix it in this PR.
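A hypothetical sketch of that check (`usedMaxBlockSuffix` is an illustrative flag, not existing code):

```java
// Skip the archive-segment fallback when the key defaulted to
// MAX_BLOCK_SUFFIX, i.e. we are reading head state rather than a historic
// block; current values should never have been moved to the archive.
if (nearestAccount.isEmpty() && !usedMaxBlockSuffix) {
  accountFound =
      storage
          .getNearestBefore(ACCOUNT_INFO_STATE_ARCHIVE, keyNearest)
          .filter(found -> accountHash.commonPrefixLength(found.key()) >= accountHash.size());
}
```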
Yup, definitely reasonable to defer to a subsequent issue. I think not only can we skip archiving for MAX_BLOCK_SUFFIX, but also for block heights that are above the high water mark for being archived. If we expose the current archive block number via bonsai context, we can make that decision very cheaply, and get a significant boost for zero reads 👍
I've raised #8850 to track this fix.
```java
// If there isn't a match look in the archive DB segment
if (nearestStorage.isEmpty()) {
```
Same as the comment above regarding zero-read performance.
```java
// TODO - rename calculateRootHash() to be clearer that it updates state, it doesn't just
// calculate a hash
```
This is only half true, since an `Optional.empty()` state updater would mean no update. But yeah, it is kinda side-effect-y.
PR description
Introduces a new (experimental) "Bonsai Archive" DB mode which creates a full archive of the chain it syncs with. This allows JSON-RPC calls to be made with historic blocks as context, for example `eth_getBalance` to get the balance of an account at a historic block, or `eth_call` to simulate a transaction at a given block in history.

The PR is intended to provide part of the function currently offered by the (now deprecated) `FOREST` DB mode. Specifically, it allows state to be queried at an arbitrary block in history, but does not currently offer `eth_getProof` for said state. A subsequent PR will implement `eth_getProof` for historic blocks.

Summary of the overall design & changes
This PR builds on PR #5865 which proved the basic concept of archiving state in the Bonsai flat DB by suffixing entries with the block in which they were changed.
For example, the state for account `0x0e79065B5F11b5BD1e62B935A600976ffF3754B9` at block `37834` is stored under a key that appends the block number to the account hash, along the lines of the sketch below.
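A sketch of that key layout, assuming an 8-byte unsigned big-endian block-number suffix (Tuweni `Bytes`; the exact widths are internal to the PR):

```java
import org.apache.tuweni.bytes.Bytes;
import org.hyperledger.besu.datatypes.Address;
import org.hyperledger.besu.datatypes.Hash;

// Flat-DB entries are keyed by the account hash; the archive variant appends
// the block number at which the value was written.
Bytes accountHash =
    Hash.hash(Address.fromHexString("0x0e79065B5F11b5BD1e62B935A600976ffF3754B9"));
Bytes archiveKey = Bytes.concatenate(accountHash, Bytes.ofUnsignedLong(37834));
```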
In order to minimise performance degradation over time, historic state and storage entries in the DB are "archived" by moving them into a separate DB segment.
Where account state is stored in segment `ACCOUNT_INFO_STATE`, state that has been archived is stored in `ACCOUNT_INFO_STATE_ARCHIVE`. Likewise, where storage is held in segment `ACCOUNT_STORAGE_STORAGE`, archived storage entries are stored in `ACCOUNT_STORAGE_ARCHIVE`.

An example RocksDB query to retrieve the state of the example account above would be:
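A minimal Java sketch of the equivalent lookup, using the `getNearestBefore` API shown in the diffs above (`storage` is the segmented key-value storage; the raw RocksDB form would be a reverse seek on the same key):

```java
// Seek to the newest entry whose key is <= accountHash ++ block 37834, then
// accept it only if it belongs to the same account (full-prefix match).
final Bytes keyNearest = Bytes.concatenate(accountHash, Bytes.ofUnsignedLong(37834));
final var accountFound =
    storage
        .getNearestBefore(ACCOUNT_INFO_STATE_ARCHIVE, keyNearest)
        .filter(found -> accountHash.commonPrefixLength(found.key()) >= accountHash.size());
```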
Creating a Bonsai Archive node
The PR introduces an entirely new data storage format (as opposed to making it a configuration option of the existing `BONSAI` storage format). To create a Bonsai archive node, simply set `--data-storage-format=x_bonsai_archive` when creating it.

An existing `FOREST` or `BONSAI` node cannot be migrated to `BONSAI_ARCHIVE` mode.

Storage requirements
An archive node intrinsically requires more storage space than a non-archive node. Every state update is retained in the archive DB segments as outlined above. An archive node for the `holesky` testnet, as of the raising of this PR, requires approximately `160Gi` of storage.

Sync time
In order to create an archive of an entire chain, `FULL` sync mode must be used. This PR does not prevent `SNAP` syncing an archive node, but that will result in only a partial archive of the chain.

While the node is performing a `FULL` sync with the chain, it is also migrating entries from the regular DB segments to the archive DB segments. Overall this increases the time to create the archive node. For a public chain this might require 1 week or more to complete syncing and archiving.