-
Notifications
You must be signed in to change notification settings - Fork 37.7k
rpc: Optimize serialization and enhance metadata of dumptxoutset output #29612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
714b3dd
to
8d67b35
Compare
8d67b35
to
b0cfbce
Compare
Well, I'm not on holiday :-) tACK b0cfbce I do suggest adding a comment to
I generated a snapshot and loaded it using the patch in #29551 (only waited for it to reach the tip, not a full background sync). For those testing who don't want to wait out the long flush, see #28358. |
33aa19d
to
97eb214
Compare
Thanks for the review and testing!
As it is worded, wouldn't this be more appropriate to add in I have now amended the comment a bit and added a version of it to both |
97eb214
to
923ec90
Compare
re-ACK 923ec90 That looks good. The reason I suggested (also) writing something in the header is because it's more likely to be noticed by some future dev looking through existing leveldb uses, seeing if they can replace it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, ACK 923ec90
Just some comment nits that could easily be done in a follow-up too.
Coin coin; | ||
unsigned int iter{0}; | ||
std::vector<std::pair<uint32_t, Coin>> coins; | ||
|
||
// To reduce space the serialization format of the snapshot avoids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Might it be useful to add a small diagram of the format to the dumptxoutset
help output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will think about what that could look like if I have to retouch or in the follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time thinking about a diagram that would work but I wasn't very happy what I could come up with. I instead copied the expression used in the issue. It shows the format in a concise way that I find understandable: list(Txid, list((vout,Coin))
. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also add the size of the coins list list(Txid, number_of_coins, list((vout,Coin)))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add this if I have to retouch, but if not I can do this as a follow-up in one of my other assumeutxo PRs.
843b95e
to
71d41d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-ACK 71d41d5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, just some small comments, will review again quickly once they are addressed.
Co-authored-by: Aurèle Oulès <aurele@oules.com> Co-authored-by: TheCharlatan <seb.kung@gmail.com>
The following data is added: - A newly introduced utxo set magic - A version number - The network magic - The block height
b2cf3f7
to
542e13b
Compare
Addressed feedback from @TheCharlatan and @theStack , thank you! Also changed the error code for the parsing of the metadata which was feedback in another PR: #28670 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-ACK 542e13b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 542e13b
@@ -2695,24 +2696,48 @@ UniValue CreateUTXOSnapshot( | |||
afile << metadata; | |||
|
|||
COutPoint key; | |||
Txid last_hash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would still prefer last_txid
over last_hash
as it's more expressive (OTOH, in COutPoint
, it's also called hash
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I must have overlooked that if you already mentioned it before when I changed the type. I will address it if I have to retouch.
ACK 542e13b |
inline void Serialize(Stream& s) const { | ||
s << SNAPSHOT_MAGIC_BYTES; | ||
s << m_version; | ||
s << Params().MessageStart(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It would be nice, if new code didn't use globals implicitly. What about storing a reference to the prams (or the magic) in a member field of this struct and using it here? An alternative would be to use params-serialization and embed the magic bytes during serialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed this in #30167 using the member field. Using params-serialization wouldn't have made much of a difference given how we use this code right now, I think.
@@ -5735,7 +5747,8 @@ bool ChainstateManager::PopulateAndValidateSnapshot( | |||
|
|||
bool out_of_coins{false}; | |||
try { | |||
coins_file >> outpoint; | |||
Txid txid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this should check that the stream is completely empty, no? I suspect that a trailing byte will be left undetected, so it may be better to deserialize a std::byte
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in #30167
coins_per_txid = ReadCompactSize(coins_file); | ||
|
||
if (coins_per_txid > coins_left) { | ||
LogPrintf("[snapshot] mismatch in coins count in snapshot metadata and actual snapshot data\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is new code, but uncovered. It would be nice to have a test for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test for this in #30167
Added release notes and dddressed post-merge comments in #30167 |
efc1b5b test: Add coverage for txid coins count check when loading snapshot (Fabian Jahr) 6b60848 assumeutxo: Add network magic ctor param to SnapshotMetadata (Fabian Jahr) 1f1f998 assumeutxo: Deserialize trailing byte instead of Txid (Fabian Jahr) 359967e doc: Add release notes for #29612 (Fabian Jahr) Pull request description: This adds release notes for #29612 and addresses post-merge review comments. ACKs for top commit: maflcko: utACK efc1b5b theStack: utACK efc1b5b Tree-SHA512: 3b270202e4f7b2576090ef1d970fd54a6840d96fc3621dddd28e888fb8696a97ff69af2e000bcee3b364316ca3f6e2a9b2f1694c6184f0e704dc487823127ce4
Summary: Co-authored-by: Aurèle Oulès <aurele@oules.com> Co-authored-by: TheCharlatan <seb.kung@gmail.com> This is a partial backport of [[bitcoin/bitcoin#29612 | core#29612]] bitcoin/bitcoin@de95953 bitcoin/bitcoin@c14ed7f bitcoin/bitcoin@4d8e5ed Test Plan: `ninja all check-all` add a assume utxo checkpoint, dump the corresponding utxo set, start a fresh node, load the utxo set and let it fully validate Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D17985
Summary: The following data is added: - A newly introduced utxo set magic - A version number - The network magic - The block height This concludes backport of [[bitcoin/bitcoin#29612 | core#29612]] bitcoin/bitcoin@542e13b Depends on D17989 Test Plan: `ninja all check-all` Dump an assumeutxo snapshot on a fully synced node, load it on a fresh new node, let it fully validate the chain Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Subscribers: Fabien Differential Revision: https://reviews.bitcoinabc.org/D17994
The second attempt at implementing the
dumptxoutset
space optimization as suggested in #25675. Closes #25675.This builds on the work done in #26045, addresses open feedback, adds some further improvements (most importantly usage of compact size), documentation, and an additional test.
The original snapshot at height 830,000 came in at 10.82 GB. With this change, the same snapshot is 8.94 GB, a reduction of 17.4%.
This also enhances the metadata of the output file and adds the following data to allow for better error handling and make future upgrades easier: