Skip to content

Conversation

fjahr
Copy link
Contributor

@fjahr fjahr commented Mar 10, 2024

The second attempt at implementing the dumptxoutset space optimization as suggested in #25675. Closes #25675.

This builds on the work done in #26045, addresses open feedback, adds some further improvements (most importantly usage of compact size), documentation, and an additional test.

The original snapshot at height 830,000 came in at 10.82 GB. With this change, the same snapshot is 8.94 GB, a reduction of 17.4%.

This also enhances the metadata of the output file and adds the following data to allow for better error handling and make future upgrades easier:

  • A newly introduced utxo set magic
  • A version number
  • The network magic
  • The block height

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 10, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK TheCharlatan, theStack, achow101
Stale ACK mzumsande, Sjors

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29775 (Testnet4 including PoW difficulty adjustment fix by fjahr)
  • #29307 (util: check for errors after close and read in AutoFile by vasild)
  • #28670 (assumeutxo, rpc: Improve EOF error when reading snapshot metadata in loadtxoutset by pablomartin4btc)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/22485931752

@Sjors
Copy link
Member

Sjors commented Mar 11, 2024

我最近正在休假中

Well, I'm not on holiday :-)

tACK b0cfbce

I do suggest adding a comment to PopulateAndValidateSnapshot in validation.h:

To reduce space, this function takes advantage of the guarantee by leveldb that keys are lexicographically sorted.

I generated a snapshot and loaded it using the patch in #29551 (only waited for it to reach the tip, not a full background sync).

For those testing who don't want to wait out the long flush, see #28358.

@fjahr fjahr force-pushed the 2024-03-pr26045-reopen branch 2 times, most recently from 33aa19d to 97eb214 Compare March 11, 2024 20:59
@fjahr
Copy link
Contributor Author

fjahr commented Mar 11, 2024

Thanks for the review and testing!

I do suggest adding a comment to PopulateAndValidateSnapshot in validation.h:

To reduce space, this function takes advantage of the guarantee by leveldb that keys are lexicographically sorted.

As it is worded, wouldn't this be more appropriate to add in CreateUTXOSnapshot? That's where we are iterating with the pcursor, leveraging the sorting, and writing the coins to file, so I would say that's also where the space savings are realized.

I have now amended the comment a bit and added a version of it to both CreateUTXOSnapshot and PopulateAndValidateSnapshot. Please let me know if you think this is alright: 923ec90

@fjahr fjahr force-pushed the 2024-03-pr26045-reopen branch from 97eb214 to 923ec90 Compare March 11, 2024 21:02
@Sjors
Copy link
Member

Sjors commented Mar 12, 2024

re-ACK 923ec90

That looks good.

The reason I suggested (also) writing something in the header is because it's more likely to be noticed by some future dev looking through existing leveldb uses, seeing if they can replace it.

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, ACK 923ec90

Just some comment nits that could easily be done in a follow-up too.

Coin coin;
unsigned int iter{0};
std::vector<std::pair<uint32_t, Coin>> coins;

// To reduce space the serialization format of the snapshot avoids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Might it be useful to add a small diagram of the format to the dumptxoutset help output?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, will think about what that could look like if I have to retouch or in the follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent some time thinking about a diagram that would work but I wasn't very happy what I could come up with. I instead copied the expression used in the issue. It shows the format in a concise way that I find understandable: list(Txid, list((vout,Coin)). Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add the size of the coins list list(Txid, number_of_coins, list((vout,Coin)))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add this if I have to retouch, but if not I can do this as a follow-up in one of my other assumeutxo PRs.

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-ACK 71d41d5

@DrahtBot DrahtBot requested a review from Sjors March 20, 2024 15:03
Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, just some small comments, will review again quickly once they are addressed.

fjahr and others added 4 commits May 21, 2024 13:38
Co-authored-by: Aurèle Oulès <aurele@oules.com>
Co-authored-by: TheCharlatan <seb.kung@gmail.com>
The following data is added:
- A newly introduced utxo set magic
- A version number
- The network magic
- The block height
@fjahr fjahr force-pushed the 2024-03-pr26045-reopen branch from b2cf3f7 to 542e13b Compare May 21, 2024 12:15
@fjahr
Copy link
Contributor Author

fjahr commented May 21, 2024

Addressed feedback from @TheCharlatan and @theStack , thank you!

Also changed the error code for the parsing of the metadata which was feedback in another PR: #28670 (comment)

Copy link
Contributor

@TheCharlatan TheCharlatan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-ACK 542e13b

@DrahtBot DrahtBot requested review from theStack and Sjors May 21, 2024 13:15
Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 542e13b

@@ -2695,24 +2696,48 @@ UniValue CreateUTXOSnapshot(
afile << metadata;

COutPoint key;
Txid last_hash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would still prefer last_txid over last_hash as it's more expressive (OTOH, in COutPoint, it's also called hash)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I must have overlooked that if you already mentioned it before when I changed the type. I will address it if I have to retouch.

@achow101
Copy link
Member

ACK 542e13b

@achow101 achow101 merged commit 413844f into bitcoin:master May 23, 2024
inline void Serialize(Stream& s) const {
s << SNAPSHOT_MAGIC_BYTES;
s << m_version;
s << Params().MessageStart();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It would be nice, if new code didn't use globals implicitly. What about storing a reference to the prams (or the magic) in a member field of this struct and using it here? An alternative would be to use params-serialization and embed the magic bytes during serialization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this in #30167 using the member field. Using params-serialization wouldn't have made much of a difference given how we use this code right now, I think.

@@ -5735,7 +5747,8 @@ bool ChainstateManager::PopulateAndValidateSnapshot(

bool out_of_coins{false};
try {
coins_file >> outpoint;
Txid txid;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this should check that the stream is completely empty, no? I suspect that a trailing byte will be left undetected, so it may be better to deserialize a std::byte here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in #30167

coins_per_txid = ReadCompactSize(coins_file);

if (coins_per_txid > coins_left) {
LogPrintf("[snapshot] mismatch in coins count in snapshot metadata and actual snapshot data\n");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is new code, but uncovered. It would be nice to have a test for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test for this in #30167

fjahr added a commit to fjahr/bitcoin that referenced this pull request May 24, 2024
fjahr added a commit to fjahr/bitcoin that referenced this pull request May 24, 2024
@fjahr
Copy link
Contributor Author

fjahr commented May 24, 2024

Added release notes and dddressed post-merge comments in #30167

fanquake added a commit that referenced this pull request Jun 3, 2024
efc1b5b test: Add coverage for txid coins count check when loading snapshot (Fabian Jahr)
6b60848 assumeutxo: Add network magic ctor param to SnapshotMetadata (Fabian Jahr)
1f1f998 assumeutxo: Deserialize trailing byte instead of Txid (Fabian Jahr)
359967e doc: Add release notes for #29612 (Fabian Jahr)

Pull request description:

  This adds release notes for #29612 and addresses post-merge review comments.

ACKs for top commit:
  maflcko:
    utACK efc1b5b
  theStack:
    utACK efc1b5b

Tree-SHA512: 3b270202e4f7b2576090ef1d970fd54a6840d96fc3621dddd28e888fb8696a97ff69af2e000bcee3b364316ca3f6e2a9b2f1694c6184f0e704dc487823127ce4
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Apr 24, 2025
Summary:
Co-authored-by: Aurèle Oulès <aurele@oules.com>
Co-authored-by: TheCharlatan <seb.kung@gmail.com>

This is a partial backport of [[bitcoin/bitcoin#29612 | core#29612]]
bitcoin/bitcoin@de95953
bitcoin/bitcoin@c14ed7f
bitcoin/bitcoin@4d8e5ed

Test Plan:
`ninja all check-all`

add a assume utxo checkpoint, dump the corresponding utxo set, start a fresh node, load the utxo set and let it fully validate

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D17985
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Apr 26, 2025
Summary:
The following data is added:
- A newly introduced utxo set magic
- A version number
- The network magic
- The block height

This concludes backport of [[bitcoin/bitcoin#29612 | core#29612]]
bitcoin/bitcoin@542e13b

Depends on D17989

Test Plan:
`ninja all check-all`

Dump an assumeutxo snapshot on a fully synced node, load it on a fresh new node, let it fully validate the chain

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Fabien

Differential Revision: https://reviews.bitcoinabc.org/D17994
@bitcoin bitcoin locked and limited conversation to collaborators Jun 10, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible savings in dumptxoutset serialization format (~20%)
9 participants