Skip to content

kernel: Separate UTXO set access from validation functions #32317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

TheCharlatan
Copy link
Contributor

The current code makes it hard to call functions for checking the consensus-correctness of a block without having a full UTXO set at hand. This makes implementing features such as Utreexo and non-assumevalid swiftsync, where maintaining a full UTXO set defeats their purpose, through the kernel API difficult. Checks like the coinbase subsidy, or block sigops are only available through ConnectBlock, which requires a complete coins cache.

Solve this by moving accessing coins and the responsibility of spending them out of ConnectBlock and into a new separate function called SpendBlock. Pass a block's spent coins through the existing CBlockUndo data structure to ConnectBlock.

While the change is largely a refactor, some smaller behaviour changes were unavoidable. These should be highlighted in the commit messages. The last commit contains the most important part of the coins logic move, the commits before attempt to prepare it piecemeal.

This pull request was benchmarked on benchcoin, which showed no performance regression. While discussing this pull request it was noted that pre-fetching and eliminating the need for some additional map lookups might improve performance a bit, but this does not make a measurable difference in the grand scheme of things and was not the motivation for this pull request.

In future changes, ConnectBlock could be made even more self contained, accessing the coins from the internal map could be further optimized by consolidating existence checking and spending, and the function eventually exposed in an external kernel API.

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 21, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32317.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK w0xlt, stickies-v, stringintech, enirox001

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #32998 (Bump SCRIPT_VERIFY flags to 64 bit by ajtowns)
  • #32840 (test: Enhance GetTxSigOpCost tests for coinbase transactions by average-gary)
  • #32729 (test,refactor: extract script template helpers & widen sigop count coverage by l0rinc)
  • #32575 (refactor: Split multithreaded case out of CheckInputScripts by fjahr)
  • #32468 (rpc: generateblock to allow multiple outputs by polespinasa)
  • #31974 (Drop testnet3 by Sjors)
  • #29641 (scripted-diff: Use LogInfo over LogPrintf [WIP, NOMERGE, DRAFT] by maflcko)
  • #29060 (Policy: Report reason inputs are non standard by ismaelsadeeq)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@TheCharlatan TheCharlatan changed the title kernel: Seperate UTXO set access from validation functions kernel: Separate UTXO set access from validation functions Apr 21, 2025
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Debug: https://github.com/bitcoin/bitcoin/runs/40876609750

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Copy link
Contributor

@w0xlt w0xlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, as this PR increases the decoupling of components.

nit: The name SpendBlock may not be clear (unless this concept is already known and I'm missing the point).
Maybe BIP30Validation, something like that, would be clearer.

@TheCharlatan
Copy link
Contributor Author

Re #32317 (review)

nit: The name SpendBlock may not be clear (unless this concept is already known and I'm missing the point).
Maybe BIP30Validation, something like that, would be clearer.

It is doing more than BIP-30 validation, so I don't think that would accurate either. I thought the name was pretty clear for what it does, but I'd be open to other suggestions.

Comment on lines 2174 to 2155
bool CheckInputScripts(const CTransaction& tx, TxValidationState& state,
const CCoinsViewCache& inputs, unsigned int flags, bool cacheSigStore,
std::vector<CTxOut>&& spent_outputs, unsigned int flags, bool cacheSigStore,
bool cacheFullScriptStore, PrecomputedTransactionData& txdata,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we keep passing the spent_outputs instead of removing it (to simplify the API) and requiring callers to initialize txdata before calling CheckInputScripts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My read is that the call to txdata.Init is delayed to after checking the script execution cache for performance reasons. Calling Init does some hashing, that would not be required if it is already in the cache. Moving the spent_outputs vector construction is already a potential performance downside for potential cache hits, though I think that is more acceptable in the grand scheme of things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it makes sense. Thanks for the insights.

@TheCharlatan
Copy link
Contributor Author

Updated 76a8f22 -> 16a695f (spendblock_0 -> spendblock_1, compare)

Copy link
Contributor

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Reducing the coupling between validation functions and the UTXO set makes the validation functions more usable, and more testable. The approach taken here seems sensible, but I'm going to think about alternatives a bit more.

Comment on lines +177 to +182
for (const auto& txin : tx.vin) {
const COutPoint& prevout = txin.prevout;
const Coin& coin = AccessCoin(prevout);
assert(!coin.IsSpent());
spent_outputs.emplace_back(coin.out);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use AccessCoins here?

Suggested change
for (const auto& txin : tx.vin) {
const COutPoint& prevout = txin.prevout;
const Coin& coin = AccessCoin(prevout);
assert(!coin.IsSpent());
spent_outputs.emplace_back(coin.out);
}
for (const Coin& coin : AccessCoins(tx)) {
assert(!coin.IsSpent());
spent_outputs.emplace_back(coin.out);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inconsistent naming between GetUnspentOutputs and spent_outputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also use AccessCoin, but I think accessing it directly is a bit more efficient, because we don't allocate another vector. Do you have a better suggestion for the naming there? It is all a bit confusing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered the extra vector allocation, yeah I'm okay leaving this as-is then.

I did nerd-snipe myself into looking into having AccessCoins return a view would be quite elegant in some ways, allowing lazy evaluation, but seems like it would require quite a bit of overhaul and might have other drawbacks I've not considered yet.

    auto AccessCoins(const CTransaction& tx) const
    {
        return tx.vin | std::views::transform([this](const CTxIn& input) -> const Coin& {
            return this->AccessCoin(input.prevout);
        });
    }

Comment on lines +127 to +128
std::vector<CTxOut> spent_outputs{active_coins_tip.GetUnspentOutputs(tx)};
txdata.Init(tx, std::move(spent_outputs));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a way, it's a bit silly to add a parameter to the CheckInputScripts signature to then just move it into another parameter (txdata). An alternative approach could be to let the caller handle this, and assert txdata.m_spent_outputs_ready, but that could raise runtime assertion errors instead of catching them at compile time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @TheCharlatan noted here, delaying txdata.Init could be on purpose for the potential cache hit here. I wonder if there is a cleaner version for achieving the same goal though.

Copy link
Contributor

@stringintech stringintech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK.
I had a few more comments (sorry for the scattered feedback; didn't have the chance to review everything earlier).

@stickies-v
Copy link
Contributor

PSA: Bitcoin Core PR Review Club will cover this PR in its next meeting at 2025-05-14 at 17:00 UTC. See https://bitcoincore.reviews/32317 for notes, questions, and instructions on how to join.

@TheCharlatan
Copy link
Contributor Author

Thank you for the reviews @stringintech and @stickies-v!

16a695f -> e8ac6d2 (spendblock_1 -> spendblock_2, compare)

  • Addressed @stickies-v's comments 1, 2, 3, 4, 5 improving docstrings for ConnectBlock and SpendBlock.
  • Addressedd @stickies-v's comment, removed block_hash argument from ConnectBlock and SpendBlock methods.
  • Addressed @stickies-v's comment, added CoinRef concept.
  • Addressed @stickies-v's comment, removed unneeded helper, use direct initialization to solve msvc compilation error.
  • Addressed @stickies-v's comment, add assert for blockundo.vtxundo.size() to rule out UB.
    *Addressed @stringintech's comment, pass in empty span to GetTransactionSigOpCost.
  • Addressed @stringintech's comment, update documentation for the removal of the inputs availability check in CheckTxInputs.
  • Addressed @stringintech's comment, removed line in commit message mentioning execution of the coinbase amount check even when another check has failed previously.

@TheCharlatan
Copy link
Contributor Author

Rebased 6e90a0e -> 1f96d1c (spendblock_7 -> spendblock_8, compare)

@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Task previous releases, depends DEBUG: https://github.com/bitcoin/bitcoin/runs/45643912981
LLM reason (✨ experimental): The CI failure is caused by compilation errors in transaction_tests.cpp due to mismatched function calls to GetP2SHSigOpCount, indicating a code or API mismatch.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@TheCharlatan
Copy link
Contributor Author

@TheCharlatan
Copy link
Contributor Author

Move the responsibility of retrieving coins from GetP2SHSigOpCount to
its caller.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.

Define two explicit template specializations for both a span of
references to coins and a span of coins. This allows using it for both
Coin entries referenced from the CCoinsViewCache, and from contiguous
memory, like the vector in CBlockUndo.
Move the responsibility of retrieving coins from GetTransactionSigOpCost
to its caller.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.

Define two explicit template specializations for both a span of
references to coins and a span of coins. This allows using it for both
Coin entries referenced from the CCoinsViewCache, and from contiguous
memory, like the vector in CBlockUndo.
Move the responsibility of retrieving coins from CheckTxInputs
to its caller. The missing inputs check will be moved in an upcoming
commit.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.

Define two explicit template specializations for both a span of
references to coins and a span of coins. This allows using it for both
Coin entries referenced from the CCoinsViewCache, and from contiguous
memory, like the vector in CBlockUndo.
…InputScripts

Move the responsibility of retrieving coins from the CCoinsViewCache
in CheckInputScripts to its caller.

Add a helper method in CCoinsViewCache to collect all Coin's outputs
spent by a transaction's inputs.

Callers of CCoinsViewCache are updated to either pre-fetch the spent
outputs, or pass in an empty vector if the precomputed transaction data
has already been initialized with the required outputs.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.
Move the BIP30 checks from ConnectBlock to a new SpendBlock method.
This is a move-only change, more content to SpendBlock is added in the
next commits. The goal is to move logic requiring access to the
CCoinsViewCache out of ConnectBlock and to the new SpendBlock method.
SpendBlock will in future handle all UTXO set interactions that
previously took place in ConnectBlock.

Callers of ConnectBlock now also need to call SpendBlock before. This is
enforced in a future commit by adding a CBlockUndo argument that needs
to be passed to both.

This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.
This is a part of a series of commits for removing access to the
CCoinsViewCache in consensus verification functions. The goal is to
allow calling verification functions with pre-fetched, or a user-defined
set of coins.
Move the remaining UTXO-related operations from ConnectBlock to
SpendBlock. This includes moving the existence check, the UpdateCoins
call, and CBlockUndo generation.

ConnectBlock now takes a pre-populated CBlockUndo as an argument and no
longer accesses or manipulates the UTXO set.
@TheCharlatan
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Ready for Review PRs
Development

Successfully merging this pull request may close these issues.

8 participants