Skip to content

Conversation

l0rinc
Copy link
Contributor

@l0rinc l0rinc commented May 16, 2025

It is an optimization PR for GetSigOpCount, covered heavily with new tests, benchmarks and small refactorings.

The changes were split out to #32729 (refactors only for now).


Context

Test coverage was thin for this code path that suddenly became popular for different reasons (#31624 and #32521 and #32533) highlighting the need for better code coverage.

The PR now splits out the common PUSHDATA length/bounds reads and error checks and covers its corner cases with unit tests and benchmarks.

Testing

  • added unit tests covering every standard script type (and a historical OP_RETURN … OP_CHECKSIG oddity);
  • added fuzz-style test that compares the refactored implementation against the exact legacy algorithm on random scripts - also ran it continuously for 25 hours without divergence;
  • reindexed to height 897,000 comparing the outputs of the old and the new implementation without incidents.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 16, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32532.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept NACK darosior
Concept ACK tapcrafter, ryanofsky

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #32673 (clang-tidy: Apply modernize-deprecated-headers by maflcko)
  • #32554 (bench: replace embedded raw block with configurable block generator by l0rinc)
  • #32521 (policy: make pathological transactions packed with legacy sigops non-standard by darosior)
  • #31989 (BIP-119 (OP_CHECKTEMPLATEVERIFY) (regtest only) by jamesob)
  • #31682 ([IBD] specialize CheckBlock's input & coinbase checks by l0rinc)
  • #29060 (Policy: Report reason inputs are non standard by ismaelsadeeq)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link

@tapcrafter tapcrafter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK 432ff7e.

Can't fully speak to the correctness of the refactor itself, still brushing up on my C++. So can't give a full ACK but left some questions instead.

Test protocol

New benchmark test on 2e8a9e1 (first commit of PR, before refactor):

|           ns/sigops |            sigops/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               50.76 |       19,700,534.44 |    0.5% |      0.01 | `SigOpsBlockBench`

On 432ff7e (last commit):

|           ns/sigops |            sigops/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               18.87 |       53,002,534.31 |    0.4% |      0.01 | `SigOpsBlockBench`

I also ran the unit tests a couple of times to excercise the randomized tests in src/test/sigopcount_tests.cpp.
No failures to report.

@maflcko
Copy link
Member

maflcko commented May 19, 2025

I can't tell from the pull description, but is there a end-user visible performance difference? If yes, how much?

@l0rinc l0rinc force-pushed the l0rinc/short-circuit-known-script-types branch from 432ff7e to bde1fc1 Compare May 19, 2025 09:19
@l0rinc
Copy link
Contributor Author

l0rinc commented May 19, 2025

Thanks for the observations @tapcrafter, addressed your concerns in the latest push, adding you as a co-author in one of the commits.

@maflcko, to clarify the effect on the overall behavior, I slimmed down the DeserializeAndCheckBlockTest similarly to #32457 and added the results to the PR's description.


I ran

cmake -B build -DBUILD_BENCH=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
build/bin/bench_bitcoin -filter='CheckBlockBench' --min-time=10000

several times before and after the SigOp fast-path.

To isolate the remaining hot-spots I also:

Before (ef73758 - bench: measure behavior of a more representative block):

|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|        1,077,514.54 |              928.06 |    0.0% |    8,231,717.91 |    3,867,061.18 |  2.129 |   1,234,943.74 |    2.5% |     11.00 | `CheckBlockBench`
|        1,077,039.60 |              928.47 |    0.1% |    8,231,717.76 |    3,865,853.67 |  2.129 |   1,234,943.80 |    2.5% |     11.01 | `CheckBlockBench`
|        1,077,973.36 |              927.67 |    0.1% |    8,231,717.62 |    3,869,437.50 |  2.127 |   1,234,943.75 |    2.5% |     11.01 | `CheckBlockBench`

Before (9d71910 - specialize CheckBlock's input & coinbase checks):

|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          836,899.54 |            1,194.89 |    0.1% |    6,618,933.87 |    3,003,330.32 |  2.204 |     847,006.84 |    2.4% |     10.77 | `CheckBlockBench`
|          838,135.14 |            1,193.13 |    0.0% |    6,618,933.87 |    3,008,503.94 |  2.200 |     847,006.84 |    2.4% |     10.78 | `CheckBlockBench`
|          835,303.73 |            1,197.17 |    0.1% |    6,618,933.87 |    2,997,539.89 |  2.208 |     847,006.84 |    2.4% |     10.77 | `CheckBlockBench`

After (ddc82fd - script: short-circuit GetLegacySigOpCount for known scripts):

|            ns/block |             block/s |    err% |       ins/block |       cyc/block |    IPC |      bra/block |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|          753,774.48 |            1,326.66 |    0.0% |    5,224,278.78 |    2,705,172.05 |  1.931 |     597,353.76 |    3.4% |     10.70 | `CheckBlockBench`
|          753,337.59 |            1,327.43 |    0.0% |    5,224,278.78 |    2,704,108.71 |  1.932 |     597,353.76 |    3.4% |     10.70 | `CheckBlockBench`
|          749,374.04 |            1,334.45 |    0.0% |    5,224,278.78 |    2,690,235.39 |  1.942 |     597,353.75 |    3.4% |     10.69 | `CheckBlockBench`

~11.4% additional validation speed-up thanks to short-circuit SigOp counting (the two changes together result in a ~43.7% validation speedup for the given block).
Note that instruction count also dropped from 6,6M to 5.2M, branch executions from 0.8M to 0.6M.

@maflcko
Copy link
Member

maflcko commented May 19, 2025

Ah, ok. So from extrapolating the previous comment and #31682 (comment), the IBD speedup should be around 1%?

Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK bde1fc1. Seems like this makes code clearer and faster without changing complexity, and adds tests.

@l0rinc l0rinc changed the title RFC: script: short-circuit GetLegacySigOpCount for known scripts script: short-circuit GetLegacySigOpCount for known scripts May 19, 2025
@l0rinc l0rinc marked this pull request as ready for review May 19, 2025 15:08
Copy link
Member

@darosior darosior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a significant refactoring of tricky consensus critical routines that have essentially never been modified since Satoshi wrote them 15 years ago. The exploration is interesting but it seems ill-advised to completely overhaul this critical code for a marginal IBD speedup.

I feel bad for being stop energy here, but since i have been an advocate for us voicing our opinion instead of just ignoring PRs i'll go and own it. Concept NACK: i don't think the risk-benefit ratio of this change is worth it.

@ryanofsky
Copy link
Contributor

Might also help to add all the benchmark and test changes and simpler improvements like inlining in a separate PR, so it's a more clear how tricky the sensitive changes here are, and if the sensitive changes are really worth making.

@l0rinc l0rinc marked this pull request as draft May 20, 2025 13:15
@l0rinc l0rinc force-pushed the l0rinc/short-circuit-known-script-types branch 2 times, most recently from 560eed6 to 5b11b87 Compare May 21, 2025 13:28
@l0rinc l0rinc changed the title script: short-circuit GetLegacySigOpCount for known scripts refactor: extract DecodePushData from GetScriptOp and GetLegacySigOpCount May 21, 2025
@l0rinc l0rinc requested review from darosior and ryanofsky May 21, 2025 13:36
@l0rinc
Copy link
Contributor Author

l0rinc commented May 21, 2025

@ryanofsky, @darosior, thank you for expressing your concerns.
I have removed the optimization part, I understand why that can be considered risky. The new rebase only contains extraction of the common script parts to avoid some work duplication and to help with understanding this part of the codebase better. It's also covered extensively with deterministic and random tests, comparing against fixed values, non-standard scripts, and completely random ones compared against the original implementation. We can reconsider the optimizations if this gets accepted.

Also note that the optimizations weren't there for IBD necessarily, but as the checkblock bench states:

// These are the two major time-sinks which happen after we have fully received
// a block off the wire, but before we can relay the block on to peers using
// compact block relay.

l0rinc added 12 commits June 6, 2025 19:36
Also added primitive argument name reminder to the call sites to highlight the meaning of the call.
This will be needed in upcoming commit, to be able to use previously defined constants in `script.h`.
I made sure to check that there were no other constants with the same names.
Moved the implementation to the header (implicitly inline + noexcept), removed redundant `this` and parenthesis, used descriptive `front()`/`back()` methods and changed the `0x14` magic constant to descriptive `WITNESS_V0_KEYHASH_SIZE`.
See: https://learnmeabitcoin.com/technical/script/p2sh/#scriptpubkey
Moved the implementation to the header (implicitly inline + noexcept), removed redundant `this` and parenthesis, used descriptive `front()` method and changed the `0x20` magic constant to descriptive `WITNESS_V0_SCRIPTHASH_SIZE`.
See: https://learnmeabitcoin.com/technical/script/p2wsh/#scriptpubkey
Moved the implementation to the header (implicitly inline + noexcept), removed redundant `this` and parenthesis, used descriptive `front()`/`back() methods.
Moved the implementation to the header (implicitly inline + noexcept), removed redundant `this` and parenthesis, used descriptive `front()` method and changed the `0x20` magic constant to descriptive `WITNESS_V1_TAPROOT_SIZE`.
The usages in `compressor.cpp` and `solver.cpp` were also updated to use the new methods.
The usages in `compressor.cpp` and `solver.cpp` were also updated to use the new methods.
Note that compressor has additional checks as well, which are not properly exercised by the `compressed_p2pk` unit test.
The usages in `compressor.cpp` and `solver.cpp` were also updated to use the new methods.
Note that compressor has additional checks as well, which are not properly exercised by the `compressed_p2pk` unit test.
The usages in `compressor.cpp` were also updated to use the new methods.
* `GetLegacySigOpCountErrors` feeds malformed PUSHDATA sequences to confirm the parser never crashes and still counts sig‑ops that appear before the error.
* `GetLegacySigOpCountKnownTemplates` asserts the expected legacy/accurate sig‑op totals for all common known script templates (P2PKH, P2SH, P2WPKH/WSH, P2TR, compressed & uncompressed P2PK, OP_RETURN, multisig).
…GetLegacySigOpCount` as well

Introduce helper `DecodePushData` that parses the size prefixes, checks bounds, and returns size_bytes & data_bytes - or error on truncation.

To make absolutely sure it's just a refactor, a new unit test was added which asserts the refactored `GetLegacySigOpCount` exactly matches the previous algorithm.
@l0rinc l0rinc force-pushed the l0rinc/short-circuit-known-script-types branch from efb5179 to 0819a34 Compare June 6, 2025 17:37
@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 6, 2025

🚧 At least one of the CI tasks failed.
Task lint: https://github.com/bitcoin/bitcoin/runs/43633911782
LLM reason (✨ experimental): The CI failure is due to a lint check failure caused by improper include syntax in source files.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

Copy link
Contributor Author

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews, addressed more of the concerns in smaller commits, even more tests: https://github.com/bitcoin/bitcoin/compare/5b11b8706f14e0545f7e07c7c029e24811dc6c9d..0819a34565d01e241e5a3f884a508472861c96e1

@darosior, this change isn't about IBD, it was mostly about block connect time optimization originally - now it's rather meant as a cleanup and flooding it with tests and benchmarks to enable optimizations in follow-ups.

{
if (script.size() == 35 && script[0] == 33 && script[34] == OP_CHECKSIG
&& (script[1] == 0x02 || script[1] == 0x03)) {
if (script.IsCompressedPayToPubKey()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these are rechecked later, but it looks like we need some unit tests for this area - thanks for the detailed review, I have incorrectly removed it during a rebase - since I got confused of the different ways of checking the same.

It was also noticed by #32532 (comment) (seems I misunderstood his comment)

pubkey.Set(&script[1], &script[34]);
return true;
}
if (script.size() == 67 && script[0] == 65 && script[66] == OP_CHECKSIG
&& script[1] == 0x04) {
if (script.IsUncompressedPayToPubKey()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, split these into smaller commits to make sure we don't miss these anymore


bool IsCompressedPayToPubKey() const noexcept
{
return size() == 35 &&
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought about it, it seems more redundant (which I think could be safer for legacy code) if we state the sizes explicitly, otherwise changing the CPubKey::COMPRESSED_SIZE value might not fail immediately. Every other helper here has fixed size - and I couldn't find a reasonable value for all of them, decided to hard-code the sizes instead.
Do you have an idea that we could apply to all of these?

bool rewound = stream.Rewind(benchmark::data::block413567.size());
assert(rewound);

block.fChecked = block.m_checked_witness_commitment = block.m_checked_merkle_root = false; // Reset the cached state
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, we can even split ResetChecked out of CBlock::SetNull

}};

auto expected_sigops{0};
for (const auto& tx : block.vtx) expected_sigops += GetLegacySigOpCount(*tx);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this duplicating the tx_verify function

At the moment it seemed simpler to copy it over, but you're right that I can now just call it directly - thanks.
We can also hard-code the expected_sigops to have more redundancy here, since it's legacy code.

bool IsPayToScriptHash() const;
bool IsPayToWitnessScriptHash() const;
bool IsWitnessProgram(int& version, std::vector<unsigned char>& program) const;
bool IsPayToPubKeyHash() const noexcept
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!

@@ -28,6 +30,160 @@ Serialize(const CScript& s)

BOOST_FIXTURE_TEST_SUITE(sigopcount_tests, BasicTestingSetup)

BOOST_AUTO_TEST_CASE(GetLegacySigOpCountErrors)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is adding new coverage
but not really new coverage

Not sure, many of these branches were never covered - based on https://maflcko.github.io/b-c-cov/total.coverage/src/script/script.cpp.gcov.html
And based on https://corecheck.dev/bitcoin/bitcoin/pulls/32532 many more are covered now

Added comments to the tests, please let me know what I'm still missing.

Copy link
Contributor Author

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews, addressed more of the concerns in smaller commits, even more tests: https://github.com/bitcoin/bitcoin/compare/5b11b8706f14e0545f7e07c7c029e24811dc6c9d..0819a34565d01e241e5a3f884a508472861c96e1

@darosior, this change isn't about IBD, it was mostly about block connect time optimization originally - now it's rather meant as a cleanup and flooding it with tests and benchmarks to enable optimizations in follow-ups.

@l0rinc l0rinc changed the title refactor: extract DecodePushData from GetScriptOp and GetLegacySigOpCount refactor,test: extract DecodePushData from GetScriptOp and GetLegacySigOpCount Jun 6, 2025
@darosior
Copy link
Member

darosior commented Jun 6, 2025

@darosior, this change isn't about IBD, it was mostly about block connect time optimization originally - now it's rather meant as a cleanup and flooding it with tests and benchmarks to enable optimizations in follow-ups.

This further strengthens my point. I don't think we should make a major overhaul to tricky consensus-critical code that was never touched in 15 years just to "make it nicer" (and possibly enabling very marginal optimisations in the future).

@DrahtBot DrahtBot removed the CI failed label Jun 6, 2025
@ryanofsky
Copy link
Contributor

I don't think we should make a major overhaul to tricky consensus-critical code that was never touched in 15 years

@darosior I almost wonder if we are looking at the same PR. The current set of changes seem pretty trivial and are certainly not a major overhaul.

@l0rinc since the general arguments darosior's making are worth taking seriously and apply to a wide variety of changes maybe including the changes made here, it could make sense to separate the concerns more by opening a different PR to only improve the benchmarks and tests and not refactor the script code.

IMO, having now looked at the script changes, I think they are trivial and not risky, and they could be merged with the normal review process and 3 acks. But a conversation about raising the standard for changes like this could be worth having here or somewhere else. I do think a more productive way of raising the standard would be to do something like increase the number of expected acks, instead of relying on people to shoot down changes individually with nacks.

In a project like this it is rightfully very hard to argue on the side of taking any risk, but we should look out for places where risks are conserved and trying to reduce them one place increases them other places. IMO, it is important for changes like this to be able to progress with sufficient review so we are not put in a position of maintaining code that we are afraid to modify and that no remaining developers will have familiarity with. They also help test our review process and build confidence that it works and catches real issues. And at least in the past, they've been useful for deploying security fixes before revealing them publicly.

A related question is how to decide what changes to spend time working on and reviewing given limited review and development bandwidth. I'm not sure there is an answer to this other than letting people individually decide for themselves, but I think if we want to shift attention different places, proactively asking for reviews and highlighting changes we want to encourage is probably also a better approach than trying to actively discourage other changes. "Don't think of a pink elephant" is not a wildly effective strategy.

@darosior
Copy link
Member

darosior commented Jun 6, 2025

@ryanofsky Maybe "overhaul" isn't the right term if it implies that a change is necessarily huge, and i should have used "refactoring" instead. Regardless, my point stands.

> build/bin/bench_bitcoin -filter='SigOpsBlockBench' --min-time=1000

|           ns/sigops |            sigops/s |    err% |      ins/sigops |      cyc/sigops |    IPC |     bra/sigops |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|               58.43 |       17,115,813.09 |    0.1% |          354.20 |          209.79 |  1.688 |          89.46 |    2.8% |     10.83 | `SigOpsBlockBench`
@l0rinc l0rinc changed the title refactor,test: extract DecodePushData from GetScriptOp and GetLegacySigOpCount [tracking PR] - short-circuit GetLegacySigOpCount for known scripts Jun 7, 2025
@l0rinc
Copy link
Contributor Author

l0rinc commented Jun 11, 2025

Thanks @ryanofsky, I have extracted the tests and benchmarks to #32729 - adding a few more test cases based on missing code and feature coverage I have observed since.
The non-test codes changes are either move only with tiny changes (e.g. front() instead of (*this)[0] ), separated to different commits to ease the review as much as possible.
@darosior, you're also welcome to review or to ignore, up to you of course.

Edit: Now that the testing part was extracted I'm closing this to avoid confusion

@l0rinc l0rinc closed this Jun 11, 2025
@l0rinc l0rinc changed the title [tracking PR] - short-circuit GetLegacySigOpCount for known scripts Short-circuit GetLegacySigOpCount for known scripts Jun 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants