Skip to content

Conversation

l0rinc
Copy link
Contributor

@l0rinc l0rinc commented Jun 11, 2025

Summary

This PR extracts a set of cheap, inline helpers for recognizing the most common script templates, clarifies the definition of valid op-codes for pre-Taproot scripts, and splits the existing "deserialize + check-block" benchmark into three orthogonal micro-benchmarks.
The change is behavior-neutral for consensus and policy - every modified call-site performs the same checks as before, just through clearer helper functions.

Context

While working on GetSigOpCount optimizations for known script templates I noticed that feature-test coverage was thin for this code path.
This PR therefore adds tests for error cases (malformed push-data encodings) and for the expected sigop totals of the various script templates (including edge cases like a sigop after an OP_RETURN).

Given the difficulty of reviewing the original optimizations themselves, I split all test / benchmark work into this standalone PR.
The recent burst of sigops related refactor activity (#31624, #32521, #32533) underlines the need for extra safety.

The refactors here eliminate magic numbers, deduplicate template checks, and lay groundwork for future performance work.

Structure

  • Benchmarks - first commits separate deserialization+hashing, validation, and sigop counting so each cost can be measured independently.
  • Template helpers - tiny, mechanical commits move each script-template check into script.h, replace all ad-hoc byte-checks, and add tests where necessary.
  • Tests - a full script-test suite covering malformed PUSHDATA sequences and the pre-taproot / accurate sigop totals for every standard template.
  • pre-taproot opcode ceiling - documents, enforces, and tests that OP_CHECKSIGADD > MAX_OPCODE.
  • Extra fuzzing - enabled all possible opcodes in ConsumeOpcodeType

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 11, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32729.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ryanofsky, hodlinator

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #32554 (bench: replace embedded raw block with configurable block generator by l0rinc)
  • #32317 (kernel: Separate UTXO set access from validation functions by TheCharlatan)
  • #31989 (BIP-119 (OP_CHECKTEMPLATEVERIFY) (regtest only) by jamesob)
  • #31868 ([IBD] specialize block serialization by l0rinc)
  • #31682 ([IBD] specialize CheckBlock's input & coinbase checks by l0rinc)
  • #29060 (Policy: Report reason inputs are non standard by ismaelsadeeq)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@l0rinc l0rinc changed the title test,refactor: sigops test: extract script template helpers & widen sigop count coverage Jun 11, 2025
@DrahtBot DrahtBot added the Tests label Jun 11, 2025
@l0rinc l0rinc changed the title test: extract script template helpers & widen sigop count coverage test,refactor: extract script template helpers & widen sigop count coverage Jun 11, 2025
@l0rinc l0rinc marked this pull request as ready for review June 11, 2025 18:21
@l0rinc l0rinc force-pushed the l0rinc/sigop-testing branch from 95b67f1 to 0582b63 Compare June 12, 2025 11:12
@DrahtBot
Copy link
Contributor

🚧 At least one of the CI tasks failed.
Task tidy: https://github.com/bitcoin/bitcoin/runs/43965124848
LLM reason (✨ experimental): The failure is caused by a compilation error in policy.cpp where a boolean value is incorrectly passed to a function expecting a CScript reference.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@l0rinc l0rinc force-pushed the l0rinc/sigop-testing branch from 0582b63 to 5d29ce5 Compare July 21, 2025 19:57
@l0rinc
Copy link
Contributor Author

l0rinc commented Jul 21, 2025

@l0rinc l0rinc force-pushed the l0rinc/sigop-testing branch from 5d29ce5 to 6bf4c8d Compare July 28, 2025 20:40
@l0rinc
Copy link
Contributor Author

l0rinc commented Jul 28, 2025

Rebased after #32279 - updating the tests with the new convenience template helpers instead of using Solver for them. Ready for review again.

@l0rinc l0rinc force-pushed the l0rinc/sigop-testing branch 2 times, most recently from 1d49b3b to db4bf84 Compare August 5, 2025 02:25
@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 5, 2025

🚧 At least one of the CI tasks failed.
Task MSan, depends: https://github.com/bitcoin/bitcoin/runs/47383324322
LLM reason (✨ experimental): The CI failure is caused by a compilation error in script_ops.cpp: 'GetSigOpCount' is missing in 'CScript'.

Hints

Try to run the tests locally, according to the documentation. However, a CI failure may still
happen due to a number of reasons, for example:

  • Possibly due to a silent merge conflict (the changes in this pull request being
    incompatible with the current code in the target branch). If so, make sure to rebase on the latest
    commit of the target branch.

  • A sanitizer issue, which can only be found by compiling with the sanitizer and running the
    affected test.

  • An intermittent issue.

Leave a comment here, if you need help tracking down a confusing failure.

@l0rinc l0rinc force-pushed the l0rinc/sigop-testing branch from db4bf84 to f8208e9 Compare August 5, 2025 03:56
@DrahtBot DrahtBot removed the CI failed label Aug 5, 2025
Copy link
Contributor Author

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews, addressed all concerns - either by applying it or explaining my take. Let me know if I left out something.

Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed cf8f476

Main thing remaining is question about motivation behind moving methods (see inline comment).

@@ -222,15 +222,6 @@ bool CScript::IsPayToAnchor(int version, const std::vector<unsigned char>& progr
program[1] == 0x73;
}

bool CScript::IsPayToScriptHash() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. In 6b167d5 etc, what is the gain from making these methods inline by moving them to the header?
  2. nit: If these moves are kept, would suggest using "move" rather than "pull" in the commit message.
  3. nit: Would suggest using "extract" rather than "add" (except for IsPayToWitnessPubKeyHash() which is new code).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. we don't introduce a header + implementation duplication - what would be the advantage in this case of separating them to header + impl?

2,3) done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2,3) thanks!

1): Arguments for avoiding the move:

➕: Avoids moving 4 functions to headers, minimizing the diff and making it clearer which parts of the body changed if any.

➕: Someone chose to have them like this, so best to avoid flip-flopping for minor reasons.

➕: Keeps the script.h header small and faster to re-parse for each compilation unit as it is a commonly included header, direct includes:

₿ git grep "<script/script\.h>" |wc -l
98

(➖): Not inlining adds some linking time, but those functions are rarely called.

➕: Increases readability for getting an overview of CScript for programmers using editors that don't auto-collapse function bodies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer having them in the same place, @ryanofsky, where do you think these helpers belong, in header or cpp?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #32729 (comment)

I prefer having them in the same place, @ryanofsky, where do you think these helpers belong, in header or cpp?

No real opinion, I just assumed these needed to be moved for the optimization in d76d753.

Abstractly, I think it makes sense for these to be in a header file since they are low-level and natural candidates to inline. Ideally I don't think they'd be in this header file for reasons hodlinator mentioned, and I would probably want them to be standalone functions not class methods since I think the purpose of classes is to encapsulate internal state and these aren't accessing anything private.

But I think anything is fine here, and it makes sense to go with your preference if there aren't objections.

@fanquake
Copy link
Member

https://github.com/bitcoin/bitcoin/actions/runs/16872143934/job/47788785406?pr=32729#step:6:4338:

 test/script_tests.cpp(1620): Entering test case "GetOpName_no_missing_mnemonics"
/home/runner/work/_temp/src/test/script_tests.cpp:1622:25: runtime error: load of value 256, which is not a valid value for type 'opcodetype'
    #0 0x559b4df3e847 in script_tests::GetOpName_no_missing_mnemonics::test_method() /home/runner/work/_temp/build-asan/src/test/./test/script_tests.cpp:1622:25
    #1 0x559b4df3ba1b in script_tests::GetOpName_no_missing_mnemonics_invoker() /home/runner/work/_temp/build-asan/src/test/./test/script_tests.cpp:1620:1
    #2 0x559b4d0f360d in boost::function0<void>::operator()() const /usr/include/boost/function/function_template.hpp:771:14
    #3 0x559b4d174328 in boost::detail::forward::operator()() /usr/include/boost/test/impl/execution_monitor.ipp:1395:32
    #4 0x559b4d174328 in boost::detail::function::function_obj_invoker0<boost::detail::forward, int>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:137:18
    #5 0x559b4d16de2d in boost::function0<int>::operator()() const /usr/include/boost/function/function_template.hpp:771:14
    #6 0x559b4d0598fc in int boost::detail::do_invoke<boost::shared_ptr<boost::detail::translator_holder_base>, boost::function<int ()>>(boost::shared_ptr<boost::detail::translator_holder_base> const&, boost::function<int ()> const&) /usr/include/boost/test/impl/execution_monitor.ipp:308:30
    #7 0x559b4d0598fc in boost::execution_monitor::catch_signals(boost::function<int ()> const&) /usr/include/boost/test/impl/execution_monitor.ipp:910:16
    #8 0x559b4d059e0d in boost::execution_monitor::execute(boost::function<int ()> const&) /usr/include/boost/test/impl/execution_monitor.ipp:1308:16
    #9 0x559b4d052498 in boost::execution_monitor::vexecute(boost::function<void ()> const&) /usr/include/boost/test/impl/execution_monitor.ipp:1404:5
    #10 0x559b4d052498 in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::function<void ()> const&, unsigned long) /usr/include/boost/test/impl/unit_test_monitor.ipp:49:9
    #11 0x559b4d0b5fd4 in boost::unit_test::framework::state::execute_test_tree(unsigned long, unsigned long, boost::unit_test::framework::state::random_generator_helper const*) /usr/include/boost/test/impl/framework.ipp:815:44
    #12 0x559b4d0b4e9b in boost::unit_test::framework::state::execute_test_tree(unsigned long, unsigned long, boost::unit_test::framework::state::random_generator_helper const*) /usr/include/boost/test/impl/framework.ipp:784:58
    #13 0x559b4d0b4e9b in boost::unit_test::framework::state::execute_test_tree(unsigned long, unsigned long, boost::unit_test::framework::state::random_generator_helper const*) /usr/include/boost/test/impl/framework.ipp:784:58
    #14 0x559b4d0508eb in boost::unit_test::framework::run(unsigned long, bool) /usr/include/boost/test/impl/framework.ipp:1722:29
    #15 0x559b4d07f350 in boost::unit_test::unit_test_main(boost::unit_test::test_suite* (*)(int, char**), int, char**) /usr/include/boost/test/impl/unit_test_main.ipp:250:9
    #16 0x7f2c01d841c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 282c2c16e7b6600b0b22ea0c99010d2795752b5f)
    #17 0x7f2c01d8428a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 282c2c16e7b6600b0b22ea0c99010d2795752b5f)
    #18 0x559b4cf44984 in _start (/home/runner/work/_temp/build-asan/bin/test_bitcoin+0x12c5984) (BuildId: 536afe9665aa0f1665ca9d21723b3d0c1c5aa9a6)

SUMMARY: UndefinedBehaviorSanitizer: invalid-enum-load /home/runner/work/_temp/src/test/script_tests.cpp:1622:25 

@bitcoin bitcoin deleted a comment from Thisdick69 Aug 12, 2025
@l0rinc l0rinc force-pushed the l0rinc/sigop-testing branch from cf8f476 to 738b620 Compare August 12, 2025 16:25
l0rinc and others added 12 commits August 12, 2025 09:25
To measure CheckBlock performance in isolation from deserialization overhead, a ResetChecked() method was introduced to re-enable the block's validation state, allowing repeated validation of the same block instance.
Add benchmark to measure the performance of counting legacy signature operations in a block, independent of other validation steps.
Previous `GetSigOpCount` method was overloaded to take either a `bool` or `scriptSig` as a parameter, without an explanation of when to call each overload.
New `CountSigOps` method avoids the overloading and documents how it should be called. The name was chosen to be clearer and consistent with the newer `CountWitnessSigOps` function.
Besides the renames, also added primitive argument name reminders to the call sites to highlight the meaning of the call.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
The name `CountP2SHSigOps` was chosen to match `CountWitnessSigOps`, since the two functions are counterparts for handling `P2SH` and `SegWit` scripts.
Also, it's called by `GetP2SHSigOpCount` in consensus, so the new name clarifies its significance.

A goal of this change is to make the `CheckSigopsBIP54` function match the BIP54 specification which says "For each input in the transaction, count the number of CHECKSIG and CHECKMULTISIG in the input scriptSig and previous output's scriptPubKey, including the P2SH redeemScript."

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
This enables using these constants in script.h definitions in upcoming commits. No naming conflicts exist with these constant names.
Moved the implementation to the header (implicitly inline + noexcept), removed redundant `this` and parentheses and changed the `0x14`/`20` magic constants to descriptive `HASH160_OUTPUT_SIZE`.
See: https://learnmeabitcoin.com/technical/script/p2sh/#scriptpubkey
Moved the implementation to the header (implicitly inline + noexcept), removed redundant `this` and parentheses, and changed the `0x20` magic constant to descriptive `WITNESS_V0_SCRIPTHASH_SIZE`.
See: https://learnmeabitcoin.com/technical/script/p2wsh/#scriptpubkey
Moved the implementation to the header (implicitly inline + noexcept), removed redundant `this` and parentheses.
Moved the implementation to the header (implicitly inline + noexcept), removed redundant `this` and parentheses, and changed the `0x20` magic constant to descriptive `WITNESS_V1_TAPROOT_SIZE`.
See: https://learnmeabitcoin.com/technical/script/p2tr/#scriptpubkey
The usages in `compressor.cpp` and `solver.cpp` were also updated to use the new method.
See: https://learnmeabitcoin.com/technical/script/p2pkh/#scriptpubkey
The usages in `compressor.cpp` and `solver.cpp` were also updated to use the new method.
Note that compressor has additional prefix checks as well, which are not properly exercised by the `compressed_p2pk` unit test.
See: https://learnmeabitcoin.com/technical/script/p2pk/#scriptpubkey and https://learnmeabitcoin.com/technical/keys/public-key/#compressed
The usages in `compressor.cpp` and `solver.cpp` were also updated to use the new method.
Note that compressor has additional prefix checks as well, which are not properly exercised by the `uncompressed_p2pk` unit test.
See: https://learnmeabitcoin.com/technical/script/p2pk/#scriptpubkey and https://learnmeabitcoin.com/technical/keys/public-key/#uncompressed
@l0rinc l0rinc force-pushed the l0rinc/sigop-testing branch from 738b620 to 8ec20fb Compare August 12, 2025 16:25
Copy link
Contributor Author

@l0rinc l0rinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addresses remaining nits + fixed test opcode name iteration boundary failure: https://github.com/bitcoin/bitcoin/compare/cf8f476ef25e4fe43e67970ee2ec6a265d49e763..738b620f314780751c4db3af87c9f22792723125
(rebased in a separate push)

@@ -222,15 +222,6 @@ bool CScript::IsPayToAnchor(int version, const std::vector<unsigned char>& progr
program[1] == 0x73;
}

bool CScript::IsPayToScriptHash() const
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. we don't introduce a header + implementation duplication - what would be the advantage in this case of separating them to header + impl?

2,3) done

l0rinc and others added 5 commits August 12, 2025 15:19
This template helper isn't used outside of tests yet, but for the sake of completeness - and since we're planning on using this for upcoming optimizations -, it's added as a last step.

See: https://learnmeabitcoin.com/technical/script/p2wpkh/#scriptpubkey
* `CountSigOpsKnownTemplates` asserts the expected legacy/accurate sig-op totals for all common known script templates (P2PKH, P2SH, P2WPKH/WSH, P2TR, compressed & uncompressed P2PK, OP_RETURN, multisig).
* `CountSigOpsErrors` feeds malformed PUSHDATA sequences to confirm the parser never crashes and still counts sig-ops that appear before the error.
* `GetOpName_no_missing_mnemonics` checks all opcode names by category.

Co-authored-by: Hodlinator <172445034+hodlinator@users.noreply.github.com>
Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
…validation

Renames `HasValidOps()` to `HasValidBaseOps()` and `MAX_OPCODE` to `MAX_BASE_OPCODE` to clarify that pre-Taproot script validation should only accept opcodes up to OP_NOP10 (0xb9).

This prevents Tapscript-only opcodes like OP_CHECKSIGADD (0xba) from being considered valid in scriptPubKey and scriptSig contexts, where only pre-Taproot opcodes should be allowed.
Add explicit test case checking that OP_CHECKSIGADD > MAX_BASE_OPCODE (i.e. that we have opcodes bigger than the max).
The other changes in the test are just inlines to reduce scope.

Also note that `OpCodeParser` deliberately skips `OP_CHECKSIGADD`, see: `script_parse_tests/parse_script`.
`OP_CHECKSIGADD` wasn't tested this way at all, but we should fuzz all possible values anyway.
@l0rinc l0rinc force-pushed the l0rinc/sigop-testing branch from 8ec20fb to 3219b59 Compare August 12, 2025 22:21
Copy link
Contributor

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 3219b59. A lot of test improvements since last review, only comment and whitespace changes in non-test code. I think I'm caught up on the discussion now too

@@ -222,15 +222,6 @@ bool CScript::IsPayToAnchor(int version, const std::vector<unsigned char>& progr
program[1] == 0x73;
}

bool CScript::IsPayToScriptHash() const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: #32729 (comment)

I prefer having them in the same place, @ryanofsky, where do you think these helpers belong, in header or cpp?

No real opinion, I just assumed these needed to be moved for the optimization in d76d753.

Abstractly, I think it makes sense for these to be in a header file since they are low-level and natural candidates to inline. Ideally I don't think they'd be in this header file for reasons hodlinator mentioned, and I would probably want them to be standalone functions not class methods since I think the purpose of classes is to encapsulate internal state and these aren't accessing anything private.

But I think anything is fine here, and it makes sense to go with your preference if there aren't objections.

@DrahtBot DrahtBot requested a review from hodlinator August 13, 2025 12:12
Copy link
Contributor

@hodlinator hodlinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 3219b59

Renaming the sigop-counting functions while changing behavior of CScript::GetSigOpCount(const CScript& scriptSig)/CountP2SHSigOps(const CScript& scriptSig, const CScript& scriptPubKey) makes it relatively easy to verify the change, as no lingering GetSigOpCount overloads remain.

Renaming HasValidOps() -> HasValidBaseOps() makes one paranoid that we haven't been supporting Tapscript op codes everywhere, but the function is not used in those contexts.

I disagree with moving IsPayToTaproot() etc from .CPP to .H, but it's not blocking.

Stay humble and stack commits! :)
(Not advocating for merging commits, it is helpful to keep similar transforms applied to different areas in separate commits, other reviewers shouldn't be scared by the high count of them).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants