test: apply strict verification flags for transaction tests and assert backwards compatibility #19698

glozow · 2020-08-11T17:51:24Z

This uses the first 4 commits of #15045, rebased and added some comments. The diff is quite large already and I want to make it easy to review, so I'm splitting it into 2 PRs (transaction and script). Script one is WIP, I'll link it when I open it.

Interpretation of scripts is dependent on the script verification flags passed in.
In tests, we should always apply maximal verification flags when checking that a transaction is valid; any additional flags should invalidate the transaction. A transaction should not be valid because we forgot to include a flag, and we should apply all flags by default.
We should apply minimal verification flags when asserting that a transaction is invalid; if verification flags are applied, removing any one of them should mean the transaction is valid.
New verify flags must be backwards compatible; tests should check backwards compatibility and apply the new flags by default. All tx_invalid tests should continue to be invalid with the exact same verify flags. All tx_valid tests that don't pass with new flags should explicitly indicate that the flags need to be excluded, and fail otherwise.

Flip the meaning of verifyFlags in tx_valid.json to mean excluded verification flags instead of included flags. Edit the test data accordingly.
Trim unneeded flags from tx_invalid.json.
Add check to verify that tx_valid tests have maximal flags and tx_invalid tests have minimal flags.
Add checks to verify that flags are soft forks (Make all script validation flags backward compatible #10699) i.e. adding any flag should only decrease the number of acceptable scripts. Test by adding/removing random flags.
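The soft-fork property described in the last item can be sketched roughly as follows. This is a minimal illustration, not the PR's test code: `Verifier` is a hypothetical stand-in for running the script checks with a given flag set, both helper names are made up for this sketch, and it ignores the step of normalising flag combinations that depend on each other.

```cpp
#include <cassert>
#include <functional>
#include <random>

// Hypothetical stand-in for "run the script checks with these flags".
using Verifier = std::function<bool(unsigned int)>;

// A test that passes under `flags` must keep passing when random flags are removed.
bool ValidStaysValidWithFewerFlags(const Verifier& verify, unsigned int flags,
                                   int rounds, std::mt19937& rng)
{
    assert(rounds > 0);
    if (!verify(flags)) return false;
    std::uniform_int_distribution<unsigned int> mask; // uniform over all unsigned values
    for (int i = 0; i < rounds; ++i) {
        // flags & mask is always a subset of flags
        if (!verify(flags & mask(rng))) return false;
    }
    return true;
}

// A test that fails under `flags` must keep failing when random flags are added.
bool InvalidStaysInvalidWithMoreFlags(const Verifier& verify, unsigned int flags,
                                      unsigned int all_flags, int rounds, std::mt19937& rng)
{
    assert(rounds > 0);
    if (verify(flags)) return false;
    std::uniform_int_distribution<unsigned int> mask;
    for (int i = 0; i < rounds; ++i) {
        // flags | (mask & all_flags) is always a superset of flags
        if (verify(flags | (mask(rng) & all_flags))) return false;
    }
    return true;
}
```

A toy verifier such as `[](unsigned int f) { return (f & 2u) == 0; }` (the script fails only when bit 1 is set) satisfies both checks, since setting more bits can only make it fail.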

luke-jr · 2020-08-11T22:10:58Z

Not sure this is a good idea. Tests should ideally test one thing only, and failing due to other bugs would be annoying.

laanwj · 2020-08-12T11:13:26Z

Looks like one of the sanitizers finds an integer conversion/truncation problem in the changed code:

test/transaction_tests.cpp(251): Leaving test case "tx_invalid"; testing time: 591590us
test/transaction_tests.cpp(164): Entering test case "tx_valid"
test/transaction_tests.cpp:237:35: runtime error: implicit conversion from type 'unsigned long' of value 18446744073709526401 (64-bit, unsigned) to type 'unsigned int' changed the value to 4294942081 (32-bit, unsigned)
    #0 0x558b927932d2 in transaction_tests::tx_valid::test_method() /tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/transaction_tests.cpp:237:35
    #1 0x558b9278fb88 in transaction_tests::tx_valid_invoker() /tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/transaction_tests.cpp:164:1
    #2 0x558b91e8e608 in boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:117:11
    #3 0x7f37eb6e23f1  (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x353f1)
    #4 0x7f37eb6dfc74 in boost::execution_monitor::catch_signals(boost::function<int ()> const&) (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x32c74)
    #5 0x7f37eb6dfcf7 in boost::execution_monitor::execute(boost::function<int ()> const&) (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x32cf7)
    #6 0x7f37eb6dfdcd in boost::execution_monitor::vexecute(boost::function<void ()> const&) (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x32dcd)
    #7 0x7f37eb70d134 in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::function<void ()> const&, unsigned long) (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x60134)
    #8 0x7f37eb6f05a8  (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x435a8)
    #9 0x7f37eb6f0b03  (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x43b03)
    #10 0x7f37eb6f0b03  (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x43b03)
    #11 0x7f37eb6e7939 in boost::unit_test::framework::run(unsigned long, bool) (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x3a939)
    #12 0x7f37eb70bfea in boost::unit_test::unit_test_main(bool (*)(), int, char**) (/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.71.0+0x5efea)
    #13 0x558b91dd63a6 in main /usr/include/boost/test/unit_test.hpp:63:12
    #14 0x7f37eae990b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #15 0x558b91d2bbbd in _start (/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/test/test_bitcoin+0x21c5bbd)

SUMMARY: UndefinedBehaviorSanitizer: implicit-unsigned-integer-truncation test/transaction_tests.cpp:237:35 in

glozow · 2020-08-14T16:54:41Z

@laanwj I'm looking into the issue, is there a way to run this sanitizer locally?
Edit: nvm I think I got it, from developer-notes...

glozow · 2020-08-14T16:56:55Z

> Not sure this is a good idea. Tests should ideally test one thing only, and failing due to other bugs would be annoying.

@luke-jr I would think of it more as... each test sanity checks itself to make sure it's testing exactly what we want it to test. Passing/failing because the test itself is incorrect would be more annoying imo. All the current tests pass; as more tests are added, I'd think it's a good idea to have this check.

glozow · 2020-08-21T15:00:13Z

Last push fixed the sanitizer bug - just needed a cast. Ready for review :)
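A cast of this kind might look like the sketch below. The thread does not show the offending expression at line 237, so `InvertFlags` and its 64-bit parameter are assumptions, chosen only because they reproduce the two values in the sanitizer report.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the PR.
std::uint32_t InvertFlags(std::uint64_t excluded)
{
    // ~excluded is computed in 64 bits. Returning it directly would narrow it
    // implicitly, which is the kind of conversion the sanitizer reports; the
    // explicit cast states that dropping the upper 32 bits is intended.
    return static_cast<std::uint32_t>(~excluded);
}
```

With `excluded = 25214`, `~excluded` is 18446744073709526401 as a 64-bit value and 4294942081 after the cast, the same pair of numbers that appears in the report.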

glozow · 2020-09-04T00:10:37Z

A little bit quiet here... @laanwj and @practicalswift you both left reviews on the original PR, if you have time I'd appreciate a look here as well :)

laanwj · 2020-10-27T09:35:53Z

Code review ACK, thanks for solving the sanitizer bug.

I would like @sipa, @MarcoFalke, or someone else close to the verification testing code to look at this and give a concept ACK.

benthecarman · 2020-11-14T04:21:16Z

Concept ACK

src/test/transaction_tests.cpp

jnewbery

I've left lots of style suggestions, which should be pretty easy to resolve.

It's very difficult to review this PR, since it's mixing refactors and fixes into the same commits. For example, the first commit is changing the format of the tx_valid.json file and making changes/fixes to the tests. I think it'd be far easier to review those different changes if they were split out into individual commits. The same is true for the other commits in this branch - each one is doing too much, making review more difficult than it needs to be.

src/test/transaction_tests.cpp

jnewbery · 2020-11-26T17:40:05Z

src/test/transaction_tests.cpp

+    return flags;
+}
+
+unsigned int FillFlags(unsigned int flags)


Would a better interface be to take a reference and update it in place?

Maybe not, since we're using it like
CheckTxScripts(... FillFlags(flags) ...)
so doing it in place would mean we need an extra line there.

src/test/transaction_tests.cpp

murchandamus

Concept ACK

TBH, I could use a bit more explanation in the fourth commit message "Verify that all validation flags are backward compatible". Could you go into a bit more detail about what this is doing and why?

src/test/data/tx_valid.json

src/test/data/tx_invalid.json

src/test/transaction_tests.cpp

murchandamus · 2020-12-02T15:53:57Z

src/test/transaction_tests.cpp

+    for (const std::string& word : words)
+    {
+        ret.push_back(flags & ~mapFlagNames[word]);
+    }
+    return ret;


Maybe it would be better to have a separate function that returns the names of the flags from a given sum of all flags.
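One possible shape for such a function is sketched below. The name `GetFlagNames`, the table `kFlagNames`, and the three entries with their bit positions are illustrative assumptions; the real `mapFlagNames` table is not shown in this thread.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, abbreviated name-to-flag table; bit values are illustrative only.
static const std::map<std::string, unsigned int> kFlagNames = {
    {"P2SH", 1U << 0},
    {"STRICTENC", 1U << 1},
    {"DERSIG", 1U << 2},
};

// Returns the names of all flags whose bit is set in `flags`.
// Names come back in the map's key order (alphabetical), not in bit order.
std::vector<std::string> GetFlagNames(unsigned int flags)
{
    std::vector<std::string> names;
    for (const auto& entry : kFlagNames) {
        if (flags & entry.second) names.push_back(entry.first);
    }
    return names;
}
```

Keeping the name lookup separate from the loop that clears individual flags would let a failing test report which flag was removed, at the cost of one more helper.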

glozow · 2020-12-07T15:07:00Z

Thanks for the review @laanwj @jnewbery @benthecarman @xekyo @jonatack :) addressed your comments and split the PR up into more, dedicated commits. CI is green 🟢 ready for review again!

murchandamus

Looks good to me, just the one nit from me at the moment.

src/test/transaction_tests.cpp

glozow · 2021-01-26T22:25:09Z

Attempting to revive this again 😅 🙏
Rebased on master since it's been a while and applied the style suggestions from @jonatack and @xekyo. I had also forgotten to add the original author to the intermediate commits when I split up the changes - my sincerest apologies - that's fixed now.

laanwj · 2021-01-27T07:35:46Z

To test the tests I made a small change to tx_valid.json (basically reverting the initial commits):

 ["An ADD producing a 5-byte result that sets CTxIn::SEQUENCE_LOCKTIME_DISABLE_FLAG"],
-[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "2147483647 65536 ADD CHECKSEQUENCEVERIFY"]],
+[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "2147483647 65536 CHECKSEQUENCEVERIFY"]],

The good thing is that it easily detects this. However, an avalanche of 63 failures for a single error might be a bit overkill 😅 It does report the JSON of the failed test, which is good!

The following change, however

 [[["0000000000000000000000000000000000000000000000000000000000000100", 0, "2147483647 65536 ADD CHECKSEQUENCEVERIFY"]],
-"020000000100010000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000", "NONE"],
+"020000000100010000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000", "P2SH,CHECKSEQUENCEVERIFY"],

raises an assertion

test_bitcoin: …/bitcoin/src/script/interpreter.cpp:2044: bool VerifyScript(const CScript &, const CScript &, const CScriptWitness *, unsigned int, const BaseSignatureChecker &, ScriptError *): Assertion `(flags & SCRIPT_VERIFY_P2SH) != 0' failed.

Then reports 7 failures in different places in the code. It does not report which specific test in the JSON failed.

Not sure if these are a problem, I mean, the tests fail when they're supposed to fail, but just thought I'd report it.

glozow · 2021-02-02T17:25:37Z

@laanwj Good point, it's not very helpful if VerifyScript just asserts and we don't know which test failed. I've added a check using FillFlags that reports a "Bad test flags" error when an invalid combination of flags is given in tx_valid.json (a260c22).

I tested by adding a P2SH or WITNESS without a CLEANSTACK (these would be inverted) to a test in tx_valid.json, which we know to be invalid/non-backwards-compatible combinations. It should now print which test failed, for example:

test/transaction_tests.cpp:228: error: in "transaction_tests/tx_valid": Bad test flags: [[["0000000000000000000000000000000000000000000000000000000000000100",0,"2147483647 65536 ADD CHECKSEQUENCEVERIFY"]],"020000000100010000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000","P2SH"]
Assertion failed: ((flags & SCRIPT_VERIFY_P2SH) != 0), function VerifyScript, file script/interpreter.cpp, line 2044.

Let me know what you think?
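The idea behind the check can be sketched as follows. This is an approximation based on the description above, not the code from a260c22: the constant names and bit positions are illustrative, and the dependency rules (CLEANSTACK needs WITNESS, WITNESS needs P2SH) are inferred from the assertion message and the test described in this comment.

```cpp
// Illustrative constants; names and bit positions are assumptions for this sketch.
constexpr unsigned int FLAG_P2SH = 1U << 0;
constexpr unsigned int FLAG_CLEANSTACK = 1U << 8;
constexpr unsigned int FLAG_WITNESS = 1U << 11;

// Adds the flags that other flags depend on.
unsigned int FillFlags(unsigned int flags)
{
    if (flags & FLAG_CLEANSTACK) flags |= FLAG_WITNESS; // CLEANSTACK needs WITNESS
    if (flags & FLAG_WITNESS) flags |= FLAG_P2SH;       // WITNESS needs P2SH
    return flags;
}

// A flag set is consistent if filling in dependencies changes nothing.
bool IsConsistentFlagSet(unsigned int flags)
{
    return flags == FillFlags(flags);
}
```

Because tx_valid.json lists excluded flags, the set to validate is the bitwise complement of what the test names. Excluding only P2SH leaves WITNESS applied without P2SH, so `IsConsistentFlagSet(~FLAG_P2SH)` is false and the test can be reported before the interpreter assertion is reached.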

achow101 · 2021-02-22T21:36:14Z

ACK 5786a81

laanwj · 2021-02-23T10:07:55Z

> Let me know what you think?

Looks good to me now!

ACK 5786a81

maflcko

Concept ACK, but I don't think this works

src/test/data/tx_invalid.json

maflcko · 2021-02-23T12:10:10Z

src/test/transaction_tests.cpp

+    std::vector<unsigned int> flags_combos;
+    for (unsigned int i = 0; i < mapFlagNames.size(); ++i) {
+        const unsigned int flags_excluding_i = TrimFlags(flags & ~(1U << i));
+        if (flags != flags_excluding_i && std::find(flags_combos.begin(), flags_combos.end(), flags_excluding_i) != flags_combos.end()) {


I don't understand what this is supposed to do. The method just returns an empty vector.

~~I don't think it always returns an empty vector? I'm playing around with it, should only return empty if flags=0.~~

What it's supposed to do: given a set of verify flags `flags`, go through all the flags in existence from `mapFlagNames` and exclude each one from `flags` in turn (provided the result is a valid combination and doesn't just yield the same flags).

So for example if it's given 10001100 and there are 8 total flags, it'll return [00001100, 10000100, 10001000] if they're all valid combinations. If the input is 0, it should return an empty vector.

Edit: uh oh you're right it seems to be broken...

Suggested change

-if (flags != flags_excluding_i && std::find(flags_combos.begin(), flags_combos.end(), flags_excluding_i) != flags_combos.end()) {
+if (flags != flags_excluding_i && std::find(flags_combos.begin(), flags_combos.end(), flags_excluding_i) == flags_combos.end()) {

Yeah, it should be this. Should not be found in flags_combos. I'll open a PR to address this
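With the corrected comparison, the function behaves as in the sketch below. This is a simplified reconstruction rather than the merged code: `TrimFlagsStub` is a hypothetical identity stand-in for `TrimFlags`, and the flag count is passed as a parameter instead of being read from `mapFlagNames`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for TrimFlags; assumed to leave the flags unchanged here.
unsigned int TrimFlagsStub(unsigned int flags) { return flags; }

// Returns every distinct flag set obtained by clearing exactly one bit of `flags`.
std::vector<unsigned int> ExcludeIndividualFlags(unsigned int flags, unsigned int num_flags)
{
    assert(num_flags <= 32);
    std::vector<unsigned int> flags_combos;
    for (unsigned int i = 0; i < num_flags; ++i) {
        const unsigned int flags_excluding_i = TrimFlagsStub(flags & ~(1U << i));
        // Keep the result only if a bit was actually cleared and it is NOT already stored.
        const bool already_stored =
            std::find(flags_combos.begin(), flags_combos.end(), flags_excluding_i) != flags_combos.end();
        if (flags != flags_excluding_i && !already_stored) {
            flags_combos.push_back(flags_excluding_i);
        }
    }
    return flags_combos;
}
```

For the input 10001100 with 8 flags, this returns 10001000, 10000100 and 00001100 (in bit order), and an input of 0 returns an empty vector. With `!=` in place of `==`, the vector starts empty, so nothing is ever found in it and nothing is ever added.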

…ction tests and assert backwards compatibility

5786a81 Verify that all validation flags are backward compatible (gzhao408)
b10ce9a [test] check verification flags are minimal/maximal (gzhao408)
a260c22 [test] Check for invalid flag combinations (gzhao408)
a7098a2 [refactor] use CheckTxScripts, TrimFlags, FillFlags (gzhao408)
7a77727 Apply minimal validation flags to tx_invalid tests (gzhao408)
9532591 [test] add BADTX setting for invalid txns that fail CheckTransaction (gzhao408)
4c06ebf [test] fix two witness tests in invalid tests with empty vout (gzhao408)
158a0b2 Apply maximal validation flags to tx_valid tests (gzhao408)
0a76a39 [test] fix CSV test missing OP_ADD (gzhao408)
19db590 [test] remove unnecessary OP_1s from CSV and CLTV tests (gzhao408)

Pull request description:

This uses the first 4 commits of bitcoin#15045, rebased and added some comments. The diff is quite large already and I want to make it easy to review, so I'm splitting it into 2 PRs (transaction and script). Script one is WIP, I'll link it when I open it.

Interpretation of scripts is dependent on the script verification flags passed in.
In tests, we should always apply **maximal** verification flags when checking that a transaction is **valid**; any additional flags should invalidate the transaction. A transaction should not be valid because we forgot to include a flag, and we should apply all flags by default.
We should apply **minimal** verification flags when asserting that a transaction is **invalid**; if verification flags are applied, removing any one of them should mean the transaction is valid.
New verify flags must be backwards compatible; tests should check backwards compatibility and apply the new flags by default. All `tx_invalid` tests should continue to be invalid with the exact same verify flags. All `tx_valid` tests that don't pass with new flags should _explicitly_ indicate that the flags need to be excluded, and fail otherwise.

1. Flip the meaning of `verifyFlags` in tx_valid.json to mean _excluded_ verification flags instead of included flags. Edit the test data accordingly.
2. Trim unneeded flags from tx_invalid.json.
3. Add check to verify that tx_valid tests have maximal flags and tx_invalid tests have minimal flags.
4. Add checks to verify that flags are soft forks (bitcoin#10699) i.e. adding any flag should only decrease the number of acceptable scripts. Test by adding/removing random flags.

ACKs for top commit:
achow101: ACK 5786a81
laanwj: ACK 5786a81

Tree-SHA512: 19195d8cf3299e62f47dd3443ae4a95430c5c9d497993a18ab80de9e24b1869787af972774993bf05717784879bc4592fdabaae0fddebd437963d8f3c96d9a73

df8f2a1 test: Replace accidentally placed bit-OR with logical-OR (Hennadii Stepanov)

Pull request description:

This PR is a follow up of #19698.

ACKs for top commit:
glozow: utACK df8f2a1

Tree-SHA512: 36aba3e952850deafe78dd39775a568e89e867c8a352f511f152bce7062f614f5bb4f23266dbb33da5292c9ee6da5ccefce117e3168621c71d2140c8e7f58460

b109bde [test] check that mapFlagNames is up to date (glozow)
5d3ced7 [test] remove unnecessary OP_1s from invalid tests (glozow)
5aee73d [test] minor improvements / followups (glozow)
8a365df [test] fix bug in ExcludeIndividualFlags (glozow)
8cac292 [test] remove invalid test from tx_valid.json (glozow)

Pull request description:

This is a followup to #19698.

- There was a bug in the `ExcludeIndividualFlags` function which is fixed here.
- Fixing this bug also showed that there is a test that's supposed to fail (already existing in tx_invalid.json) in tx_valid.json, so I removed it. Other than that, the tests should all pass.
- Also implements a few suggestions I received offline: removing the `OP_1`s from the invalid tests (similar to 19db590), comments, and style.
- A few other small fixes, like adding asserts, putting all the flags in `mapFlagNames`, better error messages

ACKs for top commit:
jnewbery: utACK b109bde

Tree-SHA512: 7233a8c0f1ae1172fac8000ea6e05384ecf79074c39948d118464868505c7f02f17e96503c81bd05c07adb2087648a5d93d9899e16fdefa6b7efcb51319444a9

DrahtBot added the Tests label Aug 11, 2020

glozow force-pushed the test-verify-flags branch from 8f801ab to 110239f Compare August 20, 2020 23:47

benthecarman reviewed Nov 14, 2020

jnewbery reviewed Nov 26, 2020

murchandamus reviewed Dec 2, 2020

glozow force-pushed the test-verify-flags branch from 110239f to 398151b Compare December 7, 2020 01:57

murchandamus reviewed Dec 7, 2020

jonatack reviewed Dec 7, 2020

glozow force-pushed the test-verify-flags branch from 398151b to 197c03c Compare January 26, 2021 22:20

glozow and others added 10 commits February 2, 2021 08:58

[test] remove unnecessary OP_1s from CSV and CLTV tests

19db590

Co-authored-by: Johnson Lau <jl2012@xbt.hk>

[test] fix CSV test missing OP_ADD

0a76a39

Co-authored-by: Johnson Lau <jl2012@xbt.hk>

Apply maximal validation flags to tx_valid tests

158a0b2

- Apply all validation flags by default - Invert the meaning of verifyFlags as flags being excluded Co-authored-by: Johnson Lau <jl2012@xbt.hk>

[test] fix two witness tests in invalid tests with empty vout

4c06ebf

Co-authored-by: Johnson Lau <jl2012@xbt.hk>

[test] add BADTX setting for invalid txns that fail CheckTransaction

9532591

Co-authored-by: Johnson Lau <jl2012@xbt.hk>

Apply minimal validation flags to tx_invalid tests

7a77727

- Reduce the number of validation flags used, to a minimally required set to fail a test Co-authored-by: Johnson Lau <jl2012@xbt.hk>

[refactor] use CheckTxScripts, TrimFlags, FillFlags

a7098a2

Co-authored-by: Johnson Lau <jl2012@xbt.hk>

[test] Check for invalid flag combinations

a260c22

[test] check verification flags are minimal/maximal

b10ce9a

Co-authored-by: Johnson Lau <jl2012@xbt.hk>

Verify that all validation flags are backward compatible

5786a81

See bitcoin#10699, i.e. adding a flag should always reduce the number of acceptable scripts. Co-authored-by: Johnson Lau <jl2012@xbt.hk>

glozow force-pushed the test-verify-flags branch from 197c03c to 5786a81 Compare February 2, 2021 17:20

laanwj merged commit c263c3d into bitcoin:master Feb 23, 2021

maflcko reviewed Feb 23, 2021

glozow deleted the test-verify-flags branch February 23, 2021 15:56

glozow mentioned this pull request Feb 23, 2021

test: bug fix in transaction_tests #21280

Merged

hebasto mentioned this pull request Feb 24, 2021

test: Replace accidentally placed bit-OR with logical-OR #21293

Merged

bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
