[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948

JeremyRubin · 2021-09-10T23:19:00Z

Currently, at the end of a test vector (valid or invalid) we iterate the flags one by one and unset them and check that it makes the transaction succeed or fail.

This is called 'minimality' in that we want to check that each flag is sufficient on it's own. However, I think we should be instead doing the 2**(specified) combinations of flag, to ensure that for all combinations of flags only the one we have specified is either valid or invalid.

Otherwise, we might have e.g. subsets of flags that have interactions we're not detecting here.

Interestingly, the asymptotic runtime here should be better on average (since we don't usually set that many flags, v.s. 32 iterations on the one by one checker), but the worst case is 2**32 flag combos. It's up to the test writers to not abuse this check.

Contrived example demonstrating the problem:

const auto sum = flag_a_set + flag_b_set +flag_c_set ;
// check parity is even
if (sum & 1) throw "Oops";

having set "a,b,c", the one-by-one checker for "a,b,c," expects failure would check ["a,b", "a,c", "b,c"] and see no failures since the parity of each is 2. However, ["a", "b", "c"] would fail since the parity is 1.

I'm not aware of any code that is written in this specific manner, but similar circumstance might arise naturally in the code.

src/test/transaction_tests.cpp

JeremyRubin · 2021-09-11T14:53:11Z

If anyone doesn't like the submask code, I could swap it for this algorithm (which is more efficient but not-obvious) https://cp-algorithms.com/algebra/all-submasks.html#toc-tgt-0

src/test/transaction_tests.cpp

…usion in transaction_tests.cpp

jonatack

Concept ACK. Here are the results for me before/after. It might be interesting to see the result with https://cp-algorithms.com/algebra/all-submasks.html if you're motivated to try it.

EDIT: added the results for both tests (tx_valid and tx_invalid).

ExcludeIndividualFlags

test/transaction_tests.cpp(187): Entering test suite "transaction_tests"
test/transaction_tests.cpp(189): Entering test case "tx_valid"
6
10
12
6
10
12
14
22
26
28
14
22
26
28
14
22
26
28
2
8
0
65536
6
10
12
0
0
0
0
0
0
0
0
0
0
2
16384
0
0
0
8
65536
8
65536
8
65536
8
65536
8
65536
0
40
65544
65568
65576
40
65544
65568
65576
32
65536
65568
40
65544
65568
65576
8
65536
8
65536
0
0
0
0
64
0
32
6
10
12
0
0
0
64
0
32
0
0
0
64
0
0
64
0
64
8
65536
0
8
65536
0
test/transaction_tests.cpp(189): Leaving test case "tx_valid"; testing time: 918607us
test/transaction_tests.cpp(280): Entering test case "tx_invalid"
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
512
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1024
0
0
2049
4096
4097
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
2049
4096
4097
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
1
65536
0
test/transaction_tests.cpp(280): Leaving test case "tx_invalid"; testing time: 270583us

GenerateAllSubMasks

test/transaction_tests.cpp(204): Entering test suite "transaction_tests"
test/transaction_tests.cpp(206): Entering test case "tx_valid"
0
2
4
6
8
10
12
0
2
4
6
8
10
12
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
8
0
65536
0
2
4
6
8
10
12
0
0
0
0
0
0
0
0
0
0
0
2
16384
0
0
0
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
0
8
32
40
65536
65544
65568
65576
0
8
32
40
65536
65544
65568
65576
0
32
65536
65568
0
8
32
40
65536
65544
65568
65576
0
8
65536
0
8
65536
0
0
0
0
64
0
32
0
2
4
6
8
10
12
0
0
0
64
0
32
0
0
0
64
0
0
64
0
64
0
8
65536
0
0
8
65536
0
test/transaction_tests.cpp(206): Leaving test case "tx_valid"; testing time: 1159591us
test/transaction_tests.cpp(297): Entering test case "tx_invalid"
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
512
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1024
0
0
0
1
2049
4096
4097
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
2049
4096
4097
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
1
65536
0
test/transaction_tests.cpp(297): Leaving test case "tx_invalid"; testing time: 539616us

jonatack · 2021-09-12T14:53:17Z

src/test/transaction_tests.cpp

+    for (const unsigned int submask : all_flag_combos) {
+        flags_combos.insert(TrimFlags(submask));
+    }
+    flags_combos.erase(flags);


nit, clang-format

@@ -174,7 +174,7 @@ unsigned int FillFlags(unsigned int flags) std::set<unsigned int> GenerateAllSubMasks(unsigned int flags) { std::vector<unsigned int> set_flags; - const size_t n_flags = sizeof(unsigned int)*8; + const size_t n_flags = sizeof(unsigned int) * 8; set_flags.reserve(n_flags); @@ -184,10 +184,10 @@ std::set<unsigned int> GenerateAllSubMasks(unsigned int flags) } const size_t combos = 1 << set_flags.size(); // skip last all set flags - std::vector<unsigned int> all_flag_combos(combos-1); - for (size_t i = 0; i < combos-1; ++i){ - for(size_t j = 0; j < set_flags.size(); ++j) { - if (i & (1<<j)) { + std::vector<unsigned int> all_flag_combos(combos - 1); + for (size_t i = 0; i < combos - 1; ++i) { + for (size_t j = 0; j < set_flags.size(); ++j) { + if (i & (1 << j)) { all_flag_combos[i] |= set_flags[j];

jonatack · 2021-09-12T14:58:26Z

src/test/transaction_tests.cpp

+    const size_t n_flags = sizeof(unsigned int)*8;
+    set_flags.reserve(n_flags);
+    for (size_t i = 0; i < n_flags; ++i) {
+        const unsigned int flag = flags & (1 << i);


nit, can use braced initialization on these for type safety

- const size_t n_flags = sizeof(unsigned int)*8; + const size_t n_flags{sizeof(unsigned int) * 8}; set_flags.reserve(n_flags); for (size_t i = 0; i < n_flags; ++i) { - const unsigned int flag = flags & (1 << i); + const unsigned int flag{flags & (1 << i)};

jonatack · 2021-09-12T15:02:30Z

src/test/transaction_tests.cpp

+    const size_t combos = 1 << set_flags.size();
+    // skip last all set flags
+    std::vector<unsigned int> all_flag_combos(combos-1);
+    for (size_t i = 0; i < combos-1; ++i){


maybe hoist subtracting one to the declaration of combos

- const size_t combos = 1 << set_flags.size(); + const size_t combos = (1 << set_flags.size()) - 1; // skip last all set flags - std::vector<unsigned int> all_flag_combos(combos-1); - for (size_t i = 0; i < combos-1; ++i){ + std::vector<unsigned int> all_flag_combos(combos); + for (size_t i = 0; i < combos; ++i) {

DrahtBot · 2021-09-12T15:40:06Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#22954 ([TESTS] Allow tx_invalid.json tests to include flag rules for if_unset: [A,B,C] then_unset: [D] by JeremyRubin)
#22876 ([TESTS] Update Transaction Tests to permit setting a flag as always on and disabling the exhaustive failure test by JeremyRubin)
#21702 (Implement BIP-119 Validation (CheckTemplateVerify) by JeremyRubin)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

JeremyRubin · 2021-09-12T19:06:20Z

I added the textbook algorithm for this: feel free to test that the output is the same then I can squash.

jonatack · 2021-09-13T12:08:43Z

@JeremyRubin Nice. The textbook algorithm produces the same values in the tx_invalid test but fewer values in the tx_valid test. Among the values that it does not return compared to your algo, only 2 or 3 of them are unique. WDYT?

FancyPants version

test/transaction_tests.cpp(194): Entering test case "tx_valid"
0
2
4
6
8
10
12
0
2
4
6
8
10
12
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
8
0
0
2
4
6
8
10
12
0
0
0
0
0
0
0
2
16384
0
0
0
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
0
8
32
40
65536
65544
65568
0
8
32
40
65536
65544
65568
0
32
65536
0
8
32
40
65536
65544
65568
0
8
65536
0
8
65536
0
0
0
2
4
6
8
10
12
0
0
0
0
0
0
0
0
8
65536
0
0
8
65536
0
test/transaction_tests.cpp(194): Leaving test case "tx_valid"; testing time: 597522us
test/transaction_tests.cpp(284): Entering test case "tx_invalid"
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
512
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1024
0
0
0
1
2049
4096
4097
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
2049
4096
4097
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
1
65536
0
test/transaction_tests.cpp(284): Leaving test case "tx_invalid"; testing time: 170187us

esneider · 2021-09-14T04:48:09Z

src/test/transaction_tests.cpp

            // Check that flags are maximal: transaction should fail if any unset flags are set.
-            for (auto flags_excluding_one : ExcludeIndividualFlags(verify_flags)) {
-                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~flags_excluding_one, txdata, strTest, /* expect_valid */ false)) {
+            for (auto submask_flags : GenerateAllSubMasks(verify_flags)) {
+                if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~submask_flags, txdata, strTest, /* expect_valid */ false)) {
                    BOOST_ERROR("Too many flags unset: " << strTest);
                }
            }


If I understand the intent of this code correctly, the idea is that given a set of chosen flags, we want to check that every possible superset of flags doesn't succeed. If that's so, I believe there's a problem with the current FlipAll( GenerateAllSubMasks( flags ) ) logic for computing all possible supersets.

Eg, assuming 3-bit words, for flags = 100, we'll generate all flags sub masks {000}, and then flip them {111}. However, what we really want is to generate all the ~flags = 011 sub masks {010, 001, 000}, and then flip them {101, 110, 111}.

In short, there's a missing bit flip before computing all the sub masks. Ie. it should be

FlipAll( GenerateAllSubMasks( ~flags ) )

With that said, there's still the issue that flipping flag bits doesn't interact well with the TrimFlags operation done inside GenerateAllSubMasks.

Eg. if the only flag bits wereVERIFY_P2SH and VERIFY_WITNESS (in that order), for flags = 00 we'd generate all the ~flags = 11 sub masks {10, 01, 00}, trim them {10, 00, 00} (01 is mapped to 00), remove duplicates {10, 00}, and finally flip them {01, 11}. However, we ended up with an invalid combination (01) when the correct result would have been {10, 11}.

More generally, after fixing the initial missing flip, the code would look something like

FlipAll( Unique( TrimFlags( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) )

when what we really want is something like:

for flag_superset in FlipAll( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) ): if flag_superset == TrimFlags( flag_superset ): # use flag_superset

yeah you're totally right; i've been playing with it a bit as well and noticed that it seems to not be quite right (thanks @jonatack for pointing it out).

The logic is definitely off, and I think it may be intractible in this case since GenerateAllSubMasksWithoutTrimming(~flags) ends up being pretty big... with 20 it might be ok, but eventually there would be too many flags.

Makes sense. Maybe bound the amount of flipped bits? That way you can test some flag interactions, but bound the exponential growth. The bounding capability would also help to put a safeguard against exponential explosion in the tx_invalid test if there are too many set bits in the initial mask.

JeremyRubin · 2021-10-22T18:45:45Z

Closing this for now -- needs more research/discussion on how to accomplish actually using the powerset properly, see #22948 (comment)

JeremyRubin commented Sep 10, 2021

View reviewed changes

src/test/transaction_tests.cpp Outdated Show resolved Hide resolved

JeremyRubin force-pushed the flag-powerset branch from 125fb1c to 7812cff Compare September 10, 2021 23:30

JeremyRubin changed the title ~~[Tests] Computer the Power Set of all flags instead of one by one exclusion~~ [Tests] Compute the Power Set of all flags instead of one by one exclusion Sep 10, 2021

fanquake added the Tests label Sep 10, 2021

JeremyRubin mentioned this pull request Sep 11, 2021

[TESTS] Allow tx_invalid.json tests to include flag rules for if_unset: [A,B,C] then_unset: [D] #22954

Closed

esneider reviewed Sep 12, 2021

View reviewed changes

src/test/transaction_tests.cpp Outdated Show resolved Hide resolved

src/test/transaction_tests.cpp Show resolved Hide resolved

src/test/transaction_tests.cpp Outdated Show resolved Hide resolved

src/test/transaction_tests.cpp Outdated Show resolved Hide resolved

[Tests] Compute the Power Set of all flags instead of one by one excl…

885bf41

…usion in transaction_tests.cpp

JeremyRubin force-pushed the flag-powerset branch from 7812cff to 885bf41 Compare September 12, 2021 06:33

jonatack reviewed Sep 12, 2021

View reviewed changes

DrahtBot mentioned this pull request Sep 12, 2021

[TESTS] Update Transaction Tests to permit setting a flag as always on and disabling the exhaustive failure test #22876

Closed

Squash: Use FancyPants Bit Twiddling Algorithm to Generate Submasks

d185f36

DrahtBot mentioned this pull request Sep 12, 2021

Implement BIP-119 Validation (CheckTemplateVerify) #21702

Closed

esneider reviewed Sep 14, 2021

View reviewed changes

JeremyRubin closed this Oct 22, 2021

JeremyRubin added the Up for grabs label Oct 22, 2021

bitcoin locked and limited conversation to collaborators Oct 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948

[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948

Uh oh!

JeremyRubin commented Sep 10, 2021

Uh oh!

Uh oh!

JeremyRubin commented Sep 11, 2021

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jonatack left a comment •

edited

Loading

Uh oh!

jonatack Sep 12, 2021

Uh oh!

jonatack Sep 12, 2021

Uh oh!

jonatack Sep 12, 2021

Uh oh!

DrahtBot commented Sep 12, 2021

Uh oh!

JeremyRubin commented Sep 12, 2021

Uh oh!

jonatack commented Sep 13, 2021 •

edited

Loading

Uh oh!

esneider Sep 14, 2021

Uh oh!

JeremyRubin Sep 14, 2021

Uh oh!

esneider Sep 15, 2021 •

edited

Loading

Uh oh!

JeremyRubin commented Oct 22, 2021

Uh oh!

Uh oh!

[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948

[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948

Uh oh!

Conversation

JeremyRubin commented Sep 10, 2021

Uh oh!

Uh oh!

JeremyRubin commented Sep 11, 2021

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jonatack left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jonatack Sep 12, 2021

Choose a reason for hiding this comment

Uh oh!

jonatack Sep 12, 2021

Choose a reason for hiding this comment

Uh oh!

jonatack Sep 12, 2021

Choose a reason for hiding this comment

Uh oh!

DrahtBot commented Sep 12, 2021

Conflicts

Uh oh!

JeremyRubin commented Sep 12, 2021

Uh oh!

jonatack commented Sep 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

esneider Sep 14, 2021

Choose a reason for hiding this comment

Uh oh!

JeremyRubin Sep 14, 2021

Choose a reason for hiding this comment

Uh oh!

esneider Sep 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JeremyRubin commented Oct 22, 2021

Uh oh!

Uh oh!

jonatack left a comment •

edited

Loading

jonatack commented Sep 13, 2021 •

edited

Loading

esneider Sep 15, 2021 •

edited

Loading