-
Notifications
You must be signed in to change notification settings - Fork 37.7k
[Tests] Compute the Power Set of all flags instead of one by one exclusion #22948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
125fb1c
to
7812cff
Compare
If anyone doesn't like the submask code, I could swap it for this algorithm (which is more efficient but not-obvious) https://cp-algorithms.com/algebra/all-submasks.html#toc-tgt-0 |
…usion in transaction_tests.cpp
7812cff
to
885bf41
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK. Here are the results for me before/after. It might be interesting to see the result with https://cp-algorithms.com/algebra/all-submasks.html if you're motivated to try it.
EDIT: added the results for both tests (tx_valid and tx_invalid).
ExcludeIndividualFlags
test/transaction_tests.cpp(187): Entering test suite "transaction_tests"
test/transaction_tests.cpp(189): Entering test case "tx_valid"
6
10
12
6
10
12
14
22
26
28
14
22
26
28
14
22
26
28
2
8
0
65536
6
10
12
0
0
0
0
0
0
0
0
0
0
2
16384
0
0
0
8
65536
8
65536
8
65536
8
65536
8
65536
0
40
65544
65568
65576
40
65544
65568
65576
32
65536
65568
40
65544
65568
65576
8
65536
8
65536
0
0
0
0
64
0
32
6
10
12
0
0
0
64
0
32
0
0
0
64
0
0
64
0
64
8
65536
0
8
65536
0
test/transaction_tests.cpp(189): Leaving test case "tx_valid"; testing time: 918607us
test/transaction_tests.cpp(280): Entering test case "tx_invalid"
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
512
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1024
0
0
2049
4096
4097
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
2049
4096
4097
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
1
65536
0
test/transaction_tests.cpp(280): Leaving test case "tx_invalid"; testing time: 270583us
GenerateAllSubMasks
test/transaction_tests.cpp(204): Entering test suite "transaction_tests"
test/transaction_tests.cpp(206): Entering test case "tx_valid"
0
2
4
6
8
10
12
0
2
4
6
8
10
12
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
0
2
8
0
65536
0
2
4
6
8
10
12
0
0
0
0
0
0
0
0
0
0
0
2
16384
0
0
0
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
8
65536
0
0
8
32
40
65536
65544
65568
65576
0
8
32
40
65536
65544
65568
65576
0
32
65536
65568
0
8
32
40
65536
65544
65568
65576
0
8
65536
0
8
65536
0
0
0
0
64
0
32
0
2
4
6
8
10
12
0
0
0
64
0
32
0
0
0
64
0
0
64
0
64
0
8
65536
0
0
8
65536
0
test/transaction_tests.cpp(206): Leaving test case "tx_valid"; testing time: 1159591us
test/transaction_tests.cpp(297): Entering test case "tx_invalid"
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
512
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1024
0
0
0
1
2049
4096
4097
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
2049
4096
4097
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
1
65536
0
test/transaction_tests.cpp(297): Leaving test case "tx_invalid"; testing time: 539616us
src/test/transaction_tests.cpp
Outdated
for (const unsigned int submask : all_flag_combos) { | ||
flags_combos.insert(TrimFlags(submask)); | ||
} | ||
flags_combos.erase(flags); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, clang-format
@@ -174,7 +174,7 @@ unsigned int FillFlags(unsigned int flags)
std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
{
std::vector<unsigned int> set_flags;
- const size_t n_flags = sizeof(unsigned int)*8;
+ const size_t n_flags = sizeof(unsigned int) * 8;
set_flags.reserve(n_flags);
@@ -184,10 +184,10 @@ std::set<unsigned int> GenerateAllSubMasks(unsigned int flags)
}
const size_t combos = 1 << set_flags.size();
// skip last all set flags
- std::vector<unsigned int> all_flag_combos(combos-1);
- for (size_t i = 0; i < combos-1; ++i){
- for(size_t j = 0; j < set_flags.size(); ++j) {
- if (i & (1<<j)) {
+ std::vector<unsigned int> all_flag_combos(combos - 1);
+ for (size_t i = 0; i < combos - 1; ++i) {
+ for (size_t j = 0; j < set_flags.size(); ++j) {
+ if (i & (1 << j)) {
all_flag_combos[i] |= set_flags[j];
src/test/transaction_tests.cpp
Outdated
const size_t n_flags = sizeof(unsigned int)*8; | ||
set_flags.reserve(n_flags); | ||
for (size_t i = 0; i < n_flags; ++i) { | ||
const unsigned int flag = flags & (1 << i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, can use braced initialization on these for type safety
- const size_t n_flags = sizeof(unsigned int)*8;
+ const size_t n_flags{sizeof(unsigned int) * 8};
set_flags.reserve(n_flags);
for (size_t i = 0; i < n_flags; ++i) {
- const unsigned int flag = flags & (1 << i);
+ const unsigned int flag{flags & (1 << i)};
src/test/transaction_tests.cpp
Outdated
const size_t combos = 1 << set_flags.size(); | ||
// skip last all set flags | ||
std::vector<unsigned int> all_flag_combos(combos-1); | ||
for (size_t i = 0; i < combos-1; ++i){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe hoist subtracting one to the declaration of combos
- const size_t combos = 1 << set_flags.size();
+ const size_t combos = (1 << set_flags.size()) - 1;
// skip last all set flags
- std::vector<unsigned int> all_flag_combos(combos-1);
- for (size_t i = 0; i < combos-1; ++i){
+ std::vector<unsigned int> all_flag_combos(combos);
+ for (size_t i = 0; i < combos; ++i) {
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
I added the textbook algorithm for this: feel free to test that the output is the same then I can squash. |
@JeremyRubin Nice. The textbook algorithm produces the same values in the tx_invalid test but fewer values in the tx_valid test. Among the values that it does not return compared to your algo, only 2 or 3 of them are unique. WDYT? FancyPants version
|
// Check that flags are maximal: transaction should fail if any unset flags are set. | ||
for (auto flags_excluding_one : ExcludeIndividualFlags(verify_flags)) { | ||
if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~flags_excluding_one, txdata, strTest, /* expect_valid */ false)) { | ||
for (auto submask_flags : GenerateAllSubMasks(verify_flags)) { | ||
if (!CheckTxScripts(tx, mapprevOutScriptPubKeys, mapprevOutValues, ~submask_flags, txdata, strTest, /* expect_valid */ false)) { | ||
BOOST_ERROR("Too many flags unset: " << strTest); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the intent of this code correctly, the idea is that given a set of chosen flags, we want to check that every possible superset of flags doesn't succeed. If that's so, I believe there's a problem with the current FlipAll( GenerateAllSubMasks( flags ) )
logic for computing all possible supersets.
Eg, assuming 3-bit words, for flags = 100
, we'll generate all flags
sub masks {000}
, and then flip them {111}
. However, what we really want is to generate all the ~flags = 011
sub masks {010, 001, 000}
, and then flip them {101, 110, 111}
.
In short, there's a missing bit flip before computing all the sub masks. Ie. it should be
FlipAll( GenerateAllSubMasks( ~flags ) )
With that said, there's still the issue that flipping flag bits doesn't interact well with the TrimFlags
operation done inside GenerateAllSubMasks
.
Eg. if the only flag bits wereVERIFY_P2SH
and VERIFY_WITNESS
(in that order), for flags = 00
we'd generate all the ~flags = 11
sub masks {10, 01, 00}
, trim them {10, 00, 00}
(01
is mapped to 00
), remove duplicates {10, 00}
, and finally flip them {01, 11}
. However, we ended up with an invalid combination (01
) when the correct result would have been {10, 11}
.
More generally, after fixing the initial missing flip, the code would look something like
FlipAll( Unique( TrimFlags( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) )
when what we really want is something like:
for flag_superset in FlipAll( GenerateAllSubMasksWithoutTrimming( ~flags ) ) ) ):
if flag_superset == TrimFlags( flag_superset ):
# use flag_superset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah you're totally right; i've been playing with it a bit as well and noticed that it seems to not be quite right (thanks @jonatack for pointing it out).
The logic is definitely off, and I think it may be intractible in this case since GenerateAllSubMasksWithoutTrimming(~flags) ends up being pretty big... with 20 it might be ok, but eventually there would be too many flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Maybe bound the amount of flipped bits? That way you can test some flag interactions, but bound the exponential growth. The bounding capability would also help to put a safeguard against exponential explosion in the tx_invalid test if there are too many set bits in the initial mask.
Closing this for now -- needs more research/discussion on how to accomplish actually using the powerset properly, see #22948 (comment) |
Currently, at the end of a test vector (valid or invalid) we iterate the flags one by one and unset them and check that it makes the transaction succeed or fail.
This is called 'minimality' in that we want to check that each flag is sufficient on it's own. However, I think we should be instead doing the 2**(specified) combinations of flag, to ensure that for all combinations of flags only the one we have specified is either valid or invalid.
Otherwise, we might have e.g. subsets of flags that have interactions we're not detecting here.
Interestingly, the asymptotic runtime here should be better on average (since we don't usually set that many flags, v.s. 32 iterations on the one by one checker), but the worst case is 2**32 flag combos. It's up to the test writers to not abuse this check.
Contrived example demonstrating the problem:
having set "a,b,c", the one-by-one checker for "a,b,c," expects failure would check ["a,b", "a,c", "b,c"] and see no failures since the parity of each is 2. However, ["a", "b", "c"] would fail since the parity is 1.
I'm not aware of any code that is written in this specific manner, but similar circumstance might arise naturally in the code.