-
Notifications
You must be signed in to change notification settings - Fork 37.7k
test: apply strict verification flags for transaction tests and assert backwards compatibility #19698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Not sure this is a good idea. Tests should ideally test one thing only, and failing due to other bugs would be annoying. |
Looks like one of the sanitizers finds an integer conversion/truncation problem in the changed code:
|
@laanwj I'm looking into the issue, is there a way to run this sanitizer locally? |
@luke-jr I would think of it more as... each test sanity checks itself to make sure it's testing exactly what we want it to test. Passing/failing because the test itself is incorrect would be more annoying imo. All the current tests pass; as more tests are added, I'd think it's a good idea to have this check. |
8f801ab
to
110239f
Compare
Last push fixed the sanitizer bug - just needed a cast. Ready for review :) |
A little bit quiet here... @laanwj and @practicalswift you both left reviews on the original PR, if you have time I'd appreciate a look here as well :) |
Code review ACK, thanks for solving the sanitizer bug. I would like @sipa @MarcoFalke or someone else close with the verification testing code to look at this and give concept ACK. |
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left lots of style suggestions, which should be pretty easy to resolve.
It's very difficult to review this PR, since it's mixing refactors and fixes into the same commits. For example, the first commit is changing the format of the tx_valid.json file and making changes/fixes to the tests. I think it'd be far easier to review those different changes if they were split out into individual commits. The same is true for the other commits in this branch - each one is doing too much, making review more difficult than it needs to ne.
return flags; | ||
} | ||
|
||
unsigned int FillFlags(unsigned int flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a better interface be to take a reference and update it in place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not, since we're using it like
CheckTxScripts(... FillFlags(flags) ...)
So doing it in place would mean we need an extra line there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
TBH, I could use a bit more explanation in the fourth commit message "Verify that all validation flags are backward compatible". Could you go a bit more into detail what this is doing and why?
src/test/transaction_tests.cpp
Outdated
for (const std::string& word : words) | ||
{ | ||
ret.push_back(flags & ~mapFlagNames[word]); | ||
} | ||
return ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be better to have a separate function that returns the names of the flags from a given sum of all flags.
110239f
to
398151b
Compare
Thanks for the review @laanwj @jnewbery @benthecarman @xekyo @jonatack :) addressed your comments and split the PR up into more, dedicated commits. CI is green 🟢 ready for review again! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just the one nit from me at the moment.
398151b
to
197c03c
Compare
To test the tests I made a small change to ["An ADD producing a 5-byte result that sets CTxIn::SEQUENCE_LOCKTIME_DISABLE_FLAG"],
-[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "2147483647 65536 ADD CHECKSEQUENCEVERIFY"]],
+[[["0000000000000000000000000000000000000000000000000000000000000100", 0, "2147483647 65536 CHECKSEQUENCEVERIFY"]], The good thing is that it easily detects this. However, an avalanche of 63 failures for a single error might be a bit overkill 😅 It does report the JSON of the failed test, which is good! The following change, however [[["0000000000000000000000000000000000000000000000000000000000000100", 0, "2147483647 65536 ADD CHECKSEQUENCEVERIFY"]],
-"020000000100010000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000", "NONE"],
+"020000000100010000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000", "P2SH,CHECKSEQUENCEVERIFY"], raises an assertion
Then reports 7 failures in different places in the code. It does not report which specific test in the JSON failed. Not sure if these are a problem, I mean, the tests fail when they're supposed to fail, but just thought I'd report it. |
Co-authored-by: Johnson Lau <jl2012@xbt.hk>
Co-authored-by: Johnson Lau <jl2012@xbt.hk>
- Apply all validation flags by default - Invert the meaning of verifyFlags as flags being excluded Co-authored-by: Johnson Lau <jl2012@xbt.hk>
Co-authored-by: Johnson Lau <jl2012@xbt.hk>
Co-authored-by: Johnson Lau <jl2012@xbt.hk>
- Reduce the number of validation flags used, to a minimally required set to fail a test Co-authored-by: Johnson Lau <jl2012@xbt.hk>
Co-authored-by: Johnson Lau <jl2012@xbt.hk>
Co-authored-by: Johnson Lau <jl2012@xbt.hk>
See bitcoin#10699, i.e. adding a flag should always reduce the number of acceptable scripts. Co-authored-by: Johnson Lau <jl2012@xbt.hk>
197c03c
to
5786a81
Compare
@laanwj Good point, it's not very helpful if I tested by adding a
Let me know what you think? |
ACK 5786a81 |
Looks good to me now! ACK 5786a81 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, but I don't think this works
std::vector<unsigned int> flags_combos; | ||
for (unsigned int i = 0; i < mapFlagNames.size(); ++i) { | ||
const unsigned int flags_excluding_i = TrimFlags(flags & ~(1U << i)); | ||
if (flags != flags_excluding_i && std::find(flags_combos.begin(), flags_combos.end(), flags_excluding_i) != flags_combos.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what this is supposed to do. The method just returns an empty vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it always returns an empty vector? I'm playing around with it, should only return empty if flags=0
.
What it's supposed to do: given a set of verify flags flags
, and all the flags in existence from mapFlagNames
and exclude each one from flags
(granted it's a valid combination and doesn't just result in the same flags).
So for example if it's given 10001100 and there are 8 total flags, it'll return [00001100, 10000100, 10001000] if they're all valid combinations. If the input is 0, it should return an empty vector.
Edit: uh oh you're right it seems to be broken...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (flags != flags_excluding_i && std::find(flags_combos.begin(), flags_combos.end(), flags_excluding_i) != flags_combos.end()) { | |
if (flags != flags_excluding_i && std::find(flags_combos.begin(), flags_combos.end(), flags_excluding_i) == flags_combos.end()) { |
Yeah, it should be this. Should not be found in flags_combos
. I'll open a PR to address this
…ction tests and assert backwards compatibility 5786a81 Verify that all validation flags are backward compatible (gzhao408) b10ce9a [test] check verification flags are minimal/maximal (gzhao408) a260c22 [test] Check for invalid flag combinations (gzhao408) a7098a2 [refactor] use CheckTxScripts, TrimFlags, FillFlags (gzhao408) 7a77727 Apply minimal validation flags to tx_invalid tests (gzhao408) 9532591 [test] add BADTX setting for invalid txns that fail CheckTransaction (gzhao408) 4c06ebf [test] fix two witness tests in invalid tests with empty vout (gzhao408) 158a0b2 Apply maximal validation flags to tx_valid tests (gzhao408) 0a76a39 [test] fix CSV test missing OP_ADD (gzhao408) 19db590 [test] remove unnecessary OP_1s from CSV and CLTV tests (gzhao408) Pull request description: This uses the first 4 commits of bitcoin#15045, rebased and added some comments. The diff is quite large already and I want to make it easy to review, so I'm splitting it into 2 PRs (transaction and script). Script one is WIP, I'll link it when I open it. Interpretation of scripts is dependent on the script verification flags passed in. In tests, we should always apply **maximal** verification flags when checking that a transaction is **valid**; any additional flags should invalidate the transaction. A transaction should not be valid because we forgot to include a flag, and we should apply all flags by default. We should apply **minimal** verification flags when asserting that a transaction is **invalid**; if verification flags are applied, removing any one of them should mean the transaction is valid. New verify flags must be backwards compatible; tests should check backwards compatibility and apply the new flags by default. All `tx_invalid` tests should continue to be invalid with the exact same verify flags. All `tx_valid` tests that don't pass with new flags should _explicitly_ indicate that the flags need to be excluded, and fail otherwise. 1. Flip the meaning of `verifyFlags` in tx_valid.json to mean _excluded_ verification flags instead of included flags. Edit the test data accordingly. 2. Trim unneeded flags from tx_invalid.json. 3. Add check to verify that tx_valid tests have maximal flags and tx_invalid tests have minimal flags. 4. Add checks to verify that flags are soft forks (bitcoin#10699) i.e. adding any flag should only decrease the number of acceptable scripts. Test by adding/removing random flags. ACKs for top commit: achow101: ACK 5786a81 laanwj: ACK 5786a81 Tree-SHA512: 19195d8cf3299e62f47dd3443ae4a95430c5c9d497993a18ab80de9e24b1869787af972774993bf05717784879bc4592fdabaae0fddebd437963d8f3c96d9a73
df8f2a1 test: Replace accidentally placed bit-OR with logical-OR (Hennadii Stepanov) Pull request description: This PR is a follow up of #19698. ACKs for top commit: glozow: utACK df8f2a1 Tree-SHA512: 36aba3e952850deafe78dd39775a568e89e867c8a352f511f152bce7062f614f5bb4f23266dbb33da5292c9ee6da5ccefce117e3168621c71d2140c8e7f58460
…gical-OR df8f2a1 test: Replace accidentally placed bit-OR with logical-OR (Hennadii Stepanov) Pull request description: This PR is a follow up of bitcoin#19698. ACKs for top commit: glozow: utACK bitcoin@df8f2a1 Tree-SHA512: 36aba3e952850deafe78dd39775a568e89e867c8a352f511f152bce7062f614f5bb4f23266dbb33da5292c9ee6da5ccefce117e3168621c71d2140c8e7f58460
b109bde [test] check that mapFlagNames is up to date (glozow) 5d3ced7 [test] remove unnecessary OP_1s from invalid tests (glozow) 5aee73d [test] minor improvements / followups (glozow) 8a365df [test] fix bug in ExcludeIndividualFlags (glozow) 8cac292 [test] remove invalid test from tx_valid.json (glozow) Pull request description: This is a followup to #19698. - There was a bug in the `ExcludeIndividualFlags` function which is fixed here. - Fixing this bug also showed that there is a test that's supposed to fail (already existing in tx_invalid.json) in tx_valid.json, so I removed it. Other than that, the tests should all pass. - Also implements a few suggestions I received offline: removing the `OP_1`s from the invalid tests (similar to 19db590), comments, and style. - A few other small fixes, like adding asserts, putting all the flags in `mapFlagNames`, better error messages ACKs for top commit: jnewbery: utACK b109bde Tree-SHA512: 7233a8c0f1ae1172fac8000ea6e05384ecf79074c39948d118464868505c7f02f17e96503c81bd05c07adb2087648a5d93d9899e16fdefa6b7efcb51319444a9
b109bde [test] check that mapFlagNames is up to date (glozow) 5d3ced7 [test] remove unnecessary OP_1s from invalid tests (glozow) 5aee73d [test] minor improvements / followups (glozow) 8a365df [test] fix bug in ExcludeIndividualFlags (glozow) 8cac292 [test] remove invalid test from tx_valid.json (glozow) Pull request description: This is a followup to bitcoin#19698. - There was a bug in the `ExcludeIndividualFlags` function which is fixed here. - Fixing this bug also showed that there is a test that's supposed to fail (already existing in tx_invalid.json) in tx_valid.json, so I removed it. Other than that, the tests should all pass. - Also implements a few suggestions I received offline: removing the `OP_1`s from the invalid tests (similar to bitcoin@19db590), comments, and style. - A few other small fixes, like adding asserts, putting all the flags in `mapFlagNames`, better error messages ACKs for top commit: jnewbery: utACK b109bde Tree-SHA512: 7233a8c0f1ae1172fac8000ea6e05384ecf79074c39948d118464868505c7f02f17e96503c81bd05c07adb2087648a5d93d9899e16fdefa6b7efcb51319444a9
This uses the first 4 commits of #15045, rebased and added some comments. The diff is quite large already and I want to make it easy to review, so I'm splitting it into 2 PRs (transaction and script). Script one is WIP, I'll link it when I open it.
Interpretation of scripts is dependent on the script verification flags passed in.
In tests, we should always apply maximal verification flags when checking that a transaction is valid; any additional flags should invalidate the transaction. A transaction should not be valid because we forgot to include a flag, and we should apply all flags by default.
We should apply minimal verification flags when asserting that a transaction is invalid; if verification flags are applied, removing any one of them should mean the transaction is valid.
New verify flags must be backwards compatible; tests should check backwards compatibility and apply the new flags by default. All
tx_invalid
tests should continue to be invalid with the exact same verify flags. Alltx_valid
tests that don't pass with new flags should explicitly indicate that the flags need to be excluded, and fail otherwise.verifyFlags
in tx_valid.json to mean excluded verification flags instead of included flags. Edit the test data accordingly.