qa: make feature_assumeutxo.py test more robust #32117
Conversation
The feature_assumeutxo.py functional test contains a set of checks for expected errors when a snapshot can be parsed but contains incorrect values. Some of these checks are brittle as they assert an error that is raised not only because of the injected fault, but also because of the following data in the snapshot. A previous occurrence of this was fixed in e5ff4e4.

In this commit, we look into another check of the same category. This one inserts an invalid coins count for the first transaction in the snapshot. It asserts that by doing so the snapshot validation logic will try to deserialize an additional coin for the first transaction, while in fact it will be deserializing the txid of the next transaction. It then asserts that this attempt to deserialize the txid of the next transaction will result in a specific error: a coin height higher than the base height (returning the string "Bad snapshot data after deserializing 1 coins.").

This is an issue as it relies on data outside of the injected fault to assert the error string. For instance the next txid could well deserialize into a coin at a height lower than the base height, or (more probably) not successfully deserialize into a coin at all!

Fix this by injecting all the data necessary to reliably assert the expected error: an invalid coins count, followed by a valid coin, followed by a valid-serialization coin with an invalid height (which in this scenario is supposed to be the next coin's txid).
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32117.

Reviews
See the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Concept ACK a7c57cf

This is much clearer compared to last time. I have reviewed this PR from the POV of someone new to the assumeutxo feature and asked a question because there appear to be two points that I have not been able to tie together yet.
@@ -139,7 +140,19 @@ def expected_error(msg):
        cases = [
            # (content, offset, wrong_hash, custom_message)
            [b"\xff" * 32, 0, "7d52155c9a9fdc4525b637ef6170568e5dad6fabd0b1fdbb9432010b8453095b", None],  # wrong outpoint hash
            [(2).to_bytes(1, "little"), 32, None, "Bad snapshot data after deserializing 1 coins."],  # wrong txid coins count
            # Enter an invalid number of coins for the first txid: 2 instead of 1. The logic will
The comment says this is an invalid number of coins here, but the validation code does actually read said number of coins. Maybe mention the reason why the number of coins is invalid at this stage?
Lines 5928 to 5938 in 770d39a

try {
    Txid txid;
    coins_file >> txid;
    size_t coins_per_txid{0};
    coins_per_txid = ReadCompactSize(coins_file);
    if (coins_per_txid > coins_left) {
        return util::Error{Untranslated("Mismatch in coins count in snapshot metadata and actual snapshot data")};
    }
    for (size_t i = 0; i < coins_per_txid; i++) {
Mentioned this because the actual test failure error is caused by the invalid height being passed in the serialized coin data; I'm trying to tie that to why the number of coins is invalid.
Lines 5944 to 5949 in 770d39a

if (coin.nHeight > base_height ||
    outpoint.n >= std::numeric_limits<decltype(outpoint.n)>::max() // Avoid integer wrap-around in coinstats.cpp:ApplyHash
) {
    return util::Error{Untranslated(strprintf("Bad snapshot data after deserializing %d coins",
        coins_count - coins_left))};
}
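To illustrate the brittleness under discussion, here is a toy loader in Python. It is a hedged sketch with an invented, much-simplified coin format (1-byte count, 1-byte height, 4-byte value), not Core's actual serialization. It shows how an inflated per-txid coins count makes the loader consume the next transaction's txid bytes as if they were a coin, so whether the "height too high" error fires depends entirely on what those bytes happen to be:

```python
import io

def parse_txid_group(stream, base_height):
    """Toy loader (NOT Core code). Simplified format: 32-byte txid,
    1-byte coins count, then per coin a 1-byte height + 4-byte LE value."""
    txid = stream.read(32)
    count = stream.read(1)[0]
    coins = []
    for _ in range(count):
        height = stream.read(1)[0]
        if height > base_height:
            raise ValueError(
                f"Bad snapshot data after deserializing {len(coins)} coins")
        value = int.from_bytes(stream.read(4), "little")
        coins.append((height, value))
    return coins

# One real coin at height 1, followed by the *next* transaction's txid.
coin = bytes([1]) + (5000).to_bytes(4, "little")
next_txid_high = b"\xff" * 32  # first byte 0xff > base height -> error fires
next_txid_low = b"\x00" * 32   # first byte 0x00 -> parsed as a bogus coin, no error

base_height = 199
for next_txid, should_raise in [(next_txid_high, True), (next_txid_low, False)]:
    # Inflated count of 2 even though only one coin follows.
    payload = b"\x11" * 32 + bytes([2]) + coin + next_txid
    try:
        parse_txid_group(io.BytesIO(payload), base_height)
        assert not should_raise
    except ValueError:
        assert should_raise
```

With an honest count the same payload parses cleanly, which is exactly why the asserted error string hinges on the bytes of the following txid rather than on the injected fault alone.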
[
    ser_compact_size(2)  # coins count
    + ser_compact_size(0) + ser_varint(1) + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00"*20  # first coin, height 1
    + ser_compact_size(0) + ser_varint(300 * 2) + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00"*20,  # second coin, height 300
-ser_varint(300 * 2)
+ser_varint((SNAPSHOT_BASE_HEIGHT + 1) * 2)
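For context on this suggestion: in Core's coin serialization the height and coinbase flag are packed into a single varint "code" (code = height * 2 + is_coinbase), so the literal `300 * 2` encodes a non-coinbase coin at height 300, i.e. one above the snapshot base. A minimal sketch of this packing, assuming `SNAPSHOT_BASE_HEIGHT` is 299 as in the test:

```python
# Sketch based on Bitcoin Core's Coin serialization:
# the height shares one varint with the coinbase flag.
def coin_code(height, is_coinbase=False):
    return height * 2 + int(is_coinbase)

SNAPSHOT_BASE_HEIGHT = 299  # value assumed from feature_assumeutxo.py

# The suggested constant and the original literal encode the same thing.
assert coin_code(SNAPSHOT_BASE_HEIGHT + 1) == 300 * 2
```

Using the named constant makes it explicit that the second coin's height is deliberately one past the base height.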
Will do if this gets more Concept ACKs.
The intent is clearer to me now as well, thanks. I think the new test is valuable but I am not sure that this means the old test has to be removed as well. These are two different tests now: the old test is testing a very simple off-by-one error and the new one is more elaborate. I don't really see why we have to throw away the first one.

The fear seems to be that the structure of the dump changes and that would require touching the test. I'm just not that concerned about that because these tests overall have very strong assumptions about the structure of the dump (with the offset for example) and I expect we will need to touch them anyway if we make changes to the dump file. But I would be fine if the error message is loosened so that it's not checking the exact error message but instead just ensures that there is a failure without expecting that exact message.
No they are not. This just made the existing test deterministic, of sorts.
To be honest i don't think the test as it stands today in
If they do then i think this should be fixed, which is what i propose in this PR. But it is not true that all those tests are similarly brittle. Let's go case by case: bitcoin/test/functional/feature_assumeutxo.py Lines 140 to 155 in c0b7159
So besides the one that is fixed in this PR and the one that was fixed in my previous PR, none are similarly brittle.
You are looking at these tests in isolation, that's not what I was talking about. The offsets of these test cases are still just as brittle and this carries through all the tests that would follow a mismatch. For example the coins count is a
For what it's worth #32155 is the branch i was working on and the reason for this PR. I just wanted to avoid changing the error messages there and let reviewers figure out why it's fine (which involves learning the structure of the snapshot, familiarizing themselves with the offset, and understanding that the error message actually depends on the next txid in the snapshot and therefore may change if the txid changes). I usually find "change the error message until the tests pass" to be a big red flag in review and wanted to save reviewers' time on what i consider an important PR by doing the right thing with this preparatory PR. Given the feedback here i just went with "update the error message and let people figure it out" in #32155. If this PR gets renewed interest i am also happy to rebase #32155 on top of it after it's merged. Feel free also to NACK and i'll close, my feelings won't be hurt (too much :p).
Thanks for giving more context @darosior. I am certainly concept ACK on adding this new test and -0 on removing the old one. If other reviewers agree that the old test should be removed I'm fine with it.

We seem to have differing views on two points (that are related):

1. How much more robust the new test is than the old one. You described your view in detail, I now added the context for that, and I just see a lot more ways that the new test could still break.
2. What level the test should focus on. I would interpret what you write as: you think it's not necessary to test failures that are caught by deserialization. I guess that's fair if we assume deserialization should be covered by unit tests instead. But my view is that we shouldn't try to be too smart about these things because it can also mean something can be missed, so more of a grey/black box approach than white box. Maybe these tests are not perfect, but one of them found a pretty surprising bug, so they did serve their purpose already as they are; I guess that's why I am biased to keep them around.

Another thought I had: we could, for each of these cases, mine a new chain in the test with slightly different transactions included which cover the different cases, create a new snapshot for each of them and then only swap the metadata part of these snapshots. That would guarantee that we wouldn't see any deserialization-related error even if the structure (and even serialization) changes. I'm not sure that is what we should do because it would slow the test down significantly, nor do I think this means the new test suggested here shouldn't be added. I am just putting the thought out here to demonstrate that there is a spectrum of robustness and other trade-offs here.
Ok, i'll keep this open a couple more days in case someone else has a different opinion. Otherwise i guess this can be closed.
This is of course not what i say. Testing edge cases is good. It's just better if it's done deterministically. Again, the test as it stands today does not check a deserialization error but a "height too high" error.
A non-deterministic test is better than no test. Perfection or nothing is not my point. However your example seems to make my point: it found an issue in the coin's height because this is not a test for a deserialization error but an insane coin height error. I don't feel too strongly. So unless someone else does i don't think this is worth hashing further.
🐙 This pull request conflicts with the target branch and needs rebase.

Concept ACK

I'll give this PR some thought after you rebase it...
My rationale for this PR was to make #32155 easier to review. Since it is now merged, i think my time would be better spent on other changes so i'm going to close this one. |
This is another robustness fix that i dropped from #31907, intending to just tweak the error in my branch which triggers the issue. As i played more with it i eventually decided to submit a proper fix instead. Since the rationale initially confused reviewers i tried to explain it in detail in the commit message (reproduced here in OP) with enough context.
The feature_assumeutxo.py functional test contains a set of checks for expected errors when a snapshot can be parsed but contains incorrect values. Some of these checks are brittle as they assert an error that is raised not only because of the injected fault, but also because of the following data in the snapshot. A previous occurrence of this was fixed in e5ff4e4.
In this commit, we look into another check of the same category. This one inserts an invalid coins count for the first transaction in the snapshot. It asserts that by doing so the snapshot validation logic will try to deserialize an additional coin for the first transaction, while in fact it will be deserializing the txid of the next transaction. It then asserts that this attempt to deserialize the txid of the next transaction will result in a specific error: a coin height higher than the base height (returning the string "Bad snapshot data after deserializing 1 coins.").
This is an issue as it relies on data outside of the injected fault to assert the error string. For instance the next txid could well deserialize into a coin at a height lower than the base height, or (more probably) not successfully deserialize into a coin at all!
Fix this by injecting all the data necessary to reliably assert the expected error: an invalid coins count, followed by a valid coin, followed by a valid-serialization coin with an invalid height (which in this scenario is supposed to be the next coin's txid).
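The injected payload described above can be sketched end to end. The helpers below are stand-alone stand-ins for the test framework's real `ser_compact_size`, `ser_varint`, and `compress_amount` (from `test_framework/messages.py`), deliberately minimal and only covering the small values used in this test case:

```python
def ser_compact_size(n):
    # Bitcoin CompactSize: a single byte for values below 253,
    # which is all this test case needs.
    assert n < 253
    return bytes([n])

def ser_varint(n):
    # Bitcoin's VARINT: big-endian base-128, continuation bytes have the
    # high bit set, and each shift subtracts one (the format's bias).
    out = bytearray()
    while True:
        out.insert(0, (n & 0x7F) | (0x80 if out else 0x00))
        if n <= 0x7F:
            break
        n = (n >> 7) - 1
    return bytes(out)

def compress_amount(n):
    # Core's CompressAmount maps 0 -> 0, the only amount used here.
    assert n == 0
    return 0

# Invalid coins count (2) for a txid that has only one coin, followed by a
# valid coin, then a valid-serialization coin whose height (300) exceeds
# the snapshot base height -- guaranteeing the expected error string.
content = (
    ser_compact_size(2)                                   # coins count
    + ser_compact_size(0) + ser_varint(1)                 # first coin: vout index, height code
    + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00" * 20
    + ser_compact_size(0) + ser_varint(300 * 2)           # second coin: height 300
    + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00" * 20
)
```

Because every byte the validator reads after the fault is now part of the injection, the "Bad snapshot data after deserializing 1 coins" error no longer depends on whatever transaction happens to follow in the snapshot.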