qa: make feature_assumeutxo.py test more robust #32117
Conversation
The feature_assumeutxo.py functional test contains a set of checks for expected errors when a snapshot can be parsed but contains incorrect values. Some of these checks are brittle as they assert an error that is raised not only because of the injected fault, but also because of the following data in the snapshot. A previous occurrence of this was fixed in e5ff4e4.

In this commit, we look into another check of the same category. This one inserts an invalid coins count for the first transaction in the snapshot. It asserts that by doing so the snapshot validation logic will try to deserialize an additional coin for the first transaction, while in fact it will be deserializing the txid of the next transaction. It then asserts that this attempt to deserialize the txid of the next transaction will result in a specific error: a coin height higher than the base height (returning the string "Bad snapshot data after deserializing 1 coins.").

This is an issue as it relies on data outside of the injected fault to assert the error string. For instance the next txid could well deserialize into a coin at a height lower than the base height, or (more probably) not successfully deserialize into a coin at all!

Fix this by injecting all the data necessary to reliably assert the expected error: an invalid coins count, followed by a valid coin, followed by a valid-serialization coin with an invalid height (which in this scenario is supposed to be the next coin's txid).
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks
For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/32117.

Reviews
See the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts
Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Concept ACK a7c57cf

This is much clearer compared to last time. I have reviewed this PR from the POV of someone new to the assumeutxo feature and asked a question because there appear to be two points that I have not been able to tie together yet.
@@ -139,7 +140,19 @@ def expected_error(msg):
        cases = [
            # (content, offset, wrong_hash, custom_message)
            [b"\xff" * 32, 0, "7d52155c9a9fdc4525b637ef6170568e5dad6fabd0b1fdbb9432010b8453095b", None],  # wrong outpoint hash
            [(2).to_bytes(1, "little"), 32, None, "Bad snapshot data after deserializing 1 coins."],  # wrong txid coins count
            # Enter an invalid number of coins for the first txid: 2 instead of 1. The logic will
The comment says this is an invalid number of coins here, but the validation code does actually read said number of coins. Maybe mention the reason why the number of coins is invalid at this stage?
Lines 5928 to 5938 in 770d39a

try {
    Txid txid;
    coins_file >> txid;
    size_t coins_per_txid{0};
    coins_per_txid = ReadCompactSize(coins_file);
    if (coins_per_txid > coins_left) {
        return util::Error{Untranslated("Mismatch in coins count in snapshot metadata and actual snapshot data")};
    }
    for (size_t i = 0; i < coins_per_txid; i++) {
Mentioned this because the actual test failure error is caused by the invalid height being passed in the serialized coin data; I'm trying to tie that to why the number of coins is invalid.
Lines 5944 to 5949 in 770d39a

if (coin.nHeight > base_height ||
    outpoint.n >= std::numeric_limits<decltype(outpoint.n)>::max() // Avoid integer wrap-around in coinstats.cpp:ApplyHash
) {
    return util::Error{Untranslated(strprintf("Bad snapshot data after deserializing %d coins",
        coins_count - coins_left))};
}
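To illustrate the brittleness under discussion, here is a toy loader in Python. It is a hedged sketch with an invented, much-simplified coin format (1-byte count, 1-byte height, 4-byte value), not Core's actual serialization. It shows how an inflated per-txid coins count makes the loader consume the next transaction's txid bytes as if they were a coin, so whether the "height too high" error fires depends entirely on what those bytes happen to be:

```python
import io

def parse_txid_group(stream, base_height):
    """Toy loader (NOT Core code). Simplified format: 32-byte txid,
    1-byte coins count, then per coin a 1-byte height + 4-byte LE value."""
    txid = stream.read(32)
    count = stream.read(1)[0]
    coins = []
    for _ in range(count):
        height = stream.read(1)[0]
        if height > base_height:
            raise ValueError(
                f"Bad snapshot data after deserializing {len(coins)} coins")
        value = int.from_bytes(stream.read(4), "little")
        coins.append((height, value))
    return coins

# One real coin at height 1, followed by the *next* transaction's txid.
coin = bytes([1]) + (5000).to_bytes(4, "little")
next_txid_high = b"\xff" * 32  # first byte 0xff > base height -> error fires
next_txid_low = b"\x00" * 32   # first byte 0x00 -> parsed as a bogus coin, no error

base_height = 199
for next_txid, should_raise in [(next_txid_high, True), (next_txid_low, False)]:
    # Inflated count of 2 even though only one coin follows.
    payload = b"\x11" * 32 + bytes([2]) + coin + next_txid
    try:
        parse_txid_group(io.BytesIO(payload), base_height)
        assert not should_raise
    except ValueError:
        assert should_raise
```

With an honest count the same payload parses cleanly, which is exactly why the asserted error string hinges on the bytes of the following txid rather than on the injected fault alone.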
[
    ser_compact_size(2)  # coins count
    + ser_compact_size(0) + ser_varint(1) + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00"*20  # first coin, height 1
    + ser_compact_size(0) + ser_varint(300 * 2) + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00"*20,  # second coin, height 300
-ser_varint(300 * 2)
+ser_varint((SNAPSHOT_BASE_HEIGHT + 1) * 2)
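For context on this suggestion: in Core's coin serialization the height and coinbase flag are packed into a single varint "code" (code = height * 2 + is_coinbase), so the literal `300 * 2` encodes a non-coinbase coin at height 300, i.e. one above the snapshot base. A minimal sketch of this packing, assuming `SNAPSHOT_BASE_HEIGHT` is 299 as in the test:

```python
# Sketch based on Bitcoin Core's Coin serialization:
# the height shares one varint with the coinbase flag.
def coin_code(height, is_coinbase=False):
    return height * 2 + int(is_coinbase)

SNAPSHOT_BASE_HEIGHT = 299  # value assumed from feature_assumeutxo.py

# The suggested constant and the original literal encode the same thing.
assert coin_code(SNAPSHOT_BASE_HEIGHT + 1) == 300 * 2
```

Using the named constant makes it explicit that the second coin's height is deliberately one past the base height.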
Will do if this gets more Concept ACKs.
The intent is clearer to me now as well, thanks. I think the new test is valuable but I am not sure that this means the old test has to be removed as well. These are two different tests now: the old test is testing a very simple off-by-one error and the new one is more elaborate. I don't really see why we have to throw away the first one.

The fear seems to be that the structure of the dump changes and that would require touching the test. I'm just not that concerned about that because these tests overall have very strong assumptions about the structure of the dump (with the offset for example) and I expect we will need to touch them anyway if we make changes to the dump file. But I would be fine if the error message is loosened so that it's not checking the exact error message but instead just ensures that there is a failure without expecting that exact message.
No they are not. This just made the existing test deterministic, of sorts.
To be honest i don't think the test as it stands today in
If they do then i think this should be fixed, which is what i propose in this PR. But it is not true that all those tests are similarly brittle. Let's go case by case: bitcoin/test/functional/feature_assumeutxo.py Lines 140 to 155 in c0b7159
So besides the one that is fixed in this PR and the one that was fixed in my previous PR, none are similarly brittle.
You are looking at these tests in isolation, that's not what I was talking about. The offsets of these test cases are still just as brittle and this carries through all the tests that would follow a mismatch. For example the coins count is a
For what it's worth #32155 is the branch i was working on and the reason for this PR. I just wanted to avoid changing the error messages there and let reviewers figure out why it's fine (which involves learning the structure of the snapshot, familiarizing themselves with the offset, and understanding that the error message actually depends on the next txid in the snapshot and therefore may change if the txid changes). I usually find "change the error message until the tests pass" to be a big red flag in review and wanted to save reviewers' time on what i consider an important PR by doing the right thing with this preparatory PR. Given the feedback here i just went with "update the error message and let people figure it out" in #32155. If this PR gets renewed interest i am also happy to rebase #32155 on top of it after it's merged. Feel free also to NACK and i'll close, my feelings won't be hurt (too much :p).
Thanks for giving more context @darosior. I am certainly concept ACK on adding this new test and -0 on removing the old one. If other reviewers agree that the old test should be removed I'm fine with it.

We seem to have differing views on two points (that are related):

1. How much more robust the new test is than the old one. You described your view in detail, I now added the context for that, and I just see a lot more ways that the new test could still break.
2. What level the test should focus on. I would interpret what you write as: you think it's not necessary to test failures that are caught by deserialization. I guess that's fair if we assume deserialization should be covered by unit tests instead. But my view is that we shouldn't try to be too smart about these things because it can also mean something can be missed, so more of a grey/black box approach than white box. Maybe these tests are not perfect, but one of them found a pretty surprising bug, so they did serve their purpose already as they are; I guess that's why I am biased to keep them around.

Another thought I had: we could, for each of these cases, mine a new chain in the test with slightly different transactions included which cover the different cases, create a new snapshot for each of them and then only swap the metadata part of these snapshots. That would guarantee that we wouldn't see any deserialization-related error even if the structure (and even serialization) changes. I'm not sure that is what we should do because it would slow the test down significantly, nor do I think this means the new test suggested here shouldn't be added. I am just putting the thought out here to demonstrate that there is a spectrum of robustness and other trade-offs here.
Ok, i'll keep this open a couple more days in case someone else has a different opinion. Otherwise i guess this can be closed.
This is of course not what i say. Testing edge cases is good. It's just better if it's done deterministically. Again, the test as it stands today does not check a deserialization error but a "height too high" error.
A non-deterministic test is better than no test. Perfection or nothing is not my point. However your example seems to make my point: it found an issue in the coin's height because this is not a test for a deserialization error but an insane coin height error. I don't feel too strongly. So unless someone else does i don't think this is worth hashing further.
🐙 This pull request conflicts with the target branch and needs rebase.

Concept ACK

I'll give this PR some thought after you rebase it...
My rationale for this PR was to make #32155 easier to review. Since it is now merged, i think my time would be better spent on other changes so i'm going to close this one. |
This is another robustness fix that i dropped from #31907, intending to just tweak the error in my branch which triggers the issue. As i played more with it i eventually decided to submit a proper fix instead. Since the rationale initially confused reviewers i tried to explain it in detail in the commit message (reproduced here in OP) with enough context.
The feature_assumeutxo.py functional test contains a set of checks for expected errors when a snapshot can be parsed but contains incorrect values. Some of these checks are brittle as they assert an error that is raised not only because of the injected fault, but also because of the following data in the snapshot. A previous occurrence of this was fixed in e5ff4e4.
In this commit, we look into another check of the same category. This one inserts an invalid coins count for the first transaction in the snapshot. It asserts that by doing so the snapshot validation logic will try to deserialize an additional coin for the first transaction, while in fact it will be deserializing the txid of the next transaction. It then asserts that this attempt to deserialize the txid of the next transaction will result in a specific error: a coin height higher than the base height (returning the string "Bad snapshot data after deserializing 1 coins.").
This is an issue as it relies on data outside of the injected fault to assert the error string. For instance the next txid could well deserialize into a coin at a height lower than the base height, or (more probably) not successfully deserialize into a coin at all!
Fix this by injecting all the data necessary to reliably assert the expected error: an invalid coins count, followed by a valid coin, followed by a valid-serialization coin with an invalid height (which in this scenario is supposed to be the next coin's txid).
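The injected payload described above can be sketched end to end. The helpers below are stand-alone stand-ins for the test framework's real `ser_compact_size`, `ser_varint`, and `compress_amount` (from `test_framework/messages.py`), deliberately minimal and only covering the small values used in this test case:

```python
def ser_compact_size(n):
    # Bitcoin CompactSize: a single byte for values below 253,
    # which is all this test case needs.
    assert n < 253
    return bytes([n])

def ser_varint(n):
    # Bitcoin's VARINT: big-endian base-128, continuation bytes have the
    # high bit set, and each shift subtracts one (the format's bias).
    out = bytearray()
    while True:
        out.insert(0, (n & 0x7F) | (0x80 if out else 0x00))
        if n <= 0x7F:
            break
        n = (n >> 7) - 1
    return bytes(out)

def compress_amount(n):
    # Core's CompressAmount maps 0 -> 0, the only amount used here.
    assert n == 0
    return 0

# Invalid coins count (2) for a txid that has only one coin, followed by a
# valid coin, then a valid-serialization coin whose height (300) exceeds
# the snapshot base height -- guaranteeing the expected error string.
content = (
    ser_compact_size(2)                                   # coins count
    + ser_compact_size(0) + ser_varint(1)                 # first coin: vout index, height code
    + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00" * 20
    + ser_compact_size(0) + ser_varint(300 * 2)           # second coin: height 300
    + ser_varint(compress_amount(0)) + ser_varint(0) + b"\x00" * 20
)
```

Because every byte the validator reads after the fault is now part of the injection, the "Bad snapshot data after deserializing 1 coins" error no longer depends on whatever transaction happens to follow in the snapshot.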