-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Improve heuristic hex transaction decoding #20595
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@instagibbs @ajtowns Ping, it seems this may be fixing a regression caused by #17775 (see #20579). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK daf98fe
Could perhaps use some tests.
bf1463a
to
4515568
Compare
Can you grab the example in the issue(or generate one) as a regression test?
…On Tue, Dec 8, 2020, 7:37 AM Pieter Wuille ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/core_read.cpp
<#20595 (comment)>:
> } catch (const std::exception&) {
// Fall through.
}
}
+ // Try decoding with legacy serialization, and remember if the result succesfully consumes the entire input.
Fixed.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#20595 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABMAFU63TLZS43YZI27WVZ3STVRM7ANCNFSM4URDWYYQ>
.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the example in the bug report, decoderawtransaction
(without specifying witness-only decode) fails, but getrawtransaction c61adde6f1c09ef4c9864e5c5aa7d92e0c6e76e4413710cf989bae9145ef141e true 0000000000000000001051d0017518f9fb7a7d201751a34fa91621ed32c0d88f
succeeds, so I think this is working as I expected. I didn't suggest a change along these lines because of the extra memory usage, but if we're okay with that, +1 from me.
Separately should we consider doing anything to remove the output that fails the sanity check from the utxo set, given it doesn't parse and hence is unspendable?
ACK 4515568
src/core_read.cpp
Outdated
return true; | ||
} | ||
|
||
// If only one deserialization succeeded, use that one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following if
statement also handles the case where both deserializations succeeded but neither passed sanity, which the comments skip.
Maybe
// If both deserializations succeeded, prefer the one for which CheckTxScriptsSanity returns true.
// If that's the case for both ** or neither **, prefer the extended one
// If only one deserialization succeeded, use that one.
if (ok_ext && ok_legacy)
if (TxSanity(tx_ext) || !TxSanity(tx_legacy)) {
return tx_ext;
} else {
return tx_legacy;
}
} else if (ok_ext) {
return tx_ext;
} else if (ok_legacy) {
return tx_legacy;
} else {
fail;
}
would match the comments better? Or could consolidate the conditions something like:
if (ok_ext && !(ok_legacy && !TxSanity(tx_ext) && TxSanity(tx_legacy)) {
// If extended decode succeeded, prefer it, except if legacy also succeeded,
// extended didn't pass sanity checks, and legacy did pass sanity checks
return tx_ext;
} else if (ok_legacy) {
// Otherwise, if legacy succeeded use it
return tx_legacy;
} else {
// Otherwise fail
fail;
}
Alternatively you could stay closer to the current code with:
if (try_witness) {
ok_ext = Decode(tx_ext);
if (ok_ext && (!try_no_witness || Sanity(tx_ext))) return tx_ext;
}
if (try_no_witness) {
ok_legacy = Decode(tx_legacy);
if (ok_legacy && (!ok_ext || Sanity(tx_legacy)) return tx_legacy;
}
if (ok_ext) return tx_ext; // neither pass sanity checks
fail;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For code-readability I prefer the version of how the pr is currently written. The first two options are too nested to be readable.
If the goal is to skip a useless decoding in the early-return case (Sanity(tx_ext)
), I prefer your third option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a number of comments, and restructured a bit along the lines of the 3rd option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK, it seems this comes with two changes. It prefers the extended encoding:
- when neither passes the sanity check, but both deserialize. Previously it preferred the legacy one.
- when the extended one doesn't pass sanity checks and the legacy encoding doesn't succeed. Previously it failed.
src/core_read.cpp
Outdated
return true; | ||
} | ||
|
||
// If only one deserialization succeeded, use that one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment implies that only one deserialization has succeeded if this line is reached, which is not true
// If only one deserialization succeeded, use that one. | |
// If both deserializations succeeded, prefer the extended one. Otherwise use the one that succeeded deserialization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rewrote the comments.
It might be possible to harvest seeds from the decode_tx fuzz test which has good coverage for this: https://drahtbot.space/host_reports/DrahtBot/reports/coverage_fuzz/sipa/bitcoin/451556858a23d81f/fuzz.coverage/src/core_read.cpp.gcov.html |
Whenever both encodings are permitted, try both, and if only one succeeds, return that one. Otherwise prefer the one for which the heuristic sanity check passes. If that is the case for neither or for both, return the extended-permitting deserialization.
4515568
to
39c42c4
Compare
Restructured the code a bit, and added comments. Also added a regression test based on the example from #20579. @MarcoFalke I wrote a fuzzer to find transactions that decode incorrectly (see https://github.com/sipa/bitcoin/tree/202012_fancy_tx_hex_decode_fuzz). Turns out it's not very hard if you feed it a few real transactions as seeds, but they all look very weird. Using this we could probably quickly find extra conditions to add to CheckTxScriptsSanity, but I think we should keep that for a future PR. |
Code review ACK 0f949cd |
Tested ACK 0f949cd The new naming, comments, and code organisation are much improved. Toyed with placing the |
Code review ACK 0f949cd
Yes the comments definitely help clarify why the function is like this. |
Whenever both encodings are permitted, try both, and if only one succeeds, return that one. Otherwise prefer the one for which the heuristic sanity check passes. If that is the case for neither or for both, return the extended-permitting deserialization. Github-Pull: bitcoin#20595 Rebased-From: 39c42c4
Github-Pull: bitcoin#20595 Rebased-From: 0f949cd
Backported in #20612 |
0f949cd Add regression test for incorrect decoding (Pieter Wuille) 39c42c4 Improve heuristic hex transaction decoding (Pieter Wuille) Pull request description: The current hex tx decoding logic will refuse to decode valid extended-encoded transactions if the result fails the heuristic sanity check, even when the legacy-encoding fails. Fix this. Fixes bitcoin#20579 ACKs for top commit: achow101: Code review ACK 0f949cd jonatack: Tested ACK 0f949cd laanwj: Code review ACK 0f949cd Tree-SHA512: bd6dc80d824eb9a87026a623be910cac92173f8ce1c8b040c2246348c3cf0c6d64bcc40127b859e5e4da1efe88cf02a6945f7ebb91079799395145cb09d9c7a5
The current hex tx decoding logic will refuse to decode valid extended-encoded transactions if the result fails the heuristic sanity check, even when the legacy-encoding fails. Fix this.
Fixes #20579