Skip to content

Conversation

sipa
Copy link
Member

@sipa sipa commented Dec 7, 2020

The current hex tx decoding logic will refuse to decode valid extended-encoded transactions if the result fails the heuristic sanity check, even when the legacy-encoding fails. Fix this.

Fixes #20579

@sipa
Copy link
Member Author

sipa commented Dec 7, 2020

@instagibbs @ajtowns Ping, it seems this may be fixing a regression caused by #17775 (see #20579).

@sipa sipa changed the title Improve heuristic hex decoding Improve heuristic hex transaction decoding Dec 7, 2020
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK daf98fe

Could perhaps use some tests.

@sipa sipa force-pushed the 202012_fancy_tx_hex_decode branch 2 times, most recently from bf1463a to 4515568 Compare December 7, 2020 23:36
@instagibbs
Copy link
Member

instagibbs commented Dec 8, 2020 via email

@maflcko maflcko added this to the 0.21.0 milestone Dec 8, 2020
Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the example in the bug report, decoderawtransaction (without specifying witness-only decode) fails, but getrawtransaction c61adde6f1c09ef4c9864e5c5aa7d92e0c6e76e4413710cf989bae9145ef141e true 0000000000000000001051d0017518f9fb7a7d201751a34fa91621ed32c0d88f succeeds, so I think this is working as I expected. I didn't suggest a change along these lines because of the extra memory usage, but if we're okay with that, +1 from me.

Separately should we consider doing anything to remove the output that fails the sanity check from the utxo set, given it doesn't parse and hence is unspendable?

ACK 4515568

return true;
}

// If only one deserialization succeeded, use that one.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following if statement also handles the case where both deserializations succeeded but neither passed sanity, which the comments skip.

Maybe

// If both deserializations succeeded, prefer the one for which CheckTxScriptsSanity returns true.
// If that's the case for both ** or neither **, prefer the extended one
// If only one deserialization succeeded, use that one.
if (ok_ext && ok_legacy)
    if (TxSanity(tx_ext) || !TxSanity(tx_legacy)) {
        return tx_ext;
    } else {
        return tx_legacy;
    }
} else if (ok_ext) {
    return tx_ext;
} else if (ok_legacy) {
   return tx_legacy;
} else {
   fail;
}

would match the comments better? Or could consolidate the conditions something like:

if (ok_ext && !(ok_legacy && !TxSanity(tx_ext) && TxSanity(tx_legacy)) {
    // If extended decode succeeded, prefer it, except if legacy also succeeded,
    // extended didn't pass sanity checks, and legacy did pass sanity checks
    return tx_ext;
} else if (ok_legacy) {
    // Otherwise, if legacy succeeded use it
    return tx_legacy;
} else {
   // Otherwise fail
   fail;
}

Alternatively you could stay closer to the current code with:

if (try_witness) {
    ok_ext = Decode(tx_ext);
    if (ok_ext && (!try_no_witness || Sanity(tx_ext))) return tx_ext;
}
if (try_no_witness) {
    ok_legacy = Decode(tx_legacy);
    if (ok_legacy && (!ok_ext || Sanity(tx_legacy)) return tx_legacy;
}
if (ok_ext) return tx_ext; // neither pass sanity checks
fail;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For code-readability I prefer the version of how the pr is currently written. The first two options are too nested to be readable.
If the goal is to skip a useless decoding in the early-return case (Sanity(tx_ext)), I prefer your third option.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a number of comments, and restructured a bit along the lines of the 3rd option.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, it seems this comes with two changes. It prefers the extended encoding:

  • when neither passes the sanity check, but both deserialize. Previously it preferred the legacy one.
  • when the extended one doesn't pass sanity checks and the legacy encoding doesn't succeed. Previously it failed.

return true;
}

// If only one deserialization succeeded, use that one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment implies that only one deserialization has succeeded if this line is reached, which is not true

Suggested change
// If only one deserialization succeeded, use that one.
// If both deserializations succeeded, prefer the extended one. Otherwise use the one that succeeded deserialization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote the comments.

@maflcko
Copy link
Member

maflcko commented Dec 8, 2020

It might be possible to harvest seeds from the decode_tx fuzz test which has good coverage for this: https://drahtbot.space/host_reports/DrahtBot/reports/coverage_fuzz/sipa/bitcoin/451556858a23d81f/fuzz.coverage/src/core_read.cpp.gcov.html

Whenever both encodings are permitted, try both, and if only one succeeds,
return that one. Otherwise prefer the one for which the heuristic sanity
check passes. If that is the case for neither or for both, return the
extended-permitting deserialization.
@sipa sipa force-pushed the 202012_fancy_tx_hex_decode branch from 4515568 to 39c42c4 Compare December 8, 2020 18:12
@sipa
Copy link
Member Author

sipa commented Dec 8, 2020

Restructured the code a bit, and added comments. Also added a regression test based on the example from #20579.

@MarcoFalke I wrote a fuzzer to find transactions that decode incorrectly (see https://github.com/sipa/bitcoin/tree/202012_fancy_tx_hex_decode_fuzz). Turns out it's not very hard if you feed it a few real transactions as seeds, but they all look very weird. Using this we could probably quickly find extra conditions to add to CheckTxScriptsSanity, but I think we should keep that for a future PR.

@achow101
Copy link
Member

achow101 commented Dec 9, 2020

Code review ACK 0f949cd

@jonatack
Copy link
Member

Tested ACK 0f949cd

The new naming, comments, and code organisation are much improved.

Toyed with placing the CheckTxScriptsSanity calls right after setting each ok_{extended,legacy} but it's less readable.

diff --git a/src/core_read.cpp b/src/core_read.cpp
index 7687a86185..fe64b7b95e 100644
--- a/src/core_read.cpp
+++ b/src/core_read.cpp
@@ -148,37 +148,39 @@ static bool DecodeTx(CMutableTransaction& tx, const std::vector<unsigned char>&
         CDataStream ssData(tx_data, SER_NETWORK, PROTOCOL_VERSION);
         try {
             ssData >> tx_extended;
-            if (ssData.empty()) ok_extended = true;
+            if (ssData.empty()) {
+                ok_extended = true;
+                // Optimization: if extended decoding succeeded and the result passes CheckTxScriptsSanity,
+                // don't bother decoding the other way.
+                if (CheckTxScriptsSanity(tx_extended)) {
+                    tx = std::move(tx_extended);
+                    return true;
+                }
+            }
         } catch (const std::exception&) {
             // Fall through.
         }
     }
 
-    // Optimization: if extended decoding succeeded and the result passes CheckTxScriptsSanity,
-    // don't bother decoding the other way.
-    if (ok_extended && CheckTxScriptsSanity(tx_extended)) {
-        tx = std::move(tx_extended);
-        return true;
-    }
-
     // Try decoding with legacy serialization, and remember if the result successfully consumes the entire input.
     if (try_no_witness) {
         CDataStream ssData(tx_data, SER_NETWORK, PROTOCOL_VERSION | SERIALIZE_TRANSACTION_NO_WITNESS);
         try {
             ssData >> tx_legacy;
-            if (ssData.empty()) ok_legacy = true;
+            if (ssData.empty()) {
+                ok_legacy = true;
+                // If legacy decoding succeeded and passes CheckTxScriptsSanity, that's our answer, as we know
+                // at this point that extended decoding either failed or doesn't pass the sanity check.
+                if (CheckTxScriptsSanity(tx_legacy)) {
+                    tx = std::move(tx_legacy);
+                    return true;
+                }
+            }
         } catch (const std::exception&) {
             // Fall through.
         }
     }
 
-    // If legacy decoding succeeded and passes CheckTxScriptsSanity, that's our answer, as we know
-    // at this point that extended decoding either failed or doesn't pass the sanity check.
-    if (ok_legacy && CheckTxScriptsSanity(tx_legacy)) {
-        tx = std::move(tx_legacy);
-        return true;
-    }
-
     // If extended decoding succeeded, and neither decoding passes sanity, return the extended one.

@laanwj
Copy link
Member

laanwj commented Dec 10, 2020

Code review ACK 0f949cd

The new naming, comments, and code organisation are much improved.

Yes the comments definitely help clarify why the function is like this.

@laanwj laanwj merged commit eb53c03 into bitcoin:master Dec 10, 2020
maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Dec 10, 2020
Whenever both encodings are permitted, try both, and if only one succeeds,
return that one. Otherwise prefer the one for which the heuristic sanity
check passes. If that is the case for neither or for both, return the
extended-permitting deserialization.

Github-Pull: bitcoin#20595
Rebased-From: 39c42c4
maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Dec 10, 2020
@maflcko
Copy link
Member

maflcko commented Dec 10, 2020

Backported in #20612

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 10, 2020
0f949cd Add regression test for incorrect decoding (Pieter Wuille)
39c42c4 Improve heuristic hex transaction decoding (Pieter Wuille)

Pull request description:

  The current hex tx decoding logic will refuse to decode valid extended-encoded transactions if the result fails the heuristic sanity check, even when the legacy-encoding fails. Fix this.

  Fixes bitcoin#20579

ACKs for top commit:
  achow101:
    Code review ACK 0f949cd
  jonatack:
    Tested ACK 0f949cd
  laanwj:
    Code review ACK 0f949cd

Tree-SHA512: bd6dc80d824eb9a87026a623be910cac92173f8ce1c8b040c2246348c3cf0c6d64bcc40127b859e5e4da1efe88cf02a6945f7ebb91079799395145cb09d9c7a5
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

invalid transaction decoding
8 participants