Skip to content

Conversation

furszy
Copy link
Member

@furszy furszy commented Dec 2, 2024

The internal script-to-descriptor inference procedure shouldn't return invalid
descriptors or ones that fail the string-to-descriptor parsing process. These
two procedures should always support seamless roundtrips.

This inlines InferScript/InferDescriptor to the actual descriptors
ParseScript logic for the multisig path, ensuring only valid descriptors are
produced.

Examples where this presents an issue:

  1. Because the legacy wallet allowed the import of invalid scripts, the process
    could end up inferring and storing an unparsable descriptor, which crashes
    the software during any subsequent wallet load.

    Note:
    This issue will no longer exist after #31378, because we will discard multisig
    scripts that are consensus-invalid or descriptors-incompatible prior to calling
    the descriptors inference procedure but fixing this also at the descriptors level
    prevents future misuses.

  2. The decodescript() RPC command currently return unusable multisig descriptors.
    For example, providing a P2SH multisig redeem script with more than 520 bytes results
    in a descriptor when it shouldn't due to its large size.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 2, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31404.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #31378 (wallet: fix crash during migration due to invalid multisig descriptors by furszy)
  • #28710 (Remove the legacy wallet and BDB dependency by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@furszy furszy force-pushed the 2024_descriptors_infer_multisig branch 2 times, most recently from 4e27172 to 50ebd73 Compare December 2, 2024 16:23
@furszy furszy changed the title [RFC] descriptors: inference process, do not return unparsable multisig descriptors descriptors: inference process, do not return unparsable multisig descriptors Dec 4, 2024
Copy link
Contributor

@rkrux rkrux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make, functional tests successful.

Comment on lines +104 to +105
# Verify it is valid only under segwit rules, not for bare multisig nor p2sh.
assert 'desc' not in rpc_result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_equal('multisig', rpc_result['type'])

Nit: This assertion can also be added. I modified the test case to have 21 keys, and the type changed to nonstandard, which is correct. p2sh and desc values are present in that output.

Do you think it's worth adding the test case with 21 pubkeys as well to cover the nonstandard scenario?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert_equal('multisig', rpc_result['type'])

Nit: This assertion can also be added. I modified the test case to have 21 keys, and the type changed to nonstandard, which is correct. p2sh and desc values are present in that output.

Do you think it's worth adding the test case with 21 pubkeys as well to cover the nonstandard scenario?

Hmm, I find the type return field misleading. Based on the code, it relates to the output script type, which in this case can only be represented as p2wsh and not as a bare multisig. This is because (1) bare multisig scripts with +3 pubkeys are non-standard, and (2) bare multisig scripts with +15 pubkeys are consensus-invalid.

So, I'm not entirely convinced about the type assertion at this point. It seems we should improve the type field RPC description, correct the RPC to actually return "non-standard" for bare multisigs with +3 pubkeys, and perhaps introduce the "consensus-invalid" (or simply "invalid") type too.

@@ -87,7 +87,7 @@ bool IsStandard(const CScript& scriptPubKey, const std::optional<unsigned>& max_
unsigned char m = vSolutions.front()[0];
unsigned char n = vSolutions.back()[0];
// Support up to x-of-3 multisig txns as standard
Copy link
Contributor

@Eunovo Eunovo Dec 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

088bbb8: Should probably replace // Support up to x-of-3 multisig txns as standard with
// Support up to x-of-MAX_BARE_MULTISIG_PUBKEYS_NUM multisig txns as standard

The internal script-to-descriptor inference procedure shouldn't return invalid
descriptors or ones that fail the string-to-descriptor parsing process. These
two procedures should always support seamless roundtrips.

This inlines `InferScript`/`InferDescriptor` to the actual descriptors
`ParseScript` logic for the multisig path, ensuring only valid descriptors are
produced.

Examples where this presents an issue:
1) Because the legacy wallet allowed the import of invalid scripts, the process
   could end up inferring and storing an unparsable descriptor, which crashes
   the software during any subsequent wallet load

2) The `decodescript()` RPC command currently return unusable multisig descriptors.
   For example, providing a P2SH multisig redeem script with more than 520 bytes
   results in a descriptor when it shouldn't due to its large size.
@furszy furszy force-pushed the 2024_descriptors_infer_multisig branch from 50ebd73 to e1d6179 Compare December 28, 2024 14:46
}

// Verify we don't exceed consensus limits
if (ctx == ParseScriptContext::P2SH && script.size() > MAX_SCRIPT_ELEMENT_SIZE) {
Copy link
Contributor

@Eunovo Eunovo Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e1d6179: IIUC trying to infer a multisig redeem script that is too large under a p2sh context will return addr(...)#... instead of nullptr. Is this what you were going for?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC trying to infer a multisig redeem script that is too large under a p2sh context will return addr(...)#... instead of nullptr. Is this what you were going for?

This diff ensures that nullptr is returned under p2sh context (if that's what you were going for) and adds a test case that covers this if block.

diff --git a/src/script/descriptor.cpp b/src/script/descriptor.cpp
index a2eb764706..7bc2f8cc88 100644
--- a/src/script/descriptor.cpp
+++ b/src/script/descriptor.cpp
@@ -2259,6 +2259,7 @@ std::unique_ptr<DescriptorImpl> InferScript(const CScript& script, ParseScriptCo
         if (provider.GetCScript(scriptid, subscript)) {
             auto sub = InferScript(subscript, ParseScriptContext::P2SH, provider);
             if (sub) return std::make_unique<SHDescriptor>(std::move(sub));
+            return nullptr;
         }
     }
     if (txntype == TxoutType::WITNESS_V0_SCRIPTHASH && (ctx == ParseScriptContext::TOP || ctx == ParseScriptContext::P2SH)) {
diff --git a/src/test/descriptor_tests.cpp b/src/test/descriptor_tests.cpp
index 24b8f2f793..4c0c0bc273 100644
--- a/src/test/descriptor_tests.cpp
+++ b/src/test/descriptor_tests.cpp
@@ -431,17 +431,19 @@ void CheckMultipath(const std::string& prv,
     }
 }
 
-void CheckInferDescriptor(const std::string& script_hex, const std::string& expected_desc, const std::vector<std::string>& hex_scripts, const std::vector<std::pair<std::string, std::string>>& origin_pubkeys)
+// Helper function to set up a FlatSigningProvider with scripts and public keys for descriptor inference testing
+FlatSigningProvider CreateInferSigningProvider(const std::vector<std::string>& hex_scripts, const std::vector<std::pair<std::string, std::string>>& origin_pubkeys)
 {
-    std::vector<unsigned char> script_bytes{ParseHex(script_hex)};
-    const CScript& script{script_bytes.begin(), script_bytes.end()};
-
     FlatSigningProvider provider;
+    
+    // Add scripts to provider
     for (const std::string& prov_script_hex : hex_scripts) {
         std::vector<unsigned char> prov_script_bytes{ParseHex(prov_script_hex)};
         const CScript& prov_script{prov_script_bytes.begin(), prov_script_bytes.end()};
         provider.scripts.emplace(CScriptID(prov_script), prov_script);
     }
+
+    // Add public keys and their origins to provider
     for (const auto& [pubkey_hex, origin_str] : origin_pubkeys) {
         CPubKey origin_pubkey{ParseHex(pubkey_hex)};
         provider.pubkeys.emplace(origin_pubkey.GetID(), origin_pubkey);
@@ -472,12 +474,32 @@ void CheckInferDescriptor(const std::string& script_hex, const std::string& expe
         }
     }
 
+    return provider;
+}
+
+void CheckInferDescriptor(const std::string& script_hex, const std::string& expected_desc, const std::vector<std::string>& hex_scripts, const std::vector<std::pair<std::string, std::string>>& origin_pubkeys)
+{
+    std::vector<unsigned char> script_bytes{ParseHex(script_hex)};
+    const CScript& script{script_bytes.begin(), script_bytes.end()};
+
+    FlatSigningProvider provider = CreateInferSigningProvider(hex_scripts, origin_pubkeys);
     std::string checksum{GetDescriptorChecksum(expected_desc)};
 
     std::unique_ptr<Descriptor> desc = InferDescriptor(script, provider);
     BOOST_CHECK_EQUAL(desc->ToString(), expected_desc + "#" + checksum);
 }
 
+void CheckNoInferDescriptor(const std::string& script_hex, const std::vector<std::string>& hex_scripts, const std::vector<std::pair<std::string, std::string>>& origin_pubkeys)
+{
+    std::vector<unsigned char> script_bytes{ParseHex(script_hex)};
+    const CScript& script{script_bytes.begin(), script_bytes.end()};
+
+    FlatSigningProvider provider = CreateInferSigningProvider(hex_scripts, origin_pubkeys);
+
+    std::unique_ptr<Descriptor> desc = InferDescriptor(script, provider);
+    BOOST_CHECK(!desc);
+}
+
 }
 
 BOOST_FIXTURE_TEST_SUITE(descriptor_tests, BasicTestingSetup)
@@ -988,6 +1010,9 @@ BOOST_AUTO_TEST_CASE(descriptor_test)
     CheckInferDescriptor("76a914a31725c74421fadc50d35520ab8751ed120af80588ac", "pkh(04c56fe4a92d401bcbf1b3dfbe4ac3dac5602ca155a3681497f02c1b9a733b92d704e2da6ec4162e4846af9236ef4171069ac8b7f8234a8405b6cadd96f34f5a31)", {}, {{"04c56fe4a92d401bcbf1b3dfbe4ac3dac5602ca155a3681497f02c1b9a733b92d704e2da6ec4162e4846af9236ef4171069ac8b7f8234a8405b6cadd96f34f5a31", ""}});
     // Infer pk() from p2pk with uncompressed key
     CheckInferDescriptor("4104032540df1d3c7070a8ab3a9cdd304dfc7fd1e6541369c53c4c3310b2537d91059afc8b8e7673eb812a32978dabb78c40f2e423f7757dca61d11838c7aeeb5220ac", "pk(04032540df1d3c7070a8ab3a9cdd304dfc7fd1e6541369c53c4c3310b2537d91059afc8b8e7673eb812a32978dabb78c40f2e423f7757dca61d11838c7aeeb5220)", {}, {{"04032540df1d3c7070a8ab3a9cdd304dfc7fd1e6541369c53c4c3310b2537d91059afc8b8e7673eb812a32978dabb78c40f2e423f7757dca61d11838c7aeeb5220", ""}});
+
+    // Infer sh(very_large_multisig) from p2sh fails 
+    CheckNoInferDescriptor("a914b0d5c60fe453cd72cd4942db2cfe01b2d084b0ee87", {"01142103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb20114ae"}, {});
 }
 
 BOOST_AUTO_TEST_SUITE_END()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a more delicate topic. InferScript can return nullptr in valid scenarios too. It shouldn't just fail in every case as you're doing there.

For example, if the script is a P2SH output and the provider lacks the redeem script (watch-only scenario), the expected behavior is to return a addr(). But with your changes, it would return `nullptr instead.

We need a different way to signal errors from inner InferScript() calls. Could be a boolean or.. a different return object like a util::Result.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, if the script is a P2SH output and the provider lacks the redeem script (watch-only scenario), the expected behavior is to return a addr(). But with your changes, it would return `nullptr instead.

Are you certain? I checked this and I couldn't verify this claim. If the redeem script is not available in the provider, provider.GetCScript(scriptid, subscript) should evaluate to false and the expected addr() desc will be returned. I tested this by adding the following test case for the example you gave CheckInferDescriptor("a914b0d5c60fe453cd72cd4942db2cfe01b2d084b0ee87", "addr(3Hp2wZXSKK7Zw2hJEE2pU2Qcs6pPt9t5cG)", {}, {}); and it passed.

Have I missed something?

@@ -1105,7 +1105,7 @@ static RPCHelpMan gettxout()
{RPCResult::Type::STR_AMOUNT, "value", "The transaction value in " + CURRENCY_UNIT},
{RPCResult::Type::OBJ, "scriptPubKey", "", {
{RPCResult::Type::STR, "asm", "Disassembly of the output script"},
{RPCResult::Type::STR, "desc", "Inferred descriptor for the output"},
{RPCResult::Type::STR, "desc", /*optional=*/true, "Inferred top level descriptor for the script (if any)"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than making the RPC output field optional, can we not fallback to always returning a raw / addr descriptor if inferring returns something that would be invalid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than making the RPC output field optional, can we not fallback to always returning a raw / addr descriptor if inferring returns something that would be invalid?

Hmm, I would prefer to avoid returning addr() descriptors when the script cannot be redeemed due to consensus rules. Mainly because it opens up the remote possibility of users sending money to it (since the address is there).

This isn't an issue in this gettxout() case because the script comes from an unspent output in the UTXO or mempool, meaning the worst case scenario is a non-standard script. However, it could be problematic in other places that uses InferDescriptor() like createmultisig() (if we ever feed the internal InferDescriptor call with invalid data) or the current migration process (which will no longer be the case after #31495).

We can return raw / addr for non-standard scripts. This would let us assert that the descriptor always exist at gettxout().

Copy link
Contributor

@brunoerg brunoerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems InferDescriptor can still produce invalid descriptors due to invalid keys. e.g.:
Hex script: 5141046161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161610051ae
descriptor (invalid): multi(1,0461616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616100)#jqqp7208.

@maflcko
Copy link
Member

maflcko commented Feb 28, 2025

[17:25:46.471]                                    test_framework.authproxy.JSONRPCException: Internal bug detected: RPC call "decoderawtransaction" returned incorrect type:
[17:25:46.471]                                    {
[17:25:46.471]                                        "vout": {
[17:25:46.471]                                            "1": {
[17:25:46.471]                                                "scriptPubKey": {
[17:25:46.471]                                                    "desc": "key missing, despite not being optional in doc"
[17:25:46.471]                                                }
[17:25:46.471]                                            }
[17:25:46.471]                                        }
[17:25:46.471]                                    }
[17:25:46.471]                                    Bitcoin Core v28.99.0-6723b1c36115-dirty
[17:25:46.471]                                    Please report this issue here: https://github.com/bitcoin/bitcoin/issues
[17:25:46.471]                                     (-1)

@maflcko
Copy link
Member

maflcko commented Mar 12, 2025

Could turn into draft while CI is failing and feedback is waiting to be addressed?

@furszy furszy marked this pull request as draft March 12, 2025 13:56
@DrahtBot
Copy link
Contributor

DrahtBot commented May 7, 2025

🐙 This pull request conflicts with the target branch and needs rebase.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 4, 2025

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants