descriptors: inference process, do not return unparsable multisig descriptors #31404

furszy · 2024-12-02T16:04:40Z

The internal script-to-descriptor inference procedure shouldn't return invalid
descriptors or ones that fail the string-to-descriptor parsing process. These
two procedures should always support seamless roundtrips.

This inlines InferScript/InferDescriptor to the actual descriptors
ParseScript logic for the multisig path, ensuring only valid descriptors are
produced.

Examples where this presents an issue:

Because the legacy wallet allowed the import of invalid scripts, the process
could end up inferring and storing an unparsable descriptor, which crashes
the software during any subsequent wallet load.

Note:
This issue will no longer exist after #31378, because we will discard multisig
scripts that are consensus-invalid or descriptors-incompatible prior to calling
the descriptors inference procedure but fixing this also at the descriptors level
prevents future misuses.
The decodescript() RPC command currently return unusable multisig descriptors.
For example, providing a P2SH multisig redeem script with more than 520 bytes results
in a descriptor when it shouldn't due to its large size.

DrahtBot · 2024-12-02T16:04:45Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31404.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#31378 (wallet: fix crash during migration due to invalid multisig descriptors by furszy)
#28710 (Remove the legacy wallet and BDB dependency by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

test/functional/rpc_decodescript.py

src/script/descriptor.cpp

…EYS_NUM'

rkrux

make, functional tests successful.

rkrux · 2024-12-24T11:54:04Z

test/functional/rpc_decodescript.py

+        # Verify it is valid only under segwit rules, not for bare multisig nor p2sh.
+        assert 'desc' not in rpc_result


assert_equal('multisig', rpc_result['type'])

Nit: This assertion can also be added. I modified the test case to have 21 keys, and the type changed to nonstandard, which is correct. p2sh and desc values are present in that output.

Do you think it's worth adding the test case with 21 pubkeys as well to cover the nonstandard scenario?

assert_equal('multisig', rpc_result['type'])

Nit: This assertion can also be added. I modified the test case to have 21 keys, and the type changed to nonstandard, which is correct. p2sh and desc values are present in that output.

Do you think it's worth adding the test case with 21 pubkeys as well to cover the nonstandard scenario?

Hmm, I find the type return field misleading. Based on the code, it relates to the output script type, which in this case can only be represented as p2wsh and not as a bare multisig. This is because (1) bare multisig scripts with +3 pubkeys are non-standard, and (2) bare multisig scripts with +15 pubkeys are consensus-invalid.

So, I'm not entirely convinced about the type assertion at this point. It seems we should improve the type field RPC description, correct the RPC to actually return "non-standard" for bare multisigs with +3 pubkeys, and perhaps introduce the "consensus-invalid" (or simply "invalid") type too.

src/rpc/rawtransaction.cpp

Eunovo · 2024-12-28T11:21:26Z

src/policy/policy.cpp

@@ -87,7 +87,7 @@ bool IsStandard(const CScript& scriptPubKey, const std::optional<unsigned>& max_
        unsigned char m = vSolutions.front()[0];
        unsigned char n = vSolutions.back()[0];
        // Support up to x-of-3 multisig txns as standard


088bbb8: Should probably replace // Support up to x-of-3 multisig txns as standard with
// Support up to x-of-MAX_BARE_MULTISIG_PUBKEYS_NUM multisig txns as standard

src/script/descriptor.cpp

src/wallet/rpc/addresses.cpp

src/core_write.cpp

The internal script-to-descriptor inference procedure shouldn't return invalid descriptors or ones that fail the string-to-descriptor parsing process. These two procedures should always support seamless roundtrips. This inlines `InferScript`/`InferDescriptor` to the actual descriptors `ParseScript` logic for the multisig path, ensuring only valid descriptors are produced. Examples where this presents an issue: 1) Because the legacy wallet allowed the import of invalid scripts, the process could end up inferring and storing an unparsable descriptor, which crashes the software during any subsequent wallet load 2) The `decodescript()` RPC command currently return unusable multisig descriptors. For example, providing a P2SH multisig redeem script with more than 520 bytes results in a descriptor when it shouldn't due to its large size.

Eunovo · 2025-01-05T16:42:59Z

src/script/descriptor.cpp

+        }
+
+        // Verify we don't exceed consensus limits
+        if (ctx == ParseScriptContext::P2SH && script.size() > MAX_SCRIPT_ELEMENT_SIZE) {


e1d6179: IIUC trying to infer a multisig redeem script that is too large under a p2sh context will return addr(...)#... instead of nullptr. Is this what you were going for?

IIUC trying to infer a multisig redeem script that is too large under a p2sh context will return addr(...)#... instead of nullptr. Is this what you were going for?

This diff ensures that nullptr is returned under p2sh context (if that's what you were going for) and adds a test case that covers this if block.

diff --git a/src/script/descriptor.cpp b/src/script/descriptor.cpp index a2eb764706..7bc2f8cc88 100644 --- a/src/script/descriptor.cpp +++ b/src/script/descriptor.cpp @@ -2259,6 +2259,7 @@ std::unique_ptr<DescriptorImpl> InferScript(const CScript& script, ParseScriptCo if (provider.GetCScript(scriptid, subscript)) { auto sub = InferScript(subscript, ParseScriptContext::P2SH, provider); if (sub) return std::make_unique<SHDescriptor>(std::move(sub)); + return nullptr; } } if (txntype == TxoutType::WITNESS_V0_SCRIPTHASH && (ctx == ParseScriptContext::TOP || ctx == ParseScriptContext::P2SH)) { diff --git a/src/test/descriptor_tests.cpp b/src/test/descriptor_tests.cpp index 24b8f2f793..4c0c0bc273 100644 --- a/src/test/descriptor_tests.cpp +++ b/src/test/descriptor_tests.cpp @@ -431,17 +431,19 @@ void CheckMultipath(const std::string& prv, } } -void CheckInferDescriptor(const std::string& script_hex, const std::string& expected_desc, const std::vector<std::string>& hex_scripts, const std::vector<std::pair<std::string, std::string>>& origin_pubkeys) +// Helper function to set up a FlatSigningProvider with scripts and public keys for descriptor inference testing +FlatSigningProvider CreateInferSigningProvider(const std::vector<std::string>& hex_scripts, const std::vector<std::pair<std::string, std::string>>& origin_pubkeys) { - std::vector<unsigned char> script_bytes{ParseHex(script_hex)}; - const CScript& script{script_bytes.begin(), script_bytes.end()}; - FlatSigningProvider provider; + + // Add scripts to provider for (const std::string& prov_script_hex : hex_scripts) { std::vector<unsigned char> prov_script_bytes{ParseHex(prov_script_hex)}; const CScript& prov_script{prov_script_bytes.begin(), prov_script_bytes.end()}; provider.scripts.emplace(CScriptID(prov_script), prov_script); } + + // Add public keys and their origins to provider for (const auto& [pubkey_hex, origin_str] : origin_pubkeys) { CPubKey origin_pubkey{ParseHex(pubkey_hex)}; provider.pubkeys.emplace(origin_pubkey.GetID(), origin_pubkey); @@ -472,12 +474,32 @@ void CheckInferDescriptor(const std::string& script_hex, const std::string& expe } } + return provider; +} + +void CheckInferDescriptor(const std::string& script_hex, const std::string& expected_desc, const std::vector<std::string>& hex_scripts, const std::vector<std::pair<std::string, std::string>>& origin_pubkeys) +{ + std::vector<unsigned char> script_bytes{ParseHex(script_hex)}; + const CScript& script{script_bytes.begin(), script_bytes.end()}; + + FlatSigningProvider provider = CreateInferSigningProvider(hex_scripts, origin_pubkeys); std::string checksum{GetDescriptorChecksum(expected_desc)}; std::unique_ptr<Descriptor> desc = InferDescriptor(script, provider); BOOST_CHECK_EQUAL(desc->ToString(), expected_desc + "#" + checksum); } +void CheckNoInferDescriptor(const std::string& script_hex, const std::vector<std::string>& hex_scripts, const std::vector<std::pair<std::string, std::string>>& origin_pubkeys) +{ + std::vector<unsigned char> script_bytes{ParseHex(script_hex)}; + const CScript& script{script_bytes.begin(), script_bytes.end()}; + + FlatSigningProvider provider = CreateInferSigningProvider(hex_scripts, origin_pubkeys); + + std::unique_ptr<Descriptor> desc = InferDescriptor(script, provider); + BOOST_CHECK(!desc); +} + } BOOST_FIXTURE_TEST_SUITE(descriptor_tests, BasicTestingSetup) @@ -988,6 +1010,9 @@ BOOST_AUTO_TEST_CASE(descriptor_test) CheckInferDescriptor("76a914a31725c74421fadc50d35520ab8751ed120af80588ac", "pkh(04c56fe4a92d401bcbf1b3dfbe4ac3dac5602ca155a3681497f02c1b9a733b92d704e2da6ec4162e4846af9236ef4171069ac8b7f8234a8405b6cadd96f34f5a31)", {}, {{"04c56fe4a92d401bcbf1b3dfbe4ac3dac5602ca155a3681497f02c1b9a733b92d704e2da6ec4162e4846af9236ef4171069ac8b7f8234a8405b6cadd96f34f5a31", ""}}); // Infer pk() from p2pk with uncompressed key CheckInferDescriptor("4104032540df1d3c7070a8ab3a9cdd304dfc7fd1e6541369c53c4c3310b2537d91059afc8b8e7673eb812a32978dabb78c40f2e423f7757dca61d11838c7aeeb5220ac", "pk(04032540df1d3c7070a8ab3a9cdd304dfc7fd1e6541369c53c4c3310b2537d91059afc8b8e7673eb812a32978dabb78c40f2e423f7757dca61d11838c7aeeb5220)", {}, {{"04032540df1d3c7070a8ab3a9cdd304dfc7fd1e6541369c53c4c3310b2537d91059afc8b8e7673eb812a32978dabb78c40f2e423f7757dca61d11838c7aeeb5220", ""}}); + + // Infer sh(very_large_multisig) from p2sh fails + CheckNoInferDescriptor("a914b0d5c60fe453cd72cd4942db2cfe01b2d084b0ee87", {"01142103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb22103b0da749730dc9b4b1f4a14d6902877a92541f5368778853d9c4a0cb7802dcfb20114ae"}, {}); } BOOST_AUTO_TEST_SUITE_END()

That's a more delicate topic. InferScript can return nullptr in valid scenarios too. It shouldn't just fail in every case as you're doing there.

For example, if the script is a P2SH output and the provider lacks the redeem script (watch-only scenario), the expected behavior is to return a addr(). But with your changes, it would return `nullptr instead.

We need a different way to signal errors from inner InferScript() calls. Could be a boolean or.. a different return object like a util::Result.

For example, if the script is a P2SH output and the provider lacks the redeem script (watch-only scenario), the expected behavior is to return a addr(). But with your changes, it would return `nullptr instead.

Are you certain? I checked this and I couldn't verify this claim. If the redeem script is not available in the provider, provider.GetCScript(scriptid, subscript) should evaluate to false and the expected addr() desc will be returned. I tested this by adding the following test case for the example you gave CheckInferDescriptor("a914b0d5c60fe453cd72cd4942db2cfe01b2d084b0ee87", "addr(3Hp2wZXSKK7Zw2hJEE2pU2Qcs6pPt9t5cG)", {}, {}); and it passed.

Have I missed something?

sipa · 2025-02-11T15:44:34Z

src/rpc/blockchain.cpp

@@ -1105,7 +1105,7 @@ static RPCHelpMan gettxout()
                {RPCResult::Type::STR_AMOUNT, "value", "The transaction value in " + CURRENCY_UNIT},
                {RPCResult::Type::OBJ, "scriptPubKey", "", {
                    {RPCResult::Type::STR, "asm", "Disassembly of the output script"},
-                    {RPCResult::Type::STR, "desc", "Inferred descriptor for the output"},
+                    {RPCResult::Type::STR, "desc", /*optional=*/true, "Inferred top level descriptor for the script (if any)"},


Rather than making the RPC output field optional, can we not fallback to always returning a raw / addr descriptor if inferring returns something that would be invalid?

Rather than making the RPC output field optional, can we not fallback to always returning a raw / addr descriptor if inferring returns something that would be invalid?

Hmm, I would prefer to avoid returning addr() descriptors when the script cannot be redeemed due to consensus rules. Mainly because it opens up the remote possibility of users sending money to it (since the address is there).

This isn't an issue in this gettxout() case because the script comes from an unspent output in the UTXO or mempool, meaning the worst case scenario is a non-standard script. However, it could be problematic in other places that uses InferDescriptor() like createmultisig() (if we ever feed the internal InferDescriptor call with invalid data) or the current migration process (which will no longer be the case after #31495).

We can return raw / addr for non-standard scripts. This would let us assert that the descriptor always exist at gettxout().

brunoerg

It seems InferDescriptor can still produce invalid descriptors due to invalid keys. e.g.:
Hex script: 5141046161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161610051ae
descriptor (invalid): multi(1,0461616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616161616100)#jqqp7208.

maflcko · 2025-02-28T11:00:29Z

[17:25:46.471]                                    test_framework.authproxy.JSONRPCException: Internal bug detected: RPC call "decoderawtransaction" returned incorrect type:
[17:25:46.471]                                    {
[17:25:46.471]                                        "vout": {
[17:25:46.471]                                            "1": {
[17:25:46.471]                                                "scriptPubKey": {
[17:25:46.471]                                                    "desc": "key missing, despite not being optional in doc"
[17:25:46.471]                                                }
[17:25:46.471]                                            }
[17:25:46.471]                                        }
[17:25:46.471]                                    }
[17:25:46.471]                                    Bitcoin Core v28.99.0-6723b1c36115-dirty
[17:25:46.471]                                    Please report this issue here: https://github.com/bitcoin/bitcoin/issues
[17:25:46.471]                                     (-1)

maflcko · 2025-03-12T13:19:05Z

Could turn into draft while CI is failing and feedback is waiting to be addressed?

DrahtBot · 2025-05-07T14:27:39Z

🐙 This pull request conflicts with the target branch and needs rebase.

DrahtBot · 2025-08-04T08:20:35Z

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

sipa reviewed Dec 2, 2024

View reviewed changes

test/functional/rpc_decodescript.py Outdated Show resolved Hide resolved

src/script/descriptor.cpp Outdated Show resolved Hide resolved

refactor: replace hardcoded number, introduce 'MAX_BARE_MULTISIG_PUBK…

088bbb8

…EYS_NUM'

furszy force-pushed the 2024_descriptors_infer_multisig branch 2 times, most recently from 4e27172 to 50ebd73 Compare December 2, 2024 16:23

furszy mentioned this pull request Dec 3, 2024

wallet: fix crash during migration due to invalid multisig descriptors #31378

Draft

furszy changed the title ~~[RFC] descriptors: inference process, do not return unparsable multisig descriptors~~ descriptors: inference process, do not return unparsable multisig descriptors Dec 4, 2024

DrahtBot mentioned this pull request Dec 11, 2024

Remove the legacy wallet and BDB dependency #28710

Merged

rkrux reviewed Dec 24, 2024

View reviewed changes

src/rpc/rawtransaction.cpp Show resolved Hide resolved

Eunovo reviewed Dec 28, 2024

View reviewed changes

src/script/descriptor.cpp Outdated Show resolved Hide resolved

Eunovo reviewed Dec 28, 2024

View reviewed changes

src/wallet/rpc/addresses.cpp Outdated Show resolved Hide resolved

Eunovo reviewed Dec 28, 2024

View reviewed changes

src/core_write.cpp Show resolved Hide resolved

furszy force-pushed the 2024_descriptors_infer_multisig branch from 50ebd73 to e1d6179 Compare December 28, 2024 14:46

Eunovo reviewed Jan 5, 2025

View reviewed changes

sipa reviewed Feb 11, 2025

View reviewed changes

brunoerg reviewed Feb 12, 2025

View reviewed changes

DrahtBot added the CI failed label Feb 27, 2025

furszy marked this pull request as draft March 12, 2025 13:56

DrahtBot added the Needs rebase label May 7, 2025

		# Verify it is valid only under segwit rules, not for bare multisig nor p2sh.
		assert 'desc' not in rpc_result

descriptors: inference process, do not return unparsable multisig descriptors #31404

Are you sure you want to change the base?

descriptors: inference process, do not return unparsable multisig descriptors #31404

Uh oh!

Conversation

furszy commented Dec 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Dec 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage & Benchmarks

Reviews

Conflicts

Uh oh!

Uh oh!

Uh oh!

rkrux left a comment

Choose a reason for hiding this comment

Uh oh!

rkrux Dec 24, 2024

Choose a reason for hiding this comment

Uh oh!

furszy Dec 24, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Eunovo Dec 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Eunovo Jan 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Eunovo Jan 5, 2025

Choose a reason for hiding this comment

Uh oh!

furszy Feb 11, 2025

Choose a reason for hiding this comment

Uh oh!

Eunovo Mar 17, 2025

Choose a reason for hiding this comment

Uh oh!

sipa Feb 11, 2025

Choose a reason for hiding this comment

Uh oh!

furszy Feb 11, 2025

Choose a reason for hiding this comment

Uh oh!

brunoerg left a comment

Choose a reason for hiding this comment

Uh oh!

maflcko commented Feb 28, 2025

Uh oh!

maflcko commented Mar 12, 2025

Uh oh!

DrahtBot commented May 7, 2025

Uh oh!

DrahtBot commented Aug 4, 2025

Uh oh!

Uh oh!

furszy commented Dec 2, 2024 •

edited

Loading

DrahtBot commented Dec 2, 2024 •

edited

Loading

Eunovo Dec 28, 2024 •

edited

Loading

Eunovo Jan 5, 2025 •

edited

Loading