Skip to content

Conversation

achow101
Copy link
Member

@achow101 achow101 commented Aug 30, 2021

It is convenient to have a descriptor which specifies both receiving and change addresses in a single string. However, as discussed in #17190 (comment), it is not feasible to use a generic multipath specification like BIP 88 due to combinatorial blow up and that it would result in unexpected descriptors.

To resolve that problem, this PR proposes a targeted solution which allows only a single pair of 2 derivation indexes to be inserted in the place of a single derivation index. So instead of two descriptor wpkh(xpub.../0/0/*) and wpkh(xpub.../0/1/*) to represent receive and change addresses, this could be written as wpkh(xpub.../0/<0;1>/*). The multipath specifier is of the form <NUM;NUM>. Each NUM can have its own hardened specifier, e.g. <0;1h> is valid. The multipath specifier can also only appear in one path index in the derivation path.

This results in the parser returning two descriptors. The first descriptor uses the first NUM in all pairs present, and the second uses the second NUM. In our implementation, if a multipath descriptor is not provided, a pair is still returned, but the second element is just nullptr.

The wallet will not output the multipath descriptors (yet). Furthermore, when a multipath descriptor is imported, it is expanded to the two descriptors and each imported on its own, with the second descriptor being implicitly for internal (change) addresses. There is no change to how the wallet stores or outputs descriptors (yet).

Note that the path specifier is different from what was proposed. It uses angle brackets and the semicolon because these are unused characters available in the character set and I wanted to avoid conflicts with characters already in use in descriptors.

Closes #17190

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 30, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK furszy, darosior, glozow, mjdietzx, pythcoiner
Concept ACK craigraw
Stale ACK Rspigler

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30243 (Tr partial descriptors by Eunovo)
  • #29936 (fuzz: wallet: add target for CreateTransaction by brunoerg)
  • #29396 (rpc: getdescriptorinfo also returns normalized descriptor by araujo88)
  • #29136 (wallet: addhdkey RPC to add just keys to wallets via new void(KEY) descriptor by achow101)
  • #25979 ([WIP] wallet: standardize change output detection process by furszy)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@Rspigler
Copy link
Contributor

Rspigler commented Sep 1, 2021

Strong concept ACK!

Glad this is getting a fix. As discussed in 17190 and OP, there's more improvements that can come after this.

@mjdietzx
Copy link
Contributor

mjdietzx commented Sep 3, 2021

Concept and Approach ACK. I plan on updating the test and docs introduced in #22067 to take advantage of this in a follow up PR. So I will try to do a more thorough review and testing of this after that is merged and I start on the followup based on this branch. (or if it looks like this will get merged first I will do that earlier)

@Sjors
Copy link
Member

Sjors commented Sep 29, 2021

Note that the path specifier is different from what was proposed. It uses angle brackets and the semicolon because these are unused characters available in the character set and I wanted to avoid conflicts with characters already in use in descriptors.

{0,1} seems more readable and would at least be a subset of BIP 88. The conflict is with taproot descriptors, which use { and , for nesting the scripts.

As an example tr([00000000/1']xpub/{0,1}/*,{pk([00000000/2']xpub/{0,1}/*),pk([00000000/3']xpub/{0,1}/*)}) doesn't look terrible to me. How much does it really complicate the parsing code? They're always either inside () of a descriptor function like pkh() or within [] of an origin (if we even allow that).

Allowing a mix of hardened and unhardened like <0;1h> seems unnecessary to me and might make alternative implementations more complicated (though perhaps it's trivial; I haven't tried).

@achow101
Copy link
Member Author

The conflict is with taproot descriptors, which use { and , for nesting the scripts.

The conflict is actually with the , in a variety of descriptors. The particular issue I had run into was using , in a multi() descriptor. Since , is used to separate the components of multi(), the parser would look for the next , for the next item to parse. If we use , for the multipath specifier, then it would run into that and parse too little. The parser could be changed to not do that, but that was more work than to just use unused characters. { did not cause any issues, but to make it easier for other implementations which may be parsing differently, I decided to go with a different character as well.

@mjdietzx
Copy link
Contributor

I'm doing some more review/testing of this PR, but when I checkout this PR and rebase to master, I get build error:

wallet/rpcwallet.cpp:3387:45: error: no viable conversion from 'std::pair<std::unique_ptr<Descriptor>, std::unique_ptr<Descriptor> >' to 'std::unique_ptr<Descriptor>'
  std::unique_ptr<Descriptor> desc = Parse(desc_str, desc_out, error, true);

@achow101
Copy link
Member Author

Fixed the silent merge conflict.

@mjdietzx
Copy link
Contributor

So in my tests, from this multipath descriptor string: f"wsh(sortedmulti(M},{f'/<0;1>/*,'.join(xpubs)}/<0;1>/*))"

I have a descriptor that looks like this:

{
  'descriptor': 'wsh(sortedmulti(2,tpubDCcZnBitiPCBqsdLWx8Lkc2BrsvVTdZU2Gcioi8yY76HrGSvp4oWyxZ5GUKGVcjdv4wzWbfntfLdcEEfVRamhSgwi1ZxgfNtayQ2QEEjfTv/0/*,tpubDCMtV5vMS9Zn2paRcw7o83ytLY7WHMCGYRSDxhz7HWnZ4ix19EQpi7g3wvYxEvuoXRfLy5XpXJP5VEr7qPzS92TWnVrBMPJYzfupdACC6vP/0/*,tpubDDBi5pEGAmG7k8t2udh8vbYb1Zpwnt5E9bxd86z6xR66r6tYH1aXy5J3kxR2NDBWhVzgcyiAjFw3kCbkf8YcxDx4wjsnvKjKu3SfPVi4dzf/0/*))#shfp6kvs', 
  'descriptor_change': 'wsh(sortedmulti(2,tpubDCcZnBitiPCBqsdLWx8Lkc2BrsvVTdZU2Gcioi8yY76HrGSvp4oWyxZ5GUKGVcjdv4wzWbfntfLdcEEfVRamhSgwi1ZxgfNtayQ2QEEjfTv/1/*,tpubDCMtV5vMS9Zn2paRcw7o83ytLY7WHMCGYRSDxhz7HWnZ4ix19EQpi7g3wvYxEvuoXRfLy5XpXJP5VEr7qPzS92TWnVrBMPJYzfupdACC6vP/1/*,tpubDDBi5pEGAmG7k8t2udh8vbYb1Zpwnt5E9bxd86z6xR66r6tYH1aXy5J3kxR2NDBWhVzgcyiAjFw3kCbkf8YcxDx4wjsnvKjKu3SfPVi4dzf/1/*))#ghxep260', 
  'checksum': 'e82r9xv3', 
  'isrange': True, 
  'issolvable': True, 
  'hasprivatekeys': False
}

But now, multisig.getrawchangeaddress() fails with error: This wallet has no available keys (-4):

2021-10-18T22:26:50.595000Z TestFramework (INFO): Check that every participant's multisig generates the same addresses...
2021-10-18T22:26:50.602000Z TestFramework (ERROR): JSONRPC error
Traceback (most recent call last):
  File "/Users/michaeldietz/Documents/bitcoin/test/functional/test_framework/test_framework.py", line 132, in main
    self.run_test()
  File "/Users/michaeldietz/Documents/bitcoin/test/functional/wallet_multisig_descriptor_psbt.py", line 95, in run_test
    change_addresses = [multisig.getrawchangeaddress() for multisig in participants["multisigs"]]
  File "/Users/michaeldietz/Documents/bitcoin/test/functional/wallet_multisig_descriptor_psbt.py", line 95, in <listcomp>
    change_addresses = [multisig.getrawchangeaddress() for multisig in participants["multisigs"]]
  File "/Users/michaeldietz/Documents/bitcoin/test/functional/test_framework/coverage.py", line 49, in __call__
    return_val = self.auth_service_proxy_instance.__call__(*args, **kwargs)
  File "/Users/michaeldietz/Documents/bitcoin/test/functional/test_framework/authproxy.py", line 144, in __call__
    raise JSONRPCException(response['error'], status)
test_framework.authproxy.JSONRPCException: Error: This wallet has no available keys (-4)

Any ideas? I am updating #22067 to use multipath descriptors and using it as a chance to test this PR more

@achow101
Copy link
Member Author

achow101 commented Oct 18, 2021

@mjdietzx importdescriptors is probably failing. You should check the result of the import and see if there are any errors.

Is your branch pushed so I can have a look?

@mjdietzx
Copy link
Contributor

In my code I actually have:

result = multisig.importdescriptors([
                {
                    "desc": descriptor["descriptor"],
                    "active": True,
                    "timestamp": "now",
                },
            ])
            assert all(r["success"] for r in result)

And that doesn't catch any errors. I will push my branch shortly and reply w/ it

@mjdietzx
Copy link
Contributor

I had an error in my tests - it seems everything in this PR is working properly.

This works:

result = multisig.importdescriptors([
  {
    "desc": descsum_create(f"wsh(sortedmulti({self.M},{f'/<0;1>/*,'.join(xpubs)}/<0;1>/*))"),
    "active": True,
    "timestamp": "now",
  },
])

re the error I posted eariler, I was trying to do this (using getdescriptorinfo instead of descsum_create):

descriptor = multisig.getdescriptorinfo(f"wsh(sortedmulti({self.M},{f'/<0;1>/*,'.join(xpubs)}/<0;1>/*))")
result = multisig.importdescriptors([
    {
        "desc": descriptor["descriptor"],
        "active": True,
        "timestamp": "now",
    },
])

But I was not importing or doing anything with descriptor["descriptor_change"], hence the (obvious in hindsight) error.


We used getdescriptorinfo instead of descsum_create so test/functional/wallet_multisig_descriptor_psbt.py will only use node functions without an additional script for the checksum (see review of #22067 for more info). I wonder if this is useful user feedback at all @achow101? I'm specifically wondering about the interface of getdescriptorinfo now, and if it would make sense to return something like:

{
  descriptor // still the multipath descriptor, similar to output of descsum_create
  // now the broken out descriptors:
  descriptor_receiving
  descriptor_change
 // ... and the other fields are unchanged
}

@achow101
Copy link
Member Author

We used getdescriptorinfo instead of descsum_create so test/functional/wallet_multisig_descriptor_psbt.py will only use node functions without an additional script for the checksum (see review of #22067 for more info). I wonder if this is useful user feedback at all @achow101? I'm specifically wondering about the interface of getdescriptorinfo now, and if it would make sense to return something like:

The checksum field is for that purpose. You can concatenate the descriptor input, #, and that checksum.

src/rpc/misc.cpp Outdated
@@ -170,6 +170,7 @@ static RPCHelpMan getdescriptorinfo()
RPCResult::Type::OBJ, "", "",
{
{RPCResult::Type::STR, "descriptor", "The descriptor in canonical form, without private keys"},
{RPCResult::Type::STR, "descriptor_change", "The change descriptor in canonical form, without private keys. Only if the provided descriptor specifies derivation paths for both receiving and change."},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced this is the best interface for getdescriptorinfo

I'd expect descriptor to always be the full descriptor (whether multipath or not). In the case of multipath, I'd then expect two optional attributes descriptor_receiving and descriptor_change to be present in the response.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: descriptor_change should be optional

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

descriptor cannot currently be the multipath descriptor as generating the string output for one is not yet implemented. I'm not sure on a good approach for doing that yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying something like:

  • The multipath descriptor string input to getdescriptorinfo is parsed/expanded to the two descriptors
  • Now each of these descriptors is converted back ToString and returned as descriptor (first) and descriptor_change (second)

And that we currently don't have a way to go from the two parsed/expanded descriptors back to the single multipath descriptor string?

That does sound tricky, just trying to make sure I understand

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so what's the downside of something like:

  • Each of these parsed descriptors is converted back ToString and returned as descriptor_receiving (first) and descriptor_change (second)
  • The same multipath descriptor string input to getdescriptorinfo is just returned for the descriptor output (with a checksum added)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@achow101 could you entertain me on this, if nothing else it'll help me understand this better as I finish up my review

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#15986 provides a bit more context for this.

There are a few reasons I would prefer to not change the behavior of the descriptor result. The first is for backwards compatibility. This RPC may already be used in a way that expects the canonicalized descriptor produced by getdescriptorinfo in the descriptor field. Additionally, if a descriptor with a private key were provided, if we were to just output that same string again, then we would be echoing private keys which is something that we want to avoid doing.

Copy link
Member

@Sjors Sjors Jan 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

descriptor cannot currently be the multipath descriptor as generating the string output for one is not yet implemented. I'm not sure on a good approach for doing that yet.

Mmm, that has me a bit worried. If a user imports a multipath descriptor, I think we should remember it in that form. And if we embrace multipath descriptors, we should probably also use them for new wallets.

Perhaps a better design would be a MultipathDescriptor subclass that only has a receive and change method, or more neutral: a pair method that returns a pair of Descriptors that then work as usual.

In the future std::pair<Descriptor> can be generalized to std::map<uint32, Descriptor> to allow <NUM,NUM,...,NUM>, etc. That may even be appropriate now: we could treat <0;1> as a typical receive/change setup, and refuse to import any other combination (though that's an RPC level thing, and we might as well use std::pair<Descriptor> given such an import restriction).

Copy link
Contributor

@mjdietzx mjdietzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting very close to signing off on this PR, just a few minor things/questions

src/rpc/misc.cpp Outdated
{RPCResult::Type::STR, "address", "the derived addresses"},
}
{
RPCResult{"for single derivation descriptors",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking through the response returned here. It doesn't seem ideal that the shape of this response differs between a single descriptor and a multipath descriptor. Ie this returns an array of addresses if we pass in a descriptor, but it returns an object if we pass in a multipath descriptor.

Have you thought through an api where the response is more uniform and "feels" the same for both types? I don't necessarily know a better way to do it, wondering if you've thought this through though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary reason for having the two different return results is to maintain backwards compatibility with anyone who may be using the RPC now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I figured there'd need to be a deprecation process if we wanted to do this. Anyways, if you think it's enough of an improvement to warrant that I'd be happy to do give it a go as a followup PR. Just lmk

src/rpc/misc.cpp Outdated
obj.pushKV("change", DeriveAddresses(descs.second.get(), range_begin, range_end, key_provider));
return obj;
} else {
return addresses;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my above comment, if the response is uniform we could potentially simplify this a bit and have a single return at the end of the function

@mjdietzx
Copy link
Contributor

Review ACK 796c439. I also lightly used/tested this by running #23308 on top of this as additional test coverage.

@mjdietzx
Copy link
Contributor

I think there may be another silent merge conflict w/ master:

wallet/test/util.cpp:33:37: error: no viable conversion from 'std::pair<std::unique_ptr<Descriptor>, std::unique_ptr<Descriptor> >' to 'std::unique_ptr<Descriptor>'
        std::unique_ptr<Descriptor> desc = Parse("combo(" + EncodeSecret(key) + ")", provider, error, /* require_checksum=*/ false);

Can you try rebasing again?

achow101 added 10 commits August 8, 2024 12:47
When given a descriptor which contins a multipath derivation specifier,
a vector of descriptors will be returned.
When given a multipath descriptor, derive all of the descriptors.
The derived addresses will be returned in an object
consisting of multiple arrays. For compatibility, when given a single path
descriptor, the addresses are provided in a single array as before.
Instead of applying internal-ness to all keys being imported at the same
time, apply it on a per key basis. So each key that is imported will
carry with it whether it is for the change keypool.
Multipath descriptors will be imported as multiple separate descriptors.
When there are exactly 2 multipath items, the first descriptor will be
for receiving addreses, and the second for change
addresses. When importing a multipath descriptor, 'internal' cannot be
specified.
Multipath descriptors will be imported as multiple separate descriptors.
When there are 2 multipath items, the first descriptor will be for receiving
addresses and the second for change. This mirrors importmulti.
Test that both importmulti and importdescriptors behave as expected when
importing a multipath descriptor.
Copy link
Member

@furszy furszy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK a0abcbd

@DrahtBot DrahtBot requested a review from darosior August 8, 2024 19:42
@darosior
Copy link
Member

re-ACK a0abcbd

Running a fuzzer for a couple days on the previous push (rebased on master to get the speedups from #30197) and with a dictionary specifically targeting the multipath expressions did not uncover anything.

Copy link
Member

@glozow glozow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

light code review ACK a0abcbd

I'm unfamiliar with the descriptor code, but the refactoring commits look like pure refactoring, behavior change commits look correct, RPC changes seem intuitive, and tests seem to cover things when I introduce mutations.


multi(2,xpub.../<0;1;2>/0/*,xpub.../<2;3;4>/*)

will expand to the two descriptors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this 3?

const auto& parsed_desc = parsed_descs.at(j);
bool desc_internal = internal.has_value() && internal.value();
if (parsed_descs.size() == 2) {
desc_internal = j == 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also add a comment here on the j==1, "first is receiving, second is change" or something

@@ -1596,6 +1615,7 @@ RPCHelpMan importdescriptors()
{
return RPCHelpMan{"importdescriptors",
"\nImport descriptors. This will trigger a rescan of the blockchain based on the earliest timestamp of all descriptors being imported. Requires a new wallet backup.\n"
"When importing descriptors with multipath key expressions, if the multipath specifier contains exactly two elements, the descriptor produced from the second elements will be imported as an internal descriptor.\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit "second elements". Also could be helpful that apart from this special case, all multipath are non-internal

Comment on lines +1449 to +1451
std::optional<size_t> multipath_segment_index;
std::vector<uint32_t> multipath_values;
std::unordered_set<uint32_t> seen_multipath;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add some comments for future wallet reviewers, maybe mentioning that they should be consistent with each other (multipath_segment_index.has_value() == !multipath_values.empty() == !seen_multipath.empty())

if (!multipath_segment_index) {
out.emplace_back(std::move(path));
} else {
// Replace the multipath placeholder with each value while generating paths
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can Assume(!multipath_values.empty()) here

@achow101
Copy link
Member Author

@pythcoiner @mjdietzx Can you please re-review?

@mjdietzx
Copy link
Contributor

reACK a0abcbd

@achow101 achow101 modified the milestones: 28.0, 29.0 Aug 15, 2024
@pythcoiner
Copy link

reACK a0abcbd

@glozow glozow merged commit f93d555 into bitcoin:master Aug 28, 2024
16 checks passed
Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this merge, would it be a good idea to update relevant BIPs (possibly 88, 388, 389)?

e.g. https://github.com/bitcoin/bips/blob/master/bip-0088.mediawiki#compatibility

"There is a discussion on path templating for bitcoin script descriptors at #17190, which proposes the format xpub...{0,1}/, of which the {0,1}/ part would correspond to the partial path template in the format of this BIP."

apostrophe = last == '\'';
const Span<const char>& elem = split[i];

// Check if element contain multipath specifier
Copy link
Member

@jonatack jonatack Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit s/contain/contains (if anyone updates for the other review suggestions)

out.emplace_back(std::move(path));
} else {
// Replace the multipath placeholder with each value while generating paths
for (size_t i = 0; i < multipath_values.size(); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, compilers may auto-optimize here but if anyone handles the other feedback suggestions: s/i++/++i/

hodlinator added a commit to hodlinator/bitcoin that referenced this pull request Mar 21, 2025
ryanofsky added a commit that referenced this pull request Mar 31, 2025
56f271e descriptors refactor: Clarify multipath data relationships through local struct (Hodlinator)
7e974f4 descriptors refactor: Use range-for and limit scope of seen_multipath (Hodlinator)
99a92ef descriptors doc: Correct Markdown format + wording (Hodlinator)

Pull request description:

  Follows up on unresolved suggestions from #22838. In order of priority:

  1. Fixes a couple of typos [^1][^2] and indentation to conform to Markdown.
  2. Solves `for`-loop nit [^3] and also limits variable scope.
  3. Clarifies data relationships [^4][^5] by introducing `struct` rather than comments.

  [^1]: #22838 (comment)
  [^2]: #22838 (comment)
  [^3]: #22838 (comment)
  [^4]: #22838 (comment)
  [^5]: #22838 (comment)

ACKs for top commit:
  Prabhat1308:
    re-ACK [`56f271e`](56f271e)
  mabu44:
    tACK 56f271e
  l0rinc:
    utACK 56f271e
  rkrux:
    crACK 56f271e

Tree-SHA512: 75777c911640038a3e0ea48601c0f55463a5f8ff5b3462d81e8992d9fc8f988d5a240e2166befa67a2a246696b0863f8e2508524c14697c490d3b229fe048996
@bitcoin bitcoin locked and limited conversation to collaborators Aug 28, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

descriptors: represent multiple derivation paths within one descriptor