Skip to content

Conversation

darosior
Copy link
Member

@darosior darosior commented Jul 4, 2022

As reported in #24860 (comment), the current code to construct a miniscript::Node could cause a blowup on large fuzzer inputs. This is because:

  1. The duplicate key check is redundantly done at parsing time, since we will recursively create miniscript nodes and the constructor will unconditionally look for duplicate across this node's keys and all its sub-nodes'.
  2. We don't put an upper bound on the size of the inputs to consider for parsing.

To avoid wasteful computation, and prevent the blowup on some fuzzer inputs, limit the size of reasonable inputs and only perform the check for duplicate keys once when parsing.
Regarding the duplicate key check bypass in the constructor we iterated on different approaches, and eventually settled on passing a dummy argument. Albeit less elegant, all other approaches required getting rid of std::make_shared and adding an allocation per node created.

This PR contains code from Pieter Wuille (see commits).

Fixes #25824.

@darosior
Copy link
Member Author

darosior commented Jul 4, 2022

ACK 23db26f -- this change was entirely authored by Pieter Wuille (i only patched CheckDuplicateKey and added a trivial unit test). I reviewed and tested it by rebasing #24148 and #24149 on top of it, checking the runtime of the fuzz targets and adapting the targets creating "random" nodes.

As a mean of comparison here are the runtime of the miniscript_string and miniscript_script targets on my machine
with the corpus from qa-assets at commit 1e3c7cd69fc06fda49a5286b98069ee5b0d64edc:

  • Before: Done 1046 runs in 10 second(s) and Done 2908 runs in 13 second(s)
  • After: Done 1046 runs in 0 second(s) and Done 2908 runs in 6 second(s)

This also made the two targets from #24149 twice as fast on average.

@maflcko maflcko added this to the 24.0 milestone Jul 4, 2022
@DrahtBot DrahtBot added the Tests label Jul 4, 2022
@maflcko maflcko removed the Tests label Jul 4, 2022
@maflcko maflcko changed the title Permit delaying duplicate key check in miniscript::Node construction miniscript: Permit delaying duplicate key check in miniscript::Node construction Jul 4, 2022
@DrahtBot DrahtBot added the Tests label Jul 4, 2022
@maflcko maflcko removed the Tests label Jul 4, 2022
@darosior
Copy link
Member Author

darosior commented Jul 4, 2022

I think we should also have a limit on the maximum number of nodes we want to tolerate when parsing. For instance we would happily parse at this moment a string so long that i can't upload it on 0bin, and run out of memory. This can be the reason for fuzz crashes too.

EDIT: added a commit for this, and changed the OP.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 4, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #24149 (Signing support for Miniscript Descriptors by darosior)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@darosior darosior force-pushed the sipa_202206_fastminiscriptdup branch from 26956eb to 40e3d43 Compare July 5, 2022 08:55
@darosior darosior changed the title miniscript: Permit delaying duplicate key check in miniscript::Node construction miniscript: avoid wasteful computation, prevent memory blowup when fuzzing Jul 5, 2022
@darosior darosior force-pushed the sipa_202206_fastminiscriptdup branch from 40e3d43 to 088e5b0 Compare July 15, 2022 08:41
@darosior
Copy link
Member Author

Rebased after #24148 was merged.

// The two integers are used to hold state for thresh()
std::vector<std::tuple<ParseContext, int64_t, int64_t>> to_parse;
std::vector<NodeRef<Key>> constructed;

to_parse.emplace_back(ParseContext::WRAPPED_EXPR, -1, -1);

while (!to_parse.empty()) {
if (opened_frags > MAX_STANDARD_P2WSH_SCRIPT_SIZE) return {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this limit is specific to Miniscript inside P2WSH. That's the case now for our descriptor implementation. Maybe use MAX_SCRIPT_SIZE (10,000) instead?

See also Resource Limitations under https://bitcoin.sipa.be/miniscript/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Miniscript code already is P2WSH specifc. We can change this once we implement Tapscript support in Miniscript.

//! by all constructors except the NoDupCheck ones. The NoDupCheck ones skip the
//! computation, requiring it to be done manually by invoking DuplicateKeyCheck().
//! DuplicateKeyCheck(), or a non-NoDupCheck constructor, will compute has_duplicate_keys
//! for all subnodes as well.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming this is useful because it allows performing other checks (like size limit) before checking for duplicates? Is so, maybe say that in the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could technically be used for this but it's not the intended purpose: we only ever use it to skip redundant computation.

Copy link
Member

@achow101 achow101 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that inside of Parse, there are 3 MakeNodeRef calls which are doing the duplicate keys check. Is that intentional, and if so, why? These are for parsing AND_N, ANDOR, and THRESH.

@darosior darosior force-pushed the sipa_202206_fastminiscriptdup branch from 088e5b0 to 9f92de3 Compare August 11, 2022 09:14
@darosior
Copy link
Member Author

i didn't author this commit, but i don't think it was intentional. I see no reason for duplicating the checks there. I pushed a version removing them.

(Sorry, i had forgotten to reply to your question.)

@achow101
Copy link
Member

ACK 9f92de3

@glozow
Copy link
Member

glozow commented Sep 8, 2022

Pinging @sipa @sanket1729 if you wouldn't mind taking a look? Or perhaps @stickies-v, @jb55, @Sjors since you reviewed #24147?

@@ -1030,29 +1030,39 @@ inline NodeRef<Key> Parse(Span<const char> in, const Ctx& ctx)
{
using namespace spanparsing;

// Account for the number of fragments we started to parse. Any fragment is at least one byte
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not true for v: and and_v. It probably doesn't matter, but it makes the analysis harder.

@sipa
Copy link
Member

sipa commented Sep 12, 2022

@darosior See https://github.com/sipa/bitcoin/commits/202209_miniscript_parsesize for a commit that instead of opened fragments computes the actual script size redundantly during string parsing. That's definitely overkill, but it has the advantage of being very sensitively testable. What do you think?

@darosior
Copy link
Member Author

@sipa your commit doesn't limit the number of v: and and_v we allow, so we could still crash/timeout on parsing a large and_v(and_v(and_v(....... string.

I had incorrectly thought about the and_v and v: in my version by double counting some ops.

What do you think of a simple arbitrary limit on the number of characters in the string instead? Something like 100_000?

@sipa
Copy link
Member

sipa commented Sep 14, 2022

@darosior Good call!

I think there may be another strategy too based on:

  • every "leaf" (pk_k, pk_h, hashes, older, after) is a nonzero number of script bytes
  • every combinator with more than 1 argument increases the total number of leaves by at least one, so will increase the total script size by at least 1, even and_v.
  • every combinator with 1 argument also increases the script size by at least one, except v:, but v: cannot be recursed (vv: is never valid), so if it happens in a sequence, it has to be interleaved with other things which do add script size.

So I think a strategy could be similar to the script size counting I'm doing in my current commit, but instead "borrow" 1 byte from each leaf, and add it to the combinator that added that leaf to the count.

@darosior
Copy link
Member Author

I thought about something along these lines as well. By taking a lower bound of 0.5 bytes per fragment (because of v:), we could have accounted for half a byte per combinator and nothing for leaf fragments.
But this becomes an issue when the number of leaf fragments we parse inside a combinator isn't bounded, like in thresh.

So we are left with having to conditionally account for leaf fragments, as you mention with "borrowing". I'm not sure the complexity is worth it, but we could do this on top of my current version. I've pushed a fixup commit implementing this on top of my branch here. Let me know if i'm missing something or if you think your version accurately accounting for fragments' sizes is still preferable (since it's applicable to it as well).

@sipa
Copy link
Member

sipa commented Sep 14, 2022

@darosior darosior force-pushed the sipa_202206_fastminiscriptdup branch from 37f98e9 to 14a964e Compare September 15, 2022 08:17
@sipa
Copy link
Member

sipa commented Sep 15, 2022

@darosior Not sure you saw my commit above, WDYT? It has the property that every fragment increases the script_size variable now, with the exception of v: (which needs to be interleaved with others), and 0 and 1 (but those are terminal, so other combinators which do increment the variable are needed to even make a space to add a 0 or 1).

Its comments are outdated, though. I'll update those if desired.

EDIT: squashed and updated comments

@darosior darosior force-pushed the sipa_202206_fastminiscriptdup branch from 14a964e to e8cc2e4 Compare September 17, 2022 13:13
@darosior
Copy link
Member Author

I picked @sipa's latest commit and squashed mine with it. Although it replicates all the script size calculation, his version indeed gives better assurance that the check is in fact correct. Thanks!

I ran this branch against the descriptor_parse fuzz corpus augmented with the seeds from bitcoin-core/qa-assets#92. I don't hit a timeout anymore, neither did it hit the new assertion that the parsing-time calculation of the script size is correct.

ACK e8cc2e4 -- it's my own PR but most of the code here was written by sipa. I've reviewed and tested it.

@fanquake fanquake requested a review from achow101 September 18, 2022 08:46
@sipa
Copy link
Member

sipa commented Sep 19, 2022

ACK e8cc2e4 (for the few parts of the code that aren't mine)

@glozow glozow merged commit 55e1deb into bitcoin:master Sep 19, 2022
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 20, 2022
…memory blowup when fuzzing

e8cc2e4 Make miniscript string parsing account for exact script size as bound (Pieter Wuille)
4cb8f9a Permit delaying duplicate key check in miniscript::Node construction (Pieter Wuille)

Pull request description:

  As reported in bitcoin#24860 (comment), the current code to construct a `miniscript::Node` could cause a blowup on large fuzzer inputs. This is because:
  1. The duplicate key check is redundantly done at parsing time, since we will recursively create miniscript nodes and the constructor will unconditionally look for duplicate across this node's keys and all its sub-nodes'.
  2. We don't put an upper bound on the size of the inputs to consider for parsing.

  To avoid wasteful computation, and prevent the blowup on some fuzzer inputs, limit the size of reasonable inputs and only perform the check for duplicate keys once when parsing.
  Regarding the duplicate key check bypass in the constructor we iterated on different approaches, and eventually settled on passing a dummy argument. Albeit less elegant, all other approaches required getting rid of `std::make_shared` and adding an allocation *per node created*.

  This PR contains code from Pieter Wuille (see commits).

  Fixes bitcoin#25824.

ACKs for top commit:
  darosior:
    ACK e8cc2e4 -- it's my own PR but most of the code here was written by sipa. I've reviewed and tested it.
  sipa:
    ACK e8cc2e4 (for the few parts of the code that aren't mine)

Tree-SHA512: c21de39b3eeb484393758629882fcf8694a9bd1b8f15ae22efcec1582efc9c2309c5a0c2d90f361dd8e233d704a07dcd5fb982f4a48a002c4d8789e1d78bb526
achow101 added a commit that referenced this pull request Sep 21, 2022
648f695 Correct sanity-checking script_size calculation (Pieter Wuille)

Pull request description:

  Fix a bug in the script_size sanity-check in the miniscript string parser, found by oss-fuzz in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=51636, and introduced in e8cc2e4 (#25540).

  This bug would cause an assertion failure when feeding a miniscript with a `thresh(k,...)` fragment, with k >= 128, to an RPC.

ACKs for top commit:
  darosior:
    utACK 648f695
  achow101:
    ACK 648f695

Tree-SHA512: d86a0721758cd1e42ef02050b542f0935efdc19447a1ca76a3ade96352a6ee8261eef3d4a5cbdec77bf0ad14dfed42e9eb6bd4246b816a9f6f06d786900da9e7
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Sep 23, 2022
648f695 Correct sanity-checking script_size calculation (Pieter Wuille)

Pull request description:

  Fix a bug in the script_size sanity-check in the miniscript string parser, found by oss-fuzz in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=51636, and introduced in e8cc2e4 (bitcoin#25540).

  This bug would cause an assertion failure when feeding a miniscript with a `thresh(k,...)` fragment, with k >= 128, to an RPC.

ACKs for top commit:
  darosior:
    utACK 648f695
  achow101:
    ACK 648f695

Tree-SHA512: d86a0721758cd1e42ef02050b542f0935efdc19447a1ca76a3ade96352a6ee8261eef3d4a5cbdec77bf0ad14dfed42e9eb6bd4246b816a9f6f06d786900da9e7
@darosior darosior deleted the sipa_202206_fastminiscriptdup branch February 26, 2023 12:49
@darosior darosior mentioned this pull request Nov 9, 2023
@bitcoin bitcoin locked and limited conversation to collaborators Feb 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UndefinedBehaviorSanitizer: stack-overflow in miniscript (descriptor_parse)
7 participants