miniscript: avoid wasteful computation, prevent memory blowup when fuzzing #25540

darosior · 2022-07-04T13:19:15Z

As reported in #24860 (comment), the current code to construct a miniscript::Node could cause a blowup on large fuzzer inputs. This is because:

The duplicate key check is redundantly done at parsing time, since we will recursively create miniscript nodes and the constructor will unconditionally look for duplicate across this node's keys and all its sub-nodes'.
We don't put an upper bound on the size of the inputs to consider for parsing.

To avoid wasteful computation, and prevent the blowup on some fuzzer inputs, limit the size of reasonable inputs and only perform the check for duplicate keys once when parsing.
Regarding the duplicate key check bypass in the constructor we iterated on different approaches, and eventually settled on passing a dummy argument. Albeit less elegant, all other approaches required getting rid of std::make_shared and adding an allocation per node created.

This PR contains code from Pieter Wuille (see commits).

Fixes #25824.

darosior · 2022-07-04T13:22:07Z

ACK 23db26f -- this change was entirely authored by Pieter Wuille (i only patched CheckDuplicateKey and added a trivial unit test). I reviewed and tested it by rebasing #24148 and #24149 on top of it, checking the runtime of the fuzz targets and adapting the targets creating "random" nodes.

As a mean of comparison here are the runtime of the miniscript_string and miniscript_script targets on my machine
with the corpus from qa-assets at commit 1e3c7cd69fc06fda49a5286b98069ee5b0d64edc:

Before: Done 1046 runs in 10 second(s) and Done 2908 runs in 13 second(s)
After: Done 1046 runs in 0 second(s) and Done 2908 runs in 6 second(s)

This also made the two targets from #24149 twice as fast on average.

darosior · 2022-07-04T17:26:07Z

I think we should also have a limit on the maximum number of nodes we want to tolerate when parsing. For instance we would happily parse at this moment a string so long that i can't upload it on 0bin, and run out of memory. This can be the reason for fuzz crashes too.

EDIT: added a commit for this, and changed the OP.

DrahtBot · 2022-07-04T17:27:13Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#24149 (Signing support for Miniscript Descriptors by darosior)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

darosior · 2022-07-15T08:48:15Z

Rebased after #24148 was merged.

Sjors · 2022-07-16T13:15:08Z

src/script/miniscript.h

    // The two integers are used to hold state for thresh()
    std::vector<std::tuple<ParseContext, int64_t, int64_t>> to_parse;
    std::vector<NodeRef<Key>> constructed;

    to_parse.emplace_back(ParseContext::WRAPPED_EXPR, -1, -1);

    while (!to_parse.empty()) {
+        if (opened_frags > MAX_STANDARD_P2WSH_SCRIPT_SIZE) return {};


IIUC this limit is specific to Miniscript inside P2WSH. That's the case now for our descriptor implementation. Maybe use MAX_SCRIPT_SIZE (10,000) instead?

See also Resource Limitations under https://bitcoin.sipa.be/miniscript/

The Miniscript code already is P2WSH specifc. We can change this once we implement Tapscript support in Miniscript.

Sjors · 2022-07-16T13:18:25Z

src/script/miniscript.h

+    //! by all constructors except the NoDupCheck ones. The NoDupCheck ones skip the
+    //! computation, requiring it to be done manually by invoking DuplicateKeyCheck().
+    //! DuplicateKeyCheck(), or a non-NoDupCheck constructor, will compute has_duplicate_keys
+    //! for all subnodes as well.


I'm assuming this is useful because it allows performing other checks (like size limit) before checking for duplicates? Is so, maybe say that in the comment.

It could technically be used for this but it's not the intended purpose: we only ever use it to skip redundant computation.

achow101

I noticed that inside of Parse, there are 3 MakeNodeRef calls which are doing the duplicate keys check. Is that intentional, and if so, why? These are for parsing AND_N, ANDOR, and THRESH.

src/script/miniscript.h

darosior · 2022-08-11T09:40:57Z

i didn't author this commit, but i don't think it was intentional. I see no reason for duplicating the checks there. I pushed a version removing them.

(Sorry, i had forgotten to reply to your question.)

achow101 · 2022-08-15T19:51:38Z

ACK 9f92de3

glozow · 2022-09-08T15:29:41Z

Pinging @sipa @sanket1729 if you wouldn't mind taking a look? Or perhaps @stickies-v, @jb55, @Sjors since you reviewed #24147?

sipa · 2022-09-12T15:20:13Z

src/script/miniscript.h

@@ -1030,29 +1030,39 @@ inline NodeRef<Key> Parse(Span<const char> in, const Ctx& ctx)
 {
    using namespace spanparsing;

+    // Account for the number of fragments we started to parse. Any fragment is at least one byte


That's not true for v: and and_v. It probably doesn't matter, but it makes the analysis harder.

sipa · 2022-09-12T18:39:06Z

@darosior See https://github.com/sipa/bitcoin/commits/202209_miniscript_parsesize for a commit that instead of opened fragments computes the actual script size redundantly during string parsing. That's definitely overkill, but it has the advantage of being very sensitively testable. What do you think?

darosior · 2022-09-14T13:51:15Z

@sipa your commit doesn't limit the number of v: and and_v we allow, so we could still crash/timeout on parsing a large and_v(and_v(and_v(....... string.

I had incorrectly thought about the and_v and v: in my version by double counting some ops.

What do you think of a simple arbitrary limit on the number of characters in the string instead? Something like 100_000?

sipa · 2022-09-14T14:11:59Z

@darosior Good call!

I think there may be another strategy too based on:

every "leaf" (pk_k, pk_h, hashes, older, after) is a nonzero number of script bytes
every combinator with more than 1 argument increases the total number of leaves by at least one, so will increase the total script size by at least 1, even and_v.
every combinator with 1 argument also increases the script size by at least one, except v:, but v: cannot be recursed (vv: is never valid), so if it happens in a sequence, it has to be interleaved with other things which do add script size.

So I think a strategy could be similar to the script size counting I'm doing in my current commit, but instead "borrow" 1 byte from each leaf, and add it to the combinator that added that leaf to the count.

darosior · 2022-09-14T14:58:12Z

I thought about something along these lines as well. By taking a lower bound of 0.5 bytes per fragment (because of v:), we could have accounted for half a byte per combinator and nothing for leaf fragments.
But this becomes an issue when the number of leaf fragments we parse inside a combinator isn't bounded, like in thresh.

So we are left with having to conditionally account for leaf fragments, as you mention with "borrowing". I'm not sure the complexity is worth it, but we could do this on top of my current version. I've pushed a fixup commit implementing this on top of my branch here. Let me know if i'm missing something or if you think your version accurately accounting for fragments' sizes is still preferable (since it's applicable to it as well).

sipa · 2022-09-14T16:09:15Z

@darosior See added commit in https://github.com/sipa/bitcoin/commits/202209_miniscript_parsesize

sipa · 2022-09-15T18:25:30Z

@darosior Not sure you saw my commit above, WDYT? It has the property that every fragment increases the script_size variable now, with the exception of v: (which needs to be interleaved with others), and 0 and 1 (but those are terminal, so other combinators which do increment the variable are needed to even make a space to add a 0 or 1).

~~Its comments are outdated, though. I'll update those if desired.~~

EDIT: squashed and updated comments

Co-Authored-by: Antoine Poinsot <darosior@protonmail.com>

darosior · 2022-09-17T13:20:14Z

I picked @sipa's latest commit and squashed mine with it. Although it replicates all the script size calculation, his version indeed gives better assurance that the check is in fact correct. Thanks!

I ran this branch against the descriptor_parse fuzz corpus augmented with the seeds from bitcoin-core/qa-assets#92. I don't hit a timeout anymore, neither did it hit the new assertion that the parsing-time calculation of the script size is correct.

ACK e8cc2e4 -- it's my own PR but most of the code here was written by sipa. I've reviewed and tested it.

sipa · 2022-09-19T14:27:52Z

ACK e8cc2e4 (for the few parts of the code that aren't mine)

…memory blowup when fuzzing e8cc2e4 Make miniscript string parsing account for exact script size as bound (Pieter Wuille) 4cb8f9a Permit delaying duplicate key check in miniscript::Node construction (Pieter Wuille) Pull request description: As reported in bitcoin#24860 (comment), the current code to construct a `miniscript::Node` could cause a blowup on large fuzzer inputs. This is because: 1. The duplicate key check is redundantly done at parsing time, since we will recursively create miniscript nodes and the constructor will unconditionally look for duplicate across this node's keys and all its sub-nodes'. 2. We don't put an upper bound on the size of the inputs to consider for parsing. To avoid wasteful computation, and prevent the blowup on some fuzzer inputs, limit the size of reasonable inputs and only perform the check for duplicate keys once when parsing. Regarding the duplicate key check bypass in the constructor we iterated on different approaches, and eventually settled on passing a dummy argument. Albeit less elegant, all other approaches required getting rid of `std::make_shared` and adding an allocation *per node created*. This PR contains code from Pieter Wuille (see commits). Fixes bitcoin#25824. ACKs for top commit: darosior: ACK e8cc2e4 -- it's my own PR but most of the code here was written by sipa. I've reviewed and tested it. sipa: ACK e8cc2e4 (for the few parts of the code that aren't mine) Tree-SHA512: c21de39b3eeb484393758629882fcf8694a9bd1b8f15ae22efcec1582efc9c2309c5a0c2d90f361dd8e233d704a07dcd5fb982f4a48a002c4d8789e1d78bb526

648f695 Correct sanity-checking script_size calculation (Pieter Wuille) Pull request description: Fix a bug in the script_size sanity-check in the miniscript string parser, found by oss-fuzz in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=51636, and introduced in e8cc2e4 (#25540). This bug would cause an assertion failure when feeding a miniscript with a `thresh(k,...)` fragment, with k >= 128, to an RPC. ACKs for top commit: darosior: utACK 648f695 achow101: ACK 648f695 Tree-SHA512: d86a0721758cd1e42ef02050b542f0935efdc19447a1ca76a3ade96352a6ee8261eef3d4a5cbdec77bf0ad14dfed42e9eb6bd4246b816a9f6f06d786900da9e7

648f695 Correct sanity-checking script_size calculation (Pieter Wuille) Pull request description: Fix a bug in the script_size sanity-check in the miniscript string parser, found by oss-fuzz in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=51636, and introduced in e8cc2e4 (bitcoin#25540). This bug would cause an assertion failure when feeding a miniscript with a `thresh(k,...)` fragment, with k >= 128, to an RPC. ACKs for top commit: darosior: utACK 648f695 achow101: ACK 648f695 Tree-SHA512: d86a0721758cd1e42ef02050b542f0935efdc19447a1ca76a3ade96352a6ee8261eef3d4a5cbdec77bf0ad14dfed42e9eb6bd4246b816a9f6f06d786900da9e7

maflcko added this to the 24.0 milestone Jul 4, 2022

DrahtBot added the Tests label Jul 4, 2022

Rspigler mentioned this pull request Jul 4, 2022

Awesome multisig PR labyrinth guide #24861

Open

16 tasks

maflcko removed the Tests label Jul 4, 2022

maflcko changed the title ~~Permit delaying duplicate key check in miniscript::Node construction~~ miniscript: Permit delaying duplicate key check in miniscript::Node construction Jul 4, 2022

DrahtBot added the Tests label Jul 4, 2022

maflcko removed the Tests label Jul 4, 2022

DrahtBot added the Descriptors label Jul 4, 2022

DrahtBot mentioned this pull request Jul 4, 2022

Miniscript support in Output Descriptors #24148

Merged

darosior force-pushed the sipa_202206_fastminiscriptdup branch from 26956eb to 40e3d43 Compare July 5, 2022 08:55

darosior changed the title ~~miniscript: Permit delaying duplicate key check in miniscript::Node construction~~ miniscript: avoid wasteful computation, prevent memory blowup when fuzzing Jul 5, 2022

DrahtBot added the Needs rebase label Jul 14, 2022

DrahtBot mentioned this pull request Jul 15, 2022

Signing support for Miniscript Descriptors #24149

Merged

darosior force-pushed the sipa_202206_fastminiscriptdup branch from 40e3d43 to 088e5b0 Compare July 15, 2022 08:41

DrahtBot removed the Needs rebase label Jul 15, 2022

darosior mentioned this pull request Jul 15, 2022

descriptor_parse: extend coverage with Miniscript Descriptors inputs bitcoin-core/qa-assets#92

Merged

Sjors reviewed Jul 16, 2022

View reviewed changes

achow101 reviewed Jul 18, 2022

View reviewed changes

src/script/miniscript.h Show resolved Hide resolved

darosior force-pushed the sipa_202206_fastminiscriptdup branch from 088e5b0 to 9f92de3 Compare August 11, 2022 09:14

maflcko mentioned this pull request Aug 12, 2022

UndefinedBehaviorSanitizer: stack-overflow in miniscript (descriptor_parse) #25824

Closed

sipa reviewed Sep 12, 2022

View reviewed changes

darosior force-pushed the sipa_202206_fastminiscriptdup branch from 37f98e9 to 14a964e Compare September 15, 2022 08:17

sipa and others added 2 commits September 17, 2022 10:47

Permit delaying duplicate key check in miniscript::Node construction

4cb8f9a

Make miniscript string parsing account for exact script size as bound

e8cc2e4

Co-Authored-by: Antoine Poinsot <darosior@protonmail.com>

darosior force-pushed the sipa_202206_fastminiscriptdup branch from 14a964e to e8cc2e4 Compare September 17, 2022 13:13

fanquake requested a review from achow101 September 18, 2022 08:46

glozow merged commit 55e1deb into bitcoin:master Sep 19, 2022

sipa mentioned this pull request Sep 21, 2022

Fix assert failure in miniscript string parsing #26149

Merged

darosior deleted the sipa_202206_fastminiscriptdup branch February 26, 2023 12:49

darosior mentioned this pull request Nov 9, 2023

fuzz: Fix timeouts #28812

Closed

bitcoin locked and limited conversation to collaborators Feb 26, 2024

miniscript: avoid wasteful computation, prevent memory blowup when fuzzing #25540

miniscript: avoid wasteful computation, prevent memory blowup when fuzzing #25540

Uh oh!

Conversation

darosior commented Jul 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

darosior commented Jul 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

darosior commented Jul 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DrahtBot commented Jul 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Conflicts

Uh oh!

darosior commented Jul 15, 2022

Uh oh!

Sjors Jul 16, 2022

Choose a reason for hiding this comment

Uh oh!

darosior Jul 19, 2022

Choose a reason for hiding this comment

Uh oh!

Sjors Jul 16, 2022

Choose a reason for hiding this comment

Uh oh!

darosior Jul 19, 2022

Choose a reason for hiding this comment

Uh oh!

achow101 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

darosior commented Aug 11, 2022

Uh oh!

achow101 commented Aug 15, 2022

Uh oh!

glozow commented Sep 8, 2022

Uh oh!

sipa Sep 12, 2022

Choose a reason for hiding this comment

Uh oh!

sipa commented Sep 12, 2022

Uh oh!

darosior commented Sep 14, 2022

Uh oh!

sipa commented Sep 14, 2022

Uh oh!

darosior commented Sep 14, 2022

Uh oh!

sipa commented Sep 14, 2022

Uh oh!

sipa commented Sep 15, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

darosior commented Sep 17, 2022

Uh oh!

sipa commented Sep 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

darosior commented Jul 4, 2022 •

edited

Loading

darosior commented Jul 4, 2022 •

edited

Loading

darosior commented Jul 4, 2022 •

edited

Loading

DrahtBot commented Jul 4, 2022 •

edited

Loading

sipa commented Sep 15, 2022 •

edited

Loading

sipa commented Sep 19, 2022 •

edited

Loading