miniscript: avoid wasteful computation, prevent memory blowup when fuzzing #25540
Conversation
ACK 23db26f -- this change was entirely authored by Pieter Wuille (i only patched …). As a means of comparison, here are the runtimes of `miniscript::Node` construction. This also made the two targets from #24149 twice as fast on average.

I think we should also have a limit on the maximum number of nodes we want to tolerate when parsing. For instance, at the moment we would happily parse a string so long that i can't upload it on 0bin, and run out of memory. This can be the reason for fuzz crashes too. EDIT: added a commit for this, and changed the OP.
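The limit described above ended up as a cheap counter inside the parsing loop: abort as soon as the number of opened fragments can no longer correspond to a valid script. A stand-alone toy sketch of the idea (the function name and the `'('`-counting heuristic are mine; only the 3600-byte `MAX_STANDARD_P2WSH_SCRIPT_SIZE` value is Bitcoin Core's):

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Illustrative bound; the PR bounds parsing work using
// MAX_STANDARD_P2WSH_SCRIPT_SIZE (3600 bytes).
constexpr std::size_t kMaxOpenedFrags = 3600;

// Toy version of the guard: count how many fragments a parse would open
// (here, one per '(' in the expression string) and bail out as soon as the
// count exceeds what any valid script could contain, instead of allocating
// nodes for an unbounded input.
std::optional<std::size_t> CountFragments(const std::string& input)
{
    std::size_t opened = 0;
    for (char c : input) {
        if (c == '(' && ++opened > kMaxOpenedFrags) return std::nullopt;
    }
    return opened;
}
```

The point is that the check is O(1) per fragment, so a pathological multi-megabyte input is rejected before any node allocation happens.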
Force-pushed from 26956eb to 40e3d43
Force-pushed from 40e3d43 to 088e5b0
Rebased after #24148 was merged.
src/script/miniscript.h (Outdated)

```cpp
// The two integers are used to hold state for thresh()
std::vector<std::tuple<ParseContext, int64_t, int64_t>> to_parse;
std::vector<NodeRef<Key>> constructed;

to_parse.emplace_back(ParseContext::WRAPPED_EXPR, -1, -1);

while (!to_parse.empty()) {
    if (opened_frags > MAX_STANDARD_P2WSH_SCRIPT_SIZE) return {};
```
IIUC this limit is specific to Miniscript inside P2WSH. That's the case now for our descriptor implementation. Maybe use MAX_SCRIPT_SIZE (10,000) instead? See also Resource Limitations under https://bitcoin.sipa.be/miniscript/
The Miniscript code already is P2WSH-specific. We can change this once we implement Tapscript support in Miniscript.
```cpp
//! by all constructors except the NoDupCheck ones. The NoDupCheck ones skip the
//! computation, requiring it to be done manually by invoking DuplicateKeyCheck().
//! DuplicateKeyCheck(), or a non-NoDupCheck constructor, will compute has_duplicate_keys
//! for all subnodes as well.
```
I'm assuming this is useful because it allows performing other checks (like the size limit) before checking for duplicates? If so, maybe say that in the comment.
It could technically be used for this but it's not the intended purpose: we only ever use it to skip redundant computation.
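The redundancy being avoided can be made concrete with a minimal toy model (my own simplified types, not the PR's code): a tag-selected constructor skips the recursive duplicate-key computation, which would otherwise run once per node and make bottom-up tree construction quadratic, and `DuplicateKeyCheck()` is invoked a single time on the finished root.

```cpp
#include <memory>
#include <set>
#include <utility>
#include <vector>

struct NoDupCheckTag {}; // dummy argument selecting the "skip the check" overload

// Toy stand-in for miniscript::Node: keys are ints, subs are child nodes.
struct ToyNode {
    std::vector<int> keys;
    std::vector<std::shared_ptr<ToyNode>> subs;
    bool has_duplicate_keys{false};

    // Normal constructor: eagerly checks this node and all subnodes.
    ToyNode(std::vector<int> k, std::vector<std::shared_ptr<ToyNode>> s)
        : keys(std::move(k)), subs(std::move(s)) { DuplicateKeyCheck(); }

    // NoDupCheck constructor: skips the recursive check so that building a
    // tree bottom-up does not redo it at every level; the caller runs it
    // once at the top.
    ToyNode(NoDupCheckTag, std::vector<int> k, std::vector<std::shared_ptr<ToyNode>> s)
        : keys(std::move(k)), subs(std::move(s)) {}

    void DuplicateKeyCheck()
    {
        std::set<int> seen;
        has_duplicate_keys = !CollectKeys(seen);
    }

private:
    // Insert every key in the subtree into `seen`; false on a repeat.
    bool CollectKeys(std::set<int>& seen) const
    {
        for (int k : keys) {
            if (!seen.insert(k).second) return false;
        }
        for (const auto& sub : subs) {
            if (!sub->CollectKeys(seen)) return false;
        }
        return true;
    }
};
```

Because the tag overload is public, `std::make_shared<ToyNode>(NoDupCheckTag{}, ...)` still works, which is the constraint discussed in the PR description.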
I noticed that inside of `Parse`, there are 3 `MakeNodeRef` calls which are doing the duplicate keys check. Is that intentional, and if so, why? These are for parsing `AND_N`, `ANDOR`, and `THRESH`.
Force-pushed from 088e5b0 to 9f92de3
i didn't author this commit, but i don't think it was intentional. I see no reason for duplicating the checks there. I pushed a version removing them. (Sorry, i had forgotten to reply to your question.)
ACK 9f92de3
Pinging @sipa @sanket1729 if you wouldn't mind taking a look? Or perhaps @stickies-v, @jb55, @Sjors since you reviewed #24147?
src/script/miniscript.h (Outdated)

```cpp
@@ -1030,29 +1030,39 @@ inline NodeRef<Key> Parse(Span<const char> in, const Ctx& ctx)
{
    using namespace spanparsing;

    // Account for the number of fragments we started to parse. Any fragment is at least one byte
```
That's not true for `v:` and `and_v`. It probably doesn't matter, but it makes the analysis harder.
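For context on why `v:` can be size-free: the wrapper replaces a trailing opcode that has a VERIFY twin with that twin, and only falls back to appending `OP_VERIFY` otherwise (and `and_v` is pure concatenation, adding nothing). A toy sketch; the opcode values are Bitcoin script's real ones, but the helper itself is hypothetical:

```cpp
#include <cstdint>
#include <vector>

// Opcode values from Bitcoin script.
constexpr uint8_t OP_VERIFY = 0x69;
constexpr uint8_t OP_EQUAL = 0x87, OP_EQUALVERIFY = 0x88;
constexpr uint8_t OP_CHECKSIG = 0xac, OP_CHECKSIGVERIFY = 0xad;
constexpr uint8_t OP_CHECKMULTISIG = 0xae, OP_CHECKMULTISIGVERIFY = 0xaf;

// Apply the v: wrapper to a script: if the last opcode has a VERIFY form,
// upgrade it in place (0 extra bytes); otherwise append OP_VERIFY (+1 byte).
std::vector<uint8_t> ApplyVerifyWrapper(std::vector<uint8_t> script)
{
    if (!script.empty()) {
        uint8_t& last = script.back();
        if (last == OP_EQUAL || last == OP_CHECKSIG || last == OP_CHECKMULTISIG) {
            ++last; // the VERIFY form is the next opcode value
            return script;
        }
    }
    script.push_back(OP_VERIFY);
    return script;
}
```

So a "one byte per fragment" lower bound breaks exactly on the fragments that can compile to zero additional bytes.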
@darosior See https://github.com/sipa/bitcoin/commits/202209_miniscript_parsesize for a commit that instead of opened fragments computes the actual script size redundantly during string parsing. That's definitely overkill, but it has the advantage of being very sensitively testable. What do you think?
@sipa your commit doesn't limit the number of …. I had incorrectly thought about the …. What do you think of a simple arbitrary limit on the number of characters in the string instead? Something like …
@darosior Good call! I think there may be another strategy too, based on: …

So I think a strategy could be similar to the script size counting I'm doing in my current commit, but instead "borrow" 1 byte from each leaf, and add it to the combinator that added that leaf to the count.
I thought about something along these lines as well. By taking a lower bound of …, we are left with having to conditionally account for leaf fragments, as you mention with "borrowing". I'm not sure the complexity is worth it, but we could do this on top of my current version. I've pushed a fixup commit implementing this on top of my branch here. Let me know if i'm missing something or if you think your version accurately accounting for fragments' sizes is still preferable (since it's applicable to it as well).
@darosior See added commit in https://github.com/sipa/bitcoin/commits/202209_miniscript_parsesize |
Force-pushed from 37f98e9 to 14a964e
@darosior Not sure you saw my commit above, WDYT? It has the property that every fragment increases the ….

EDIT: squashed and updated comments
Co-Authored-by: Antoine Poinsot <darosior@protonmail.com>
Force-pushed from 14a964e to e8cc2e4
I picked @sipa's latest commit and squashed mine with it. Although it replicates all the script size calculation, his version indeed gives better assurance that the check is in fact correct. Thanks! I ran this branch against the ….

ACK e8cc2e4 -- it's my own PR but most of the code here was written by sipa. I've reviewed and tested it.
ACK e8cc2e4 (for the few parts of the code that aren't mine)
…memory blowup when fuzzing

e8cc2e4 Make miniscript string parsing account for exact script size as bound (Pieter Wuille)
4cb8f9a Permit delaying duplicate key check in miniscript::Node construction (Pieter Wuille)

Pull request description:

As reported in bitcoin#24860 (comment), the current code to construct a `miniscript::Node` could cause a blowup on large fuzzer inputs. This is because:

1. The duplicate key check is redundantly done at parsing time, since we will recursively create miniscript nodes and the constructor will unconditionally look for duplicates across this node's keys and all its sub-nodes'.
2. We don't put an upper bound on the size of the inputs to consider for parsing.

To avoid wasteful computation, and prevent the blowup on some fuzzer inputs, limit the size of reasonable inputs and only perform the check for duplicate keys once when parsing.

Regarding the duplicate key check bypass in the constructor we iterated on different approaches, and eventually settled on passing a dummy argument. Albeit less elegant, all other approaches required getting rid of `std::make_shared` and adding an allocation *per node created*.

This PR contains code from Pieter Wuille (see commits).

Fixes bitcoin#25824.

ACKs for top commit:
  darosior: ACK e8cc2e4 -- it's my own PR but most of the code here was written by sipa. I've reviewed and tested it.
  sipa: ACK e8cc2e4 (for the few parts of the code that aren't mine)

Tree-SHA512: c21de39b3eeb484393758629882fcf8694a9bd1b8f15ae22efcec1582efc9c2309c5a0c2d90f361dd8e233d704a07dcd5fb982f4a48a002c4d8789e1d78bb526
648f695 Correct sanity-checking script_size calculation (Pieter Wuille)

Pull request description:

Fix a bug in the script_size sanity-check in the miniscript string parser, found by oss-fuzz in https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=51636, and introduced in e8cc2e4 (#25540). This bug would cause an assertion failure when feeding a miniscript with a `thresh(k,...)` fragment, with k >= 128, to an RPC.

ACKs for top commit:
  darosior: utACK 648f695
  achow101: ACK 648f695

Tree-SHA512: d86a0721758cd1e42ef02050b542f0935efdc19447a1ca76a3ade96352a6ee8261eef3d4a5cbdec77bf0ad14dfed42e9eb6bd4246b816a9f6f06d786900da9e7
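The k >= 128 edge case is easy to see from how script pushes small integers. A stand-alone helper sketch (my own, not the patched code) showing why a naive "2 bytes for any k > 16" rule undercounts once the CScriptNum encoding needs a second byte:

```cpp
#include <cassert>
#include <cstdint>

// Script bytes needed to push a non-negative integer k (k below 2^31):
// 0..16 are single opcodes (OP_0..OP_16); anything larger is a 1-byte push
// opcode plus k's CScriptNum encoding, which needs an extra byte each time
// the value crosses a 0x7f... boundary, because the top bit is the sign bit.
int NumPushSize(int64_t k)
{
    assert(k >= 0 && k <= 0x7fffffff);
    if (k <= 16) return 1;
    return 2 + (k > 0x7f) + (k > 0x7fff) + (k > 0x7fffff);
}
```

So pushing k = 128 costs three bytes (push opcode, 0x80, 0x00), one more than k = 127, which is exactly the boundary where an undercounting sanity check would trip its assertion.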
As reported in #24860 (comment), the current code to construct a `miniscript::Node` could cause a blowup on large fuzzer inputs. This is because:

1. The duplicate key check is redundantly done at parsing time, since we will recursively create miniscript nodes and the constructor will unconditionally look for duplicates across this node's keys and all its sub-nodes'.
2. We don't put an upper bound on the size of the inputs to consider for parsing.

To avoid wasteful computation, and prevent the blowup on some fuzzer inputs, limit the size of reasonable inputs and only perform the check for duplicate keys once when parsing.

Regarding the duplicate key check bypass in the constructor we iterated on different approaches, and eventually settled on passing a dummy argument. Albeit less elegant, all other approaches required getting rid of `std::make_shared` and adding an allocation *per node created*.

This PR contains code from Pieter Wuille (see commits).

Fixes #25824.
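The "dummy argument" trade-off can be seen in miniature: alternatives such as a private constructor plus a static factory would force a plain `new` inside the factory, losing `std::make_shared`'s single combined allocation of object and control block. All names below are invented for illustration:

```cpp
#include <memory>

struct SkipCheckTag {}; // empty marker type: zero runtime cost, picks an overload

// Hypothetical class illustrating the constraint: std::make_shared can only
// call constructors accessible to it, so the opt-out overload stays public
// and is distinguished by the dummy tag argument rather than made private.
class Checked {
public:
    explicit Checked(int v) : value(v), checked(true) {}        // eager check
    Checked(SkipCheckTag, int v) : value(v), checked(false) {}  // check skipped

    int value;
    bool checked;
};

// Both forms keep the single-allocation benefit of make_shared:
inline std::shared_ptr<Checked> MakeChecked(int v)
{
    return std::make_shared<Checked>(v);
}
inline std::shared_ptr<Checked> MakeUnchecked(int v)
{
    return std::make_shared<Checked>(SkipCheckTag{}, v);
}
```

Since the tag is an empty struct, the extra argument compiles away; the only cost is the slightly less elegant call site.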