Add support for "partial" fuzzers that indicate usefulness #27552

sipa · 2023-05-02T12:13:48Z

This adds support for fuzz targets that return a boolean: true is the normal case, while false indicates the input was uninteresting and should not under any circumstances be added to the corpus. This is intended for fuzz targets that have some early bail-out criteria, so that the fuzzer doesn't continue to iterate on them.

DrahtBot · 2023-05-02T12:13:51Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	dergoegge, brunoerg, darosior

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#28065 (fuzz: Flatten all FUZZ_TARGET macros into one by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

maflcko · 2023-05-02T12:26:32Z

Can you add some context to explain how this interacts with fuzzing engines? Will this make it harder for engines to start from an empty input set? Often, to find a sufficiently long input to pass basic deserialization, fuzz engines will have to be guided, for example -use_value_profile=1 for libfuzzer, and discarding the inputs on the way would mean they will never succeed passing basic deserialization?

Moreover, it could help to state a goal. Is it to keep the qa-assets repo small?

dergoegge

Concept ACK

dergoegge · 2023-05-02T12:33:53Z

src/test/fuzz/fuzz.h

+/** Fuzz target without initialization function that returns bool (false = uninteresting test). */
+#define FUZZ_PARTIAL_TARGET(name) \


Maybe worth noting that this will only work for libFuzzer? (or I guess any engine that uses the libFuzzer harness and respects the -1 return value)

I see it a bit more abstractly: the macro is for writing a test that has such a return value. Whether the fuzz infrastructure uses it is an independent question (and if there are ones using LLVMFuzzerTestOneInput that don't support the return value -1 at all, we should make sure it also isn't returned, even if FUZZ_PARTIAL_TARGET is used).

I've added a FuzzResult enum as suggested by @MarcoFalke, and added some explanation there.

dergoegge · 2023-05-02T12:36:52Z

src/test/fuzz/fuzz.cpp

-    test_one_input({data, size});
-    return 0;
+    /* Returning -1 means the input was not useful. */
+    return int{test_one_input({data, size})} - 1;


Just noting that this was only recently added to libFuzzer: https://reviews.llvm.org/D128749?id=441094

I think that is fine, running with older versions of libFuzzer makes little sense anyway.

sipa · 2023-05-02T12:43:04Z

@MarcoFalke Fair question. I think the primary advantage is that it should help with the speed of fuzzing, by avoiding spending time on less interesting things. It is however somewhat delicate as you point out - if you mark too many things as "uninteresting", I can imagine it actually becomes harder to find a mutation path from one interesting test case to another.

maflcko · 2023-05-02T12:49:47Z

Yeah, it may help or hurt, depending on your goal and the fuzz target. My recommendation would be to make this off by default, and add an option to enable it at run time. This certainly can't hurt and may help for the use cases that want to enable it.

maflcko · 2023-05-02T12:55:43Z

src/test/fuzz/fuzz.cpp

-    test_one_input({data, size});
-    return 0;
+    /* Returning -1 means the input was not useful. */
+    return int{test_one_input({data, size})} - 1;


Suggestion, if you want to go down the route to make this a runtime option:

static const bool reject_unwanted_inputs{std::getenv("REJECT_UNWANTED_FUZZ_INPUTS") != nullptr};

(or similar)
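A minimal sketch of how such a runtime switch might be wired up, assuming the target reports usefulness as a bool. Only the environment variable name comes from the suggestion above; `RejectUnwantedInputs` and `ToEngineReturn` are hypothetical names. As in the diff, -1 is the value that tells libFuzzer the input was not useful, and 0 is the normal return value.

```cpp
#include <cstdlib>

// Hypothetical helper: true when the environment variable from the suggestion
// above is set (to any value). Evaluated once, on first use.
inline bool RejectUnwantedInputs()
{
    static const bool reject{std::getenv("REJECT_UNWANTED_FUZZ_INPUTS") != nullptr};
    return reject;
}

// Hypothetical helper: translate a target's "was this input useful?" answer
// into the value returned from LLVMFuzzerTestOneInput.
// 0 = normal, -1 = ask libFuzzer not to add the input to the corpus.
constexpr int ToEngineReturn(bool useful, bool reject_unwanted)
{
    if (!reject_unwanted) return 0; // Feature off: never reject an input.
    return useful ? 0 : -1;
}
```

With this shape the harness would end in something like `return ToEngineReturn(test_one_input({data, size}), RejectUnwantedInputs());`, so the default behavior stays unchanged unless the variable is set.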

sipa · 2023-05-02T13:01:13Z

@MarcoFalke Perhaps, but I don't worry too much if it's used conservatively. The "having to go through uninteresting cases to get to interesting ones" is a concern with or without this functionality, because after all, the uninteresting cases are already unlikely to trigger much (useful) coverage, and the coverage that they do trigger is likely unrelated to what is interesting. The actual solution libfuzzer has for this concern is attempting multiple (up to 5, IIRC) mutations in one step.

Of course, (over)use of this feature may make things worse, but that's up to the individual tests.

Maybe it's worth experimenting a bit to see how much impact it has; e.g. introduce old/known bugs into the code, start from an empty corpus, and measure on average how long it takes to find the bug, with and without this. The miniscript fuzzers (where I've added return false;s relatively liberally in this PR) could be a good guinea pig.

mzumsande

Maybe it's worth experimenting a bit to see how much impact it has;

Yes, I'm planning to play with that, I'd be really interested in whether there is a significant speedup.

I feel like ideally, this would be something a good fuzzing engine should be able to handle to some extent without user guidance - uninteresting cases should create fewer additional seeds added to the corpus, which should result in them being picked by the engine for mutation less (and there might be more sophisticated algorithms that would further reduce the time spent on seeds that have failed to create interesting mutations before). That wouldn't drive the time spent on these uninteresting inputs down to zero like the approach here though.

src/test/fuzz/asmap.cpp

sipa · 2023-07-05T15:04:24Z

Rebased and switched from bool to an enum class FuzzResult which has values MAYBE_INTERESTING and UNINTERESTING, making it hopefully clearer what the return values correspond to.
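As a rough illustration of the shape this gives a target, here is a self-contained sketch. The enum mirrors the two values named above; `ParseLengthPrefixed` and `ExampleTarget` are hypothetical stand-ins and not code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

enum class FuzzResult {
    MAYBE_INTERESTING, // Normal case: the engine decides based on coverage.
    UNINTERESTING,     // Early bail-out: ask the engine not to keep the input.
};

// Hypothetical parser: the first byte is a length, followed by exactly that many bytes.
std::optional<std::vector<uint8_t>> ParseLengthPrefixed(const std::vector<uint8_t>& buffer)
{
    if (buffer.empty()) return std::nullopt;
    const size_t len{buffer[0]};
    if (buffer.size() != len + 1) return std::nullopt;
    return std::vector<uint8_t>(buffer.begin() + 1, buffer.end());
}

// Hypothetical target body: bail out early when parsing fails, otherwise
// check an invariant on the parsed payload.
FuzzResult ExampleTarget(const std::vector<uint8_t>& buffer)
{
    const auto payload = ParseLengthPrefixed(buffer);
    if (!payload) return FuzzResult::UNINTERESTING;
    assert(payload->size() + 1 == buffer.size());
    return FuzzResult::MAYBE_INTERESTING;
}
```

Whether a failed parse should count as UNINTERESTING is a per-target decision; the sketch only shows where the two return values would go.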

brunoerg · 2023-07-10T14:17:52Z

Concept ACK

brunoerg · 2023-07-10T20:08:29Z

src/test/fuzz/asmap.cpp

    std::vector<bool> asmap = ipv6 ? IPV6_PREFIX_ASMAP : IPV4_PREFIX_ASMAP;
    asmap.reserve(asmap.size() + 8 * asmap_size);
    for (int i = 0; i < asmap_size; ++i) {
        for (int j = 0; j < 8; ++j) {
            asmap.push_back((buffer[1 + i] >> j) & 1);
        }
    }
-    if (!SanityCheckASMap(asmap, 128)) return;
+    if (!SanityCheckASMap(asmap, 128)) return FuzzResult::MAYBE_INTERESTING;


Suggested change

-    if (!SanityCheckASMap(asmap, 128)) return FuzzResult::MAYBE_INTERESTING;
+    if (!SanityCheckASMap(asmap, 128)) return FuzzResult::UNINTERESTING;

I think this is intentional to collect fuzz inputs that fail SanityCheckASMap into the qa-assets directory.

brunoerg · 2023-07-10T22:21:41Z

I did a quick test. I suppose that with this new approach, miniscript_script corpus will contain only valid miniscripts, this sounds good. So, I first ran: FUZZ=miniscript_script src/test/fuzz/fuzz new_corpus -runs=1000000. And then I "fuzzed" decodescript RPC using the following script I created:

#!/usr/bin/env python3

import sys
sys.path.insert(0, "/path/to/test/functional")
from test_framework.test_shell import TestShell

import binascii
import os

def miniscript():
    dirc = '/path/to/corpus/'
    test = TestShell().setup(num_nodes=1, setup_clean_chain=True)
    node = test.nodes[0]
    for file in os.listdir(dirc):
        with open(os.path.join(dirc, file), 'rb') as f:
            byte_data = f.read()
            hex_string = binascii.hexlify(byte_data).decode('utf-8')
            res = node.decodescript(hex_string)
            print(res)

    test.shutdown()

if __name__ == '__main__':
    miniscript()

It worked as expected.

For specific cases I think this approach (indicating usefulness) may be useful, for other ones it may be "dangerous". We can do a simple mutation in an interesting result and it may become an uninteresting one, then we can do another mutation and it becomes an interesting one - different from the first case.

I did another test to evaluate this approach in a different scenario.

In addrman harness, we have:

const AddrMan& const_addr_man{addr_man};
(void)const_addr_man.GetAddr(
    /*max_addresses=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*max_pct=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*network=*/std::nullopt);
(void)const_addr_man.Select(fuzzed_data_provider.ConsumeBool());
(void)const_addr_man.Size();

I believe that calling GetAddr, Select and other functions may not be so useful if the addrman is empty. So, using "partial" fuzzers, we could do:

if (addr_man.Size() == 0) return FuzzResult::UNINTERESTING;
const AddrMan& const_addr_man{addr_man};
(void)const_addr_man.GetAddr(
    /*max_addresses=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*max_pct=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*network=*/std::nullopt);
(void)const_addr_man.Select(fuzzed_data_provider.ConsumeBool());
(void)const_addr_man.Size();
CDataStream data_stream(SER_NETWORK, PROTOCOL_VERSION);
data_stream << const_addr_man;
return FuzzResult::MAYBE_INTERESTING;

I ran it and the result was:

➜  bitcoin-core-dev git:(27552-sipa) ✗ FUZZ=addrman src/test/fuzz/fuzz -runs=100000 -print_final_stats=1
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 3233727686
INFO: Loaded 1 modules   (1141182 inline 8-bit counters): 1141182 [0x107f53780, 0x10806a13e), 
INFO: Loaded 1 PC tables (1141182 PCs): 1141182 [0x10806a140,0x1091d3d20), 
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED exec/s: 0 rss: 105Mb
WARNING: no interesting inputs were found so far. Is the code instrumented for coverage?
This may also happen if the target rejected all inputs we tried so far
#100000 DONE   corp: 1/1b lim: 994 exec/s: 3225 rss: 422Mb
Done 100000 runs in 31 second(s)
stat::number_of_executed_units: 100000
stat::average_exec_per_sec:     3225
stat::new_units_added:          0
stat::slowest_unit_time_sec:    0
stat::peak_rss_mb:              422

However, if instead of doing that, we just do a "return" if addrman is empty, would it be more effective? E.g.

if (addr_man.Size() == 0) return;
else assert(false);
const AddrMan& const_addr_man{addr_man};
(void)const_addr_man.GetAddr(
    /*max_addresses=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*max_pct=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*network=*/std::nullopt);
(void)const_addr_man.Select(fuzzed_data_provider.ConsumeBool());
(void)const_addr_man.Size();
CDataStream data_stream(SER_NETWORK, PROTOCOL_VERSION);
data_stream << const_addr_man;

note: I added an assert to crash as soon as addrman is not empty anymore.

I ran the same command as before and the result was:

SUMMARY: libFuzzer: deadly signal
MS: 5 ChangeBit-ChangeBinInt-ChangeASCIIInt-CrossOver-CopyPart-; base unit: 7cf855d3c582971ea888061d02610fe375f68776
0x5c,0xff,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x26,0xff,0x52,0x44,0x36,0xff,0xff,0x0,0x0,0x0,0x49,0x5c,0xff,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x3f,0x41,0x3f,0x54,0x7e,0x54,0x8f,0x41,
\\\377zzzzzzzzz\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000&\377RD6\377\377\000\000\000I\\\377zzzzzzzzz\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000?A?T~T\217A
artifact_prefix='./'; Test unit written to ./crash-cf486b4859d3e46c3591f9a71e2f83dc384d3987
Base64: XP96enp6enp6enoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJv9SRDb//wAAAElc/3p6enp6enp6egAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP0E/VH5Uj0E=
stat::number_of_executed_units: 38666
stat::average_exec_per_sec:     1933
stat::new_units_added:          261
stat::slowest_unit_time_sec:    0
stat::peak_rss_mb:              351

It executed 38666 units and crashed because addrman is not empty anymore. It seemed to be more effective.

maflcko · 2023-07-11T12:25:42Z

src/test/fuzz/fuzz.h

 #define FUZZ_TARGET(name) \
    FUZZ_TARGET_INIT(name, FuzzFrameworkEmptyInitFun)

+/** Fuzz target without initialization function that returns FuzzResult. */
+#define FUZZ_PARTIAL_TARGET(name) \


nit: (This is my fault)

Not really a fan of adding more macros, where each new option will cause doubling all existing macros. Currently there are 3, in this pull there are 6, and with the next option we'll have 12 to 16 macros?

At least for the existing options, which only need to be known at runtime, an options struct can be used.

See #28065 . Feel free to ignore/NACK.

Edit: To clarify having FUZZ_TARGET and FUZZ_PARTIAL_TARGET is probably fine. My comment was about the other macros in other lines.
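For illustration only, here is one way an options struct could take the place of per-option macro variants. This is a hypothetical sketch and not the code in #28065: `FuzzTargetOptions`, its fields, `Targets`, `RegisterTarget` and `RunTarget` are all assumed names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-target options: a new option adds a field, not a new macro.
struct FuzzTargetOptions {
    std::function<void()> init{[] {}}; // Setup to run before the target body.
    bool hidden{false};                // Whether the target is skipped by default.
};

struct FuzzTarget {
    std::function<void()> body;
    FuzzTargetOptions opts;
};

// Hypothetical registry keyed by target name.
inline std::map<std::string, FuzzTarget>& Targets()
{
    static std::map<std::string, FuzzTarget> targets;
    return targets;
}

// Returns false if a target with this name was already registered.
inline bool RegisterTarget(const std::string& name, std::function<void()> body, FuzzTargetOptions opts = {})
{
    return Targets().emplace(name, FuzzTarget{std::move(body), std::move(opts)}).second;
}

// Runs the setup function and then the body of a registered target.
inline void RunTarget(const std::string& name)
{
    const auto it = Targets().find(name);
    assert(it != Targets().end());
    it->second.opts.init();
    it->second.body();
}
```

The point of the design is that options only needed at runtime live in one struct with defaults, so a single registration entry point covers every combination.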

maflcko

lgtm, but would be good to test this before merge

maflcko · 2023-07-11T17:51:33Z

src/test/fuzz/miniscript.cpp


    const auto ms = miniscript::FromScript(*script, SCRIPT_PARSER_CONTEXT);
-    if (!ms) return;
+    if (!ms) return FuzzResult::UNINTERESTING;


This will discard all cases where miniscript::FromScript fails? This seems undesirable, because then someone can change the code to add undefined behavior or a crash in code paths that return an error.

I agree with Marco. My first reaction was hey we can't have our cake and eat it too, but in the case of the Miniscript targets we can: miniscript_script and miniscript_string could be left more generic by not discarding any coverage while miniscript_smart and miniscript_stable would.

maflcko · 2023-07-11T17:52:38Z

src/test/fuzz/miniscript.cpp

 {
    FuzzedDataProvider provider(buffer.data(), buffer.size());
    auto str = provider.ConsumeRemainingBytesAsString();
    auto parsed = miniscript::FromString(str, PARSER_CTX);
-    if (!parsed) return;
+    if (!parsed) return FuzzResult::UNINTERESTING;


maflcko · 2023-07-11T17:55:36Z

src/test/fuzz/fuzz.h

+     *
+     * libfuzzer can make use of this and will not insert the input in its corpus, even when it
+     * appears to increase coverage. */
+    UNINTERESTING


Suggested change

-    UNINTERESTING
+    UNINTERESTING,

Style nit: Missing comma to avoid having to touch this line again if a new value is added (unlikely).

maflcko · 2023-07-11T17:57:33Z

src/test/fuzz/fuzz.cpp

-    return 0;
+    auto result = test_one_input({data, size});
+    /* Returning -1 means the input was not useful. */
+    return (result != FuzzResult::UNINTERESTING) - 1;


style nit: May be better to use a switch-case to avoid missing a case, when a new value is added (unlikely)?
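A sketch of what the switch-based variant could look like, assuming the two-value enum from this PR. `ToLibFuzzerReturn` is a hypothetical helper name; the return values 0 and -1 are the ones used in the diff above.

```cpp
enum class FuzzResult {
    MAYBE_INTERESTING,
    UNINTERESTING,
};

// Hypothetical helper mapping a target's result to the value returned from
// LLVMFuzzerTestOneInput.
constexpr int ToLibFuzzerReturn(FuzzResult result)
{
    // No default label, so compilers with -Wswitch can warn when a newly
    // added enumerator is not handled here.
    switch (result) {
    case FuzzResult::MAYBE_INTERESTING:
        return 0;
    case FuzzResult::UNINTERESTING:
        return -1; // Returning -1 means the input was not useful.
    }
    // Not reached for the enumerators above; fall back to the normal value.
    return 0;
}
```

Compared with `(result != FuzzResult::UNINTERESTING) - 1`, a new enumerator would produce a compiler warning here instead of silently mapping to 0.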

maflcko · 2023-07-11T17:59:23Z

src/test/fuzz/asmap.cpp

    std::vector<bool> asmap = ipv6 ? IPV6_PREFIX_ASMAP : IPV4_PREFIX_ASMAP;
    asmap.reserve(asmap.size() + 8 * asmap_size);
    for (int i = 0; i < asmap_size; ++i) {
        for (int j = 0; j < 8; ++j) {
            asmap.push_back((buffer[1 + i] >> j) & 1);
        }
    }
-    if (!SanityCheckASMap(asmap, 128)) return;
+    if (!SanityCheckASMap(asmap, 128)) return FuzzResult::MAYBE_INTERESTING;


I think this is intentional to collect fuzz inputs that fail SanityCheckASMap into the qa-assets directory.

DrahtBot · 2023-07-17T13:03:58Z

🐙 This pull request conflicts with the target branch and needs rebase.

darosior

Concept ACK

dergoegge · 2023-08-24T13:25:03Z

Rebased this past #28065 here: https://github.com/dergoegge/bitcoin/tree/202305_partial_fuzzers

achow101 · 2023-09-20T16:22:59Z

Closing as up for grabs due to lack of activity.

sipa · 2023-09-20T16:24:35Z

I believe this is interesting, but to move forward it needs benchmarks (showing whether practical use of this speeds up or slows down finding bugs, perhaps intentionally added ones), which I don't intend to work on in the short term.

darosior · 2023-09-20T16:39:27Z

I'm happy to review if someone picks this up.

brunoerg · 2023-09-20T17:03:43Z

I'm interested in this. I can pick it up.

Abuchtela · 2023-11-11T02:13:37Z

Rebase

fanquake requested a review from dergoegge May 2, 2023 12:17

dergoegge reviewed May 2, 2023

maflcko reviewed May 2, 2023

mzumsande reviewed May 2, 2023

maflcko reviewed May 2, 2023

DrahtBot mentioned this pull request May 17, 2023

MiniTapscript: port Miniscript to Tapscript #27255

Merged

maflcko mentioned this pull request Jun 20, 2023

fuzz: addrman, avoid ConsumeDeserializable when possible #27918

Merged

sipa added 5 commits July 5, 2023 10:51

Make TypeTestOneInput return FuzzResult enum

f048c7c

Add macros for fuzz targets that return FuzzResult

4b89ba6

Convert miniscript fuzz tests to return FuzzResult

4c5ba3e

Convert asmap fuzz test to return FuzzResult

f8514ea

Convert asmap_direct to return FuzzResult

87e0cc2

sipa force-pushed the 202305_partial_fuzzers branch from 44c6991 to 87e0cc2 Compare July 5, 2023 15:03

brunoerg reviewed Jul 10, 2023

maflcko reviewed Jul 11, 2023

DrahtBot mentioned this pull request Jul 11, 2023

fuzz: Flatten all FUZZ_TARGET macros into one #28065

Merged

DrahtBot added the Needs rebase label Jul 17, 2023

darosior reviewed Jul 18, 2023

achow101 closed this Sep 20, 2023

achow101 added the Up for grabs label Sep 20, 2023

brunoerg mentioned this pull request Nov 7, 2023

fuzz: Avoid timeout and bloat in fuzz targets #28815

Merged

maflcko mentioned this pull request Jul 22, 2024

fuzz: Limit parse_univalue input length #30473

Merged

bitcoin locked and limited conversation to collaborators Nov 10, 2024
