Skip to content

Conversation

sipa
Copy link
Member

@sipa sipa commented May 2, 2023

This adds supports for fuzz targets that return a boolean: true is the normal case, while false indicates the input was uninteresting and should not under any circumstances be added to the corpus. This is intended for fuzz targets that have some early bail-out criteria, so that the fuzzer doesn't continue to iterate on them.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 2, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK dergoegge, brunoerg, darosior

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28065 (fuzz: Flatten all FUZZ_TARGET macros into one by MarcoFalke)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@fanquake fanquake requested a review from dergoegge May 2, 2023 12:17
@maflcko
Copy link
Member

maflcko commented May 2, 2023

Can you add some context to explain how this interacts with fuzzing engines? Will this make it harder for engines to start from an empty input set? Often, to find a sufficiently long input to pass basic deserialization, fuzz engines will have to be guided, for example -use_value_profile=1 for libfuzzer, and discarding the inputs on the way would mean they will never succeed passing basic deserialization?

Moreover, it could help to state a goal. Is it to keep the qa-assets repo small?

Copy link
Member

@dergoegge dergoegge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Comment on lines 35 to 48
/** Fuzz target without initialization function that returns bool (false = uninteresting test). */
#define FUZZ_PARTIAL_TARGET(name) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth noting that this will only work for libFuzzer? (or i guess any engine that uses the libFuzzer harness and respects the -1 return value)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it a bit more abstract: the macro is for writing a test that has such a return value. Whether the fuzz infrastructure uses it in an independent question (and if there are ones using LLVMFuzzerTestOneInput that don't support the return value -1 at all, we should make sure it also isn't returned, even if FUZZ_PARTIAL_TARGET is used).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a FuzzResult enum as suggested by @MarcoFalke, and added some explanation there.

test_one_input({data, size});
return 0;
/* Returning -1 means the input was not useful. */
return int{test_one_input({data, size})} - 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that this was only recently added to libFuzzer: https://reviews.llvm.org/D128749?id=441094

I think that is fine, running with older versions of libFuzzer makes little sense anyway.

@sipa
Copy link
Member Author

sipa commented May 2, 2023

@MarcoFalke Fair question. I think the primary advantage is that it should help with the speed of fuzzing, by avoiding spending time on less interesting things. It is however somewhat delicate as you point out - if you mark too many things as "uninteresting", I can imagine it actually becomes harder to find a mutation path from one interesting test case to another.

@maflcko
Copy link
Member

maflcko commented May 2, 2023

Yeah, it may help or hurt, depending on your goal and the fuzz target. My recommendation would be to make this off by default, and add an option to enable it at run time. This certainly can't hurt and may help for the use cases that want to enable it.

test_one_input({data, size});
return 0;
/* Returning -1 means the input was not useful. */
return int{test_one_input({data, size})} - 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion, if you want to go down the route to make this a runtime option:

static const reject_unwated_inputs{std::getenv("REJECT_UNWANTED_FUZZ_INPUTS")};

(or similar)

@sipa
Copy link
Member Author

sipa commented May 2, 2023

@MarcoFalke Perhaps, but I don't worry too much if it's used conservatively. The "having to go through uninteresting cases to get to interesting ones" is a concern with or without this functionality, because after all, the uninteresting cases are already unlikely to trigger much (useful) coverage, and the coverage that they do trigger is likely unrelated to what is interesting. The actual solution libfuzzer has for this concern is attempting multiple (up to 5, IIRC) mutations in one step.

Of course, (over)use of this feature may make things worse, but that's up to the individual tests.

Maybe it's worth experimenting a bit with to so how much impact it has; e.g. introduce old/known bugs into the code, start from an empty corpus, and measure on average how long in time it takes to find the bug, with and without this. The miniscript fuzzers (where I've added return false;s relatively liberally in this PR) could be a good guinea pig.

Copy link
Contributor

@mzumsande mzumsande left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth experimenting a bit with to so how much impact it has;

Yes, I'm planning to play with that, I'd be really interested in whether there is a significant speedup.

I feel like ideally, this would be something a good fuzzing engine should be able to handle to some extent without user guidance - uninteresting cases should create fewer additional seeds added to the corpus, which should result in them being picked by the engine for mutation less (and there might be more sophisticated algorithms that would further reduce the time spent on seeds that have failed to create interesting mutations before). That wouldn't drive the time spent on these uninteresting inputs down to zero like the approach here though.

@sipa sipa force-pushed the 202305_partial_fuzzers branch from 44c6991 to 87e0cc2 Compare July 5, 2023 15:03
@sipa
Copy link
Member Author

sipa commented Jul 5, 2023

Rebased and switched from bool to an enum class FuzzResult which has values MAYBE_INTERESTING and UNINTERESTING, making it hopefully clearer what the return values correspond to.

@brunoerg
Copy link
Contributor

Concept ACK

std::vector<bool> asmap = ipv6 ? IPV6_PREFIX_ASMAP : IPV4_PREFIX_ASMAP;
asmap.reserve(asmap.size() + 8 * asmap_size);
for (int i = 0; i < asmap_size; ++i) {
for (int j = 0; j < 8; ++j) {
asmap.push_back((buffer[1 + i] >> j) & 1);
}
}
if (!SanityCheckASMap(asmap, 128)) return;
if (!SanityCheckASMap(asmap, 128)) return FuzzResult::MAYBE_INTERESTING;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!SanityCheckASMap(asmap, 128)) return FuzzResult::MAYBE_INTERESTING;
if (!SanityCheckASMap(asmap, 128)) return FuzzResult::UNINTERESTING;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is intentional to collect fuzz inputs that fail SanityCheckASMap into the qa-assets directory.

@brunoerg
Copy link
Contributor

brunoerg commented Jul 10, 2023

I did a quick test. I suppose that with this new approach, miniscript_script corpus will contain only valid miniscripts, this sounds good. So, I first ran: FUZZ=miniscript_script src/test/fuzz/fuzz new_corpus -runs=1000000. And then I "fuzzed" decodescript RPC using the following script I created:

#!/usr/bin/env python3

import sys
sys.path.insert(0, "/path/to/test/functional")
from test_framework.test_shell import TestShell

import binascii
import os

def miniscript():
    dirc = '/path/to/corpus/'
    test = TestShell().setup(num_nodes=1, setup_clean_chain=True)
    node = test.nodes[0]
    for file in os.listdir(dirc):
        with open(os.path.join(dirc, file), 'rb') as f:
            byte_data = f.read()
            hex_string = binascii.hexlify(byte_data).decode('utf-8')
            res = node.decodescript(hex_string)
            print(res)

    test.shutdown()

if __name__ == '__main__':
    miniscript()

It worked as expected.


For specific cases I think this approach (indicating usefulness) may be useful, for other ones it may be "dangerous". We can do a simple mutation in an interesting result and it may become an uninteresting one, then we can do another mutation and it becomes an interesting one - different from the first case.


I did other test to evaluate this approach in other scenario.

In addrman harness, we have:

const AddrMan& const_addr_man{addr_man};
(void)const_addr_man.GetAddr(
    /*max_addresses=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*max_pct=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*network=*/std::nullopt);
(void)const_addr_man.Select(fuzzed_data_provider.ConsumeBool());
(void)const_addr_man.Size();

I believe that calling GetAddr, Select and other functions may not be so useful if the addrman is empty. So, using "partial" fuzzers, we could do:

if (addr_man.Size() == 0) return FuzzResult::UNINTERESTING;
const AddrMan& const_addr_man{addr_man};
(void)const_addr_man.GetAddr(
    /*max_addresses=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*max_pct=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*network=*/std::nullopt);
(void)const_addr_man.Select(fuzzed_data_provider.ConsumeBool());
(void)const_addr_man.Size();
CDataStream data_stream(SER_NETWORK, PROTOCOL_VERSION);
data_stream << const_addr_man;
if (addr_man.Size() == 0) return FuzzResult::MAYBE_INTERESTING;

I ran it and the result was:

-➜  bitcoin-core-dev git:(27552-sipa) ✗ FUZZ=addrman src/test/fuzz/fuzz -runs=100000 -print_final_stats=1
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 3233727686
INFO: Loaded 1 modules   (1141182 inline 8-bit counters): 1141182 [0x107f53780, 0x10806a13e), 
INFO: Loaded 1 PC tables (1141182 PCs): 1141182 [0x10806a140,0x1091d3d20), 
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2      INITED exec/s: 0 rss: 105Mb
WARNING: no interesting inputs were found so far. Is the code instrumented for coverage?
This may also happen if the target rejected all inputs we tried so far
#100000 DONE   corp: 1/1b lim: 994 exec/s: 3225 rss: 422Mb
Done 100000 runs in 31 second(s)
stat::number_of_executed_units: 100000
stat::average_exec_per_sec:     3225
stat::new_units_added:          0
stat::slowest_unit_time_sec:    0
stat::peak_rss_mb:              422

However, if instead of doing that, we just do a "return" if addrman is empty, would it be more effective? E.g.

if (addr_man.Size() == 0) return;
else assert(false);
const AddrMan& const_addr_man{addr_man};
(void)const_addr_man.GetAddr(
    /*max_addresses=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*max_pct=*/fuzzed_data_provider.ConsumeIntegralInRange<size_t>(0, 4096),
    /*network=*/std::nullopt);
(void)const_addr_man.Select(fuzzed_data_provider.ConsumeBool());
(void)const_addr_man.Size();
CDataStream data_stream(SER_NETWORK, PROTOCOL_VERSION);
data_stream << const_addr_man;

note: I added an assert to crash as soon as addrman is not empty anymore.

I ran same command that I did previously and the result was:

SUMMARY: libFuzzer: deadly signal
MS: 5 ChangeBit-ChangeBinInt-ChangeASCIIInt-CrossOver-CopyPart-; base unit: 7cf855d3c582971ea888061d02610fe375f68776
0x5c,0xff,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x26,0xff,0x52,0x44,0x36,0xff,0xff,0x0,0x0,0x0,0x49,0x5c,0xff,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x7a,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x3f,0x41,0x3f,0x54,0x7e,0x54,0x8f,0x41,
\\\377zzzzzzzzz\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000&\377RD6\377\377\000\000\000I\\\377zzzzzzzzz\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000?A?T~T\217A
artifact_prefix='./'; Test unit written to ./crash-cf486b4859d3e46c3591f9a71e2f83dc384d3987
Base64: XP96enp6enp6enoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJv9SRDb//wAAAElc/3p6enp6enp6egAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP0E/VH5Uj0E=
stat::number_of_executed_units: 38666
stat::average_exec_per_sec:     1933
stat::new_units_added:          261
stat::slowest_unit_time_sec:    0
stat::peak_rss_mb:              351

It executed 38666 units and crashed because addrman is not empty anymore. It seemed to be more effective.

#define FUZZ_TARGET(name) \
FUZZ_TARGET_INIT(name, FuzzFrameworkEmptyInitFun)

/** Fuzz target without initialization function that returns FuzzResult. */
#define FUZZ_PARTIAL_TARGET(name) \
Copy link
Member

@maflcko maflcko Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (This is my fault)

Not really a fan adding more macros, where each new option will cause doubling all existing macros. Currently there are 3, in this pull there are 6, and with the next option we'll have 12 to 16 macros?

At least for the existing options, which only need to be known at runtime, an options struct can be used.

See #28065 . Feel free to ignore/NACK.

Edit: To clarify having FUZZ_TARGET and FUZZ_PARTIAL_TARGET is probably fine. My comment was about the other macros in other lines.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but would be good to test this before merge


const auto ms = miniscript::FromScript(*script, SCRIPT_PARSER_CONTEXT);
if (!ms) return;
if (!ms) return FuzzResult::UNINTERESTING;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will discard all cases where miniscript::FromScript fails? This seems undesirable, because then someone can change the code to add undefined behavior or a crash in code paths that return an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Marco. My first reaction was hey we can't have our cake and eat it too, but in the case of the Miniscript targets we can: miniscript_script and miniscript_string could be left more generic by not discarding any coverage while miniscript_smart and miniscript_stable would.

{
FuzzedDataProvider provider(buffer.data(), buffer.size());
auto str = provider.ConsumeRemainingBytesAsString();
auto parsed = miniscript::FromString(str, PARSER_CTX);
if (!parsed) return;
if (!parsed) return FuzzResult::UNINTERESTING;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same?

*
* libfuzzer can make use of this and will not insert the input in its corpus, even when it
* appears to increase coverage. */
UNINTERESTING
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UNINTERESTING
UNINTERESTING,

Style nit: Missing comma to avoid having to touch this line again if a new value is added (unlikely).

return 0;
auto result = test_one_input({data, size});
/* Returning -1 means the input was not useful. */
return (result != FuzzResult::UNINTERESTING) - 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nit: May be better to use a switch-case to avoid missing a case, when a new value is added (unlikely)?

std::vector<bool> asmap = ipv6 ? IPV6_PREFIX_ASMAP : IPV4_PREFIX_ASMAP;
asmap.reserve(asmap.size() + 8 * asmap_size);
for (int i = 0; i < asmap_size; ++i) {
for (int j = 0; j < 8; ++j) {
asmap.push_back((buffer[1 + i] >> j) & 1);
}
}
if (!SanityCheckASMap(asmap, 128)) return;
if (!SanityCheckASMap(asmap, 128)) return FuzzResult::MAYBE_INTERESTING;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is intentional to collect fuzz inputs that fail SanityCheckASMap into the qa-assets directory.

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Copy link
Member

@darosior darosior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

@dergoegge
Copy link
Member

Rebased this past #28065 here: https://github.com/dergoegge/bitcoin/tree/202305_partial_fuzzers

@achow101
Copy link
Member

Closing as up for grabs due to lack of activity.

@sipa
Copy link
Member Author

sipa commented Sep 20, 2023

I believe this is interesting, but to move forward it needs benchmarks (in terms of seeing how practical use of this increases/decreases figuring out bugs, perhaps intentionally added one), which I don't have the intent to work on in the short term currently.

@darosior
Copy link
Member

I'm happy to review if someone picks this up.

@brunoerg
Copy link
Contributor

I'm interesting on it. I can pick this up.

@Abuchtela
Copy link

Rebase

@bitcoin bitcoin locked and limited conversation to collaborators Nov 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants