-
Notifications
You must be signed in to change notification settings - Fork 37.7k
fuzz: execute each file in dir without fuzz engine #21496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Oh, didn't realize that I guess the alternative is to either wait for the switch or to go with the boost approach for iterating through files in a directory? |
Concept ACK. Not sure if there's much point to passing in a single file (as opposed to |
Might be good to update |
It wont be available for general use in any bitcoind code for a while yet. @MarcoFalke is your intention to start using |
This will be build in the regular "non-fuzz-engine" target, so it can't yet use std::fs (my bad). |
Concept ACK It would be nice if multiple files or directories could be specified in order to mimic the libFuzzer command-line interface. I'm thinking something like this:
This would allow for picking inputs via shell expanded wildcards. |
Great feedback so far, thanks all! I'll swap out filesystem for boost, handle consecutive dir/file args, look into whats going on with |
I think the |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Sorry about the delay on this. Updated the code and the PR description to include boost, multiple arguments that can be either files or directories (including wildcard support), as well as some asserts in the file readings. Ignored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to work fine for me; some comments below.
src/test/fuzz/fuzz.cpp
Outdated
return false; | ||
do { | ||
const size_t length = fread(buffer, sizeof(uint8_t), sizeof(buffer), f); | ||
Assert(!ferror(f)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (ferror(f)) return false;
? (Same behaviour as fopen failure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, the assert is done in the parent method. Changed.
if (fs::is_directory(seed_path)) { | ||
for (fs::directory_iterator it(seed_path); it != fs::directory_iterator(); ++it) { | ||
if (!fs::is_regular_file(it->path())) continue; | ||
Assert(read_file(it->path(), buffer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!is_regular_file(..)) { std::cerr << "Not a file: " << it->path() << endl; abort(); }
?
if (!read_file(..)) { std::cerr << "Error reading file: " << it->path() << endl; abort(); }
?
Seems like it would be helpful to know what file is causing a problem. Not sure that skipping non-regular files rather than erroring on them makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, would be helpful to know what file is causing a problem if the fuzz testing fails too. One way to do that would be to use a signal handler to catch SIGABRT, and pass the filename via a global, something like:
#if defined(PROVIDE_FUZZ_MAIN_FUNCTION) && !defined(__AFL_LOOP)
fs::path g_seed_path;
void signal_handler(int signal)
{
if (signal == SIGABRT) {
std::cerr << "Error processing seed " << g_seed_path << std::endl;
} else {
std::cerr << "Unexpected signal " << signal << " received\n";
}
std::_Exit(EXIT_FAILURE);
}
#endif
...
auto previous_handler = std::signal(SIGABRT, signal_handler);
...
g_seed_path = seed_path;
Assert(read_file(seed_path, buffer));
test_one_input(buffer);
(EDIT: maybe these are better for a followup PR, if thinking about this stuff would delaying this one too much)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion, otherwise how is one to know if there's an issue. Added the suggested code!
Assert(read_file(it->path(), buffer)); | ||
test_one_input(buffer); | ||
buffer.clear(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth reporting the number of tests run when all the inputs succeed? It's a bit unclear that anything happened otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a counter
src/test/fuzz/fuzz.cpp
Outdated
{ | ||
uint8_t buffer[1024]; | ||
FILE* f = fsbridge::fopen(p.string(), "rb"); | ||
if (f == nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (f == nullptr) | |
if (f == nullptr) { |
According to style-guide
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I actually went with
if (f == nullptr) return false;
if that's alright instead, to be consistent with one liners.
🐙 This pull request conflicts with the target branch and needs rebase. Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft". |
@AnthonyRonning friendly ping to rebase. |
Marked "up for grabs" for now. |
f59bee3 fuzz: execute each file in dir without fuzz engine (Anthony Towns) Pull request description: Phony fuzzing (phuzzing)! Run the fuzz testing code against known inputs to detect errors. Advantage is you can easily test using the existing qa-assets datasets without having to compile with fuzzing enabled; disadvantage is that it doesn't do any actual fuzzing. Example usage: ``` $ for a in ${QA_ASSETS}/fuzz_seed_corpus/*; do echo ${a##*/}; done | xargs -P8 -I {} /bin/sh -c "FUZZ={} test/fuzz/fuzz ${QA_ASSETS}/fuzz_seed_corpus/{}" No fuzzer for address_deserialize. No fuzzer for addrdb. No fuzzer for banentry_deserialize. addition_overflow: succeeded against 848 files in 0s. asmap: succeeded against 981 files in 0s. checkqueue: succeeded against 211 files in 0s. ... ``` (`-P8` says run 8 of the tasks in parallel) If there are failures, the first one will be reported and the program will abort with output like: ``` fuzz: test/fuzz/versionbits.cpp:336: void (anonymous namespace)::versionbits_fuzz_target(FuzzBufferType): Assertion `exp_state != ThresholdState::FAILED' failed. Error processing seed "corpus/versionbits/35345ae8e722234095810b1117a29b63af7621af" ``` Rebase of bitcoin#22763, which was a rebase of bitcoin#21496, but also reports the name of the fuzzer and the time taken. Fixes bitcoin#21461 Top commit has no ACKs. Tree-SHA512: d8d046d4a309652eb13de42116276bf992480bc887ad3535a8ff18b354cb24826bc562b06af63802ec945c637f046563b6a5601d6321b46a5543127daafea09b
f59bee3 fuzz: execute each file in dir without fuzz engine (Anthony Towns) Pull request description: Phony fuzzing (phuzzing)! Run the fuzz testing code against known inputs to detect errors. Advantage is you can easily test using the existing qa-assets datasets without having to compile with fuzzing enabled; disadvantage is that it doesn't do any actual fuzzing. Example usage: ``` $ for a in ${QA_ASSETS}/fuzz_seed_corpus/*; do echo ${a##*/}; done | xargs -P8 -I {} /bin/sh -c "FUZZ={} test/fuzz/fuzz ${QA_ASSETS}/fuzz_seed_corpus/{}" No fuzzer for address_deserialize. No fuzzer for addrdb. No fuzzer for banentry_deserialize. addition_overflow: succeeded against 848 files in 0s. asmap: succeeded against 981 files in 0s. checkqueue: succeeded against 211 files in 0s. ... ``` (`-P8` says run 8 of the tasks in parallel) If there are failures, the first one will be reported and the program will abort with output like: ``` fuzz: test/fuzz/versionbits.cpp:336: void (anonymous namespace)::versionbits_fuzz_target(FuzzBufferType): Assertion `exp_state != ThresholdState::FAILED' failed. Error processing seed "corpus/versionbits/35345ae8e722234095810b1117a29b63af7621af" ``` Rebase of bitcoin#22763, which was a rebase of bitcoin#21496, but also reports the name of the fuzzer and the time taken. Fixes bitcoin#21461 Top commit has no ACKs. Tree-SHA512: d8d046d4a309652eb13de42116276bf992480bc887ad3535a8ff18b354cb24826bc562b06af63802ec945c637f046563b6a5601d6321b46a5543127daafea09b
f59bee3 fuzz: execute each file in dir without fuzz engine (Anthony Towns) Pull request description: Phony fuzzing (phuzzing)! Run the fuzz testing code against known inputs to detect errors. Advantage is you can easily test using the existing qa-assets datasets without having to compile with fuzzing enabled; disadvantage is that it doesn't do any actual fuzzing. Example usage: ``` $ for a in ${QA_ASSETS}/fuzz_seed_corpus/*; do echo ${a##*/}; done | xargs -P8 -I {} /bin/sh -c "FUZZ={} test/fuzz/fuzz ${QA_ASSETS}/fuzz_seed_corpus/{}" No fuzzer for address_deserialize. No fuzzer for addrdb. No fuzzer for banentry_deserialize. addition_overflow: succeeded against 848 files in 0s. asmap: succeeded against 981 files in 0s. checkqueue: succeeded against 211 files in 0s. ... ``` (`-P8` says run 8 of the tasks in parallel) If there are failures, the first one will be reported and the program will abort with output like: ``` fuzz: test/fuzz/versionbits.cpp:336: void (anonymous namespace)::versionbits_fuzz_target(FuzzBufferType): Assertion `exp_state != ThresholdState::FAILED' failed. Error processing seed "corpus/versionbits/35345ae8e722234095810b1117a29b63af7621af" ``` Rebase of bitcoin#22763, which was a rebase of bitcoin#21496, but also reports the name of the fuzzer and the time taken. Fixes bitcoin#21461 Top commit has no ACKs. Tree-SHA512: d8d046d4a309652eb13de42116276bf992480bc887ad3535a8ff18b354cb24826bc562b06af63802ec945c637f046563b6a5601d6321b46a5543127daafea09b
f59bee3 fuzz: execute each file in dir without fuzz engine (Anthony Towns) Pull request description: Phony fuzzing (phuzzing)! Run the fuzz testing code against known inputs to detect errors. Advantage is you can easily test using the existing qa-assets datasets without having to compile with fuzzing enabled; disadvantage is that it doesn't do any actual fuzzing. Example usage: ``` $ for a in ${QA_ASSETS}/fuzz_seed_corpus/*; do echo ${a##*/}; done | xargs -P8 -I {} /bin/sh -c "FUZZ={} test/fuzz/fuzz ${QA_ASSETS}/fuzz_seed_corpus/{}" No fuzzer for address_deserialize. No fuzzer for addrdb. No fuzzer for banentry_deserialize. addition_overflow: succeeded against 848 files in 0s. asmap: succeeded against 981 files in 0s. checkqueue: succeeded against 211 files in 0s. ... ``` (`-P8` says run 8 of the tasks in parallel) If there are failures, the first one will be reported and the program will abort with output like: ``` fuzz: test/fuzz/versionbits.cpp:336: void (anonymous namespace)::versionbits_fuzz_target(FuzzBufferType): Assertion `exp_state != ThresholdState::FAILED' failed. Error processing seed "corpus/versionbits/35345ae8e722234095810b1117a29b63af7621af" ``` Rebase of bitcoin#22763, which was a rebase of bitcoin#21496, but also reports the name of the fuzzer and the time taken. Fixes bitcoin#21461 Top commit has no ACKs. Tree-SHA512: d8d046d4a309652eb13de42116276bf992480bc887ad3535a8ff18b354cb24826bc562b06af63802ec945c637f046563b6a5601d6321b46a5543127daafea09b
Fixes #21461
This supports whole directory runs of the fuzzing seeds without needing a fuzzing library. A few design decisions that I went with but happy to change:
a. Can have 0 args and continue to feed via file input like before
b. Can have 1 or more args that is a file or directory
d. Subdirectories are ignored if you pass in a directory (though you can use
dir/*
to include files from subdirectories)__AFL_LOOP
path or if built withoutDPROVIDE_FUZZ_MAIN_FUNCTION
Testing instructions
To build without libFuzzer, exclude the sanitizers. IE:
That should have the
CPPFLAGS
of-DPROVIDE_FUZZ_MAIN_FUNCTION
to indicicate it was built with the main function to use instead of the libFuzzerLLVMFuzzerInitialize
&LLVMFuzzerTestOneInput
methods.Tests:
Happy to make any changes and very much welcome code feedback, my c++ a bit rusty and still figuring out the codebase.