Skip to content

Conversation

ghost
Copy link

@ghost ghost commented Mar 21, 2021

Fixes #21461

This supports whole directory runs of the fuzzing seeds without needing a fuzzing library. A few design decisions that I went with but happy to change:

  1. Very basic multi arg support
    a. Can have 0 args and continue to feed via file input like before
    b. Can have 1 or more args that is a file or directory
    d. Subdirectories are ignored if you pass in a directory (though you can use dir/* to include files from subdirectories)
  2. Based the file reading off of util/readwritefile.cpp with small changes to make it a buffer instead of a string (could reuse that but turn string to buffer?)
  3. The new changes are ignored in the __AFL_LOOP path or if built without DPROVIDE_FUZZ_MAIN_FUNCTION
Testing instructions

To build without libFuzzer, exclude the sanitizers. IE:

CC=clang CXX=clang++ ./configure BDB_LIBS="-L${BDB_PREFIX}/lib -ldb_cxx-4.8" BDB_CFLAGS="-I${BDB_PREFIX}/include" --without-gui --with-zmq --enable-fuzz

That should have the CPPFLAGS of -DPROVIDE_FUZZ_MAIN_FUNCTION to indicicate it was built with the main function to use instead of the libFuzzer LLVMFuzzerInitialize & LLVMFuzzerTestOneInput methods.

Tests:

# clean and build
make clean
make -j "$(($(nproc)+1))"

# get qa-assets if you don't have already
git clone https://github.com/bitcoin-core/qa-assets

# existing way to feed 1 at a time, still supported
FUZZ=process_message src/test/fuzz/fuzz < qa-assets/fuzz_seed_corpus/process_message/1258dd51f2a5f3221b33a306279ef7290c5fca6d

# new with this PR: one at a time
FUZZ=process_message src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_message/1258dd51f2a5f3221b33a306279ef7290c5fca6d

# or multiple files at the same time
FUZZ=process_message src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_message/1258dd51f2a5f3221b33a306279ef7290c5fca6d qa-assets/fuzz_seed_corpus/process_message/32230c42df1caa3220a3f95ee8a8d2865731ca3e qa-assets/fuzz_seed_corpus/process_message/90329d105d238ed81452077b4e0287a2040db9e9

# new with this PR: whole directory at a time
FUZZ=process_message src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_message

# or mix of files and directories at the same time
FUZZ=process_message src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_message/1258dd51f2a5f3221b33a306279ef7290c5fca6d qa-assets/fuzz_seed_corpus/process_message/32230c42df1caa3220a3f95ee8a8d2865731ca3e qa-assets/fuzz_seed_corpus/process_message/90329d105d238ed81452077b4e0287a2040db9e9 qa-assets/fuzz_seed_corpus/process_message/

# new with this PR: wildcard support
FUZZ=process_messages src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/process_messages/* 

# new with this PR: run all seeds in all targets, one target/directory at a time 
for D in qa-assets/fuzz_seed_corpus/*; do [ -d "${D}" ] && echo "${D##*/}" && FUZZ="${D##*/}" src/test/fuzz/fuzz qa-assets/fuzz_seed_corpus/"${D##*/}"; done

Happy to make any changes and very much welcome code feedback, my c++ a bit rusty and still figuring out the codebase.

@ghost
Copy link
Author

ghost commented Mar 22, 2021

Oh, didn't realize that filesystem isn't available right now, was just going by the issue suggestion when I implemented it here. Saw this issue: #20744

I guess the alternative is to either wait for the switch or to go with the boost approach for iterating through files in a directory?

@ajtowns
Copy link
Contributor

ajtowns commented Mar 22, 2021

Concept ACK. Not sure if there's much point to passing in a single file (as opposed to < single_file). If a file can't be read, maybe continue to try the rest of them, or assert(false) to indicate an error rather than return 0 though?

@ajtowns
Copy link
Contributor

ajtowns commented Mar 22, 2021

Might be good to update test/fuzz/test_runner.py to work with these changes -- it's got a couple of hardcoded things that expect libFuzzer/ENABLE_FUZZ and a few things that won't work without fuzzing (m_dir/merge_inputs), and adds a -runs=1 arg that's not needed without fuzzing, but aside from that works fine with phuzzing (phony fuzzing (tm)).

@fanquake
Copy link
Member

Oh, didn't realize that filesystem isn't available right now,

It wont be available for general use in any bitcoind code for a while yet. @MarcoFalke is your intention to start using <filesystem> just in the fuzzing related code? We might have to cherry-pick a change out of #20744.

@maflcko
Copy link
Member

maflcko commented Mar 22, 2021

This will be build in the regular "non-fuzz-engine" target, so it can't yet use std::fs (my bad).

@practicalswift
Copy link
Contributor

Concept ACK

It would be nice if multiple files or directories could be specified in order to mimic the libFuzzer command-line interface.

I'm thinking something like this:

$ FUZZ=harness src/test/fuzz/fuzz
# Would use test case provided via stdin.

$ FUZZ=harness src/test/fuzz/fuzz dir1
# Would use test cases from `dir1/` assuming `dir1` is a directory.

$ FUZZ=harness src/test/fuzz/fuzz file1
# Would use test case `file1` assuming `file1` is a file.

$ FUZZ=harness src/test/fuzz/fuzz dir1 dir2 file1
# Would use test cases from `dir1/`, `dir2/` and also the test case `file1`.

This would allow for picking inputs via shell expanded wildcards.

@ghost
Copy link
Author

ghost commented Mar 22, 2021

Great feedback so far, thanks all!

I'll swap out filesystem for boost, handle consecutive dir/file args, look into whats going on with test/fuzz/test_runner.py, and some of the other points brought up.

@maflcko
Copy link
Member

maflcko commented Mar 28, 2021

I think the test_runner changes can be done in a follow-up, since they are not required to add the feature. Also, they will raise the question how test_runner should deal with different fuzz engines in general.

@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 28, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ghost
Copy link
Author

ghost commented Apr 5, 2021

Sorry about the delay on this. Updated the code and the PR description to include boost, multiple arguments that can be either files or directories (including wildcard support), as well as some asserts in the file readings. Ignored test_runner for now but can look into it on a follow up.

Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work fine for me; some comments below.

return false;
do {
const size_t length = fread(buffer, sizeof(uint8_t), sizeof(buffer), f);
Assert(!ferror(f));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (ferror(f)) return false; ? (Same behaviour as fopen failure)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, the assert is done in the parent method. Changed.

if (fs::is_directory(seed_path)) {
for (fs::directory_iterator it(seed_path); it != fs::directory_iterator(); ++it) {
if (!fs::is_regular_file(it->path())) continue;
Assert(read_file(it->path(), buffer));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!is_regular_file(..)) { std::cerr << "Not a file: " << it->path() << endl; abort(); } ?
if (!read_file(..)) { std::cerr << "Error reading file: " << it->path() << endl; abort(); } ?

Seems like it would be helpful to know what file is causing a problem. Not sure that skipping non-regular files rather than erroring on them makes sense?

Copy link
Contributor

@ajtowns ajtowns Apr 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, would be helpful to know what file is causing a problem if the fuzz testing fails too. One way to do that would be to use a signal handler to catch SIGABRT, and pass the filename via a global, something like:

#if defined(PROVIDE_FUZZ_MAIN_FUNCTION) && !defined(__AFL_LOOP)
fs::path g_seed_path;
void signal_handler(int signal) 
{
    if (signal == SIGABRT) {
        std::cerr << "Error processing seed " << g_seed_path << std::endl;
    } else {
        std::cerr << "Unexpected signal " << signal << " received\n";
    }
    std::_Exit(EXIT_FAILURE);
}
#endif
...
    auto previous_handler = std::signal(SIGABRT, signal_handler);
...
    g_seed_path = seed_path;
    Assert(read_file(seed_path, buffer));
    test_one_input(buffer);

(EDIT: maybe these are better for a followup PR, if thinking about this stuff would delaying this one too much)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion, otherwise how is one to know if there's an issue. Added the suggested code!

Assert(read_file(it->path(), buffer));
test_one_input(buffer);
buffer.clear();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth reporting the number of tests run when all the inputs succeed? It's a bit unclear that anything happened otherwise.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a counter

{
uint8_t buffer[1024];
FILE* f = fsbridge::fopen(p.string(), "rb");
if (f == nullptr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (f == nullptr)
if (f == nullptr) {

According to style-guide

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I actually went with

if (f == nullptr) return false;

if that's alright instead, to be consistent with one liners.

@DrahtBot
Copy link
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@adamjonas
Copy link
Member

@AnthonyRonning friendly ping to rebase.

@maflcko
Copy link
Member

maflcko commented Aug 18, 2021

Marked "up for grabs" for now.

maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Mar 17, 2022
f59bee3 fuzz: execute each file in dir without fuzz engine (Anthony Towns)

Pull request description:

  Phony fuzzing (phuzzing)! Run the fuzz testing code against known inputs to detect errors. Advantage is you can easily test using the existing qa-assets datasets without having to compile with fuzzing enabled; disadvantage is that it doesn't do any actual fuzzing.

  Example usage:

  ```
  $ for a in ${QA_ASSETS}/fuzz_seed_corpus/*; do echo ${a##*/}; done | xargs -P8 -I {} /bin/sh -c "FUZZ={} test/fuzz/fuzz ${QA_ASSETS}/fuzz_seed_corpus/{}"
  No fuzzer for address_deserialize.
  No fuzzer for addrdb.
  No fuzzer for banentry_deserialize.
  addition_overflow: succeeded against 848 files in 0s.
  asmap: succeeded against 981 files in 0s.
  checkqueue: succeeded against 211 files in 0s.
  ...
  ```

  (`-P8` says run 8 of the tasks in parallel)

  If there are failures, the first one will be reported and the program will abort with output like:

  ```
  fuzz: test/fuzz/versionbits.cpp:336: void (anonymous namespace)::versionbits_fuzz_target(FuzzBufferType): Assertion `exp_state != ThresholdState::FAILED' failed.
  Error processing seed "corpus/versionbits/35345ae8e722234095810b1117a29b63af7621af"
  ```

  Rebase of bitcoin#22763, which was a rebase of bitcoin#21496, but also reports the name of the fuzzer and the time taken.

  Fixes bitcoin#21461

Top commit has no ACKs.

Tree-SHA512: d8d046d4a309652eb13de42116276bf992480bc887ad3535a8ff18b354cb24826bc562b06af63802ec945c637f046563b6a5601d6321b46a5543127daafea09b
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Mar 18, 2022
f59bee3 fuzz: execute each file in dir without fuzz engine (Anthony Towns)

Pull request description:

  Phony fuzzing (phuzzing)! Run the fuzz testing code against known inputs to detect errors. Advantage is you can easily test using the existing qa-assets datasets without having to compile with fuzzing enabled; disadvantage is that it doesn't do any actual fuzzing.

  Example usage:

  ```
  $ for a in ${QA_ASSETS}/fuzz_seed_corpus/*; do echo ${a##*/}; done | xargs -P8 -I {} /bin/sh -c "FUZZ={} test/fuzz/fuzz ${QA_ASSETS}/fuzz_seed_corpus/{}"
  No fuzzer for address_deserialize.
  No fuzzer for addrdb.
  No fuzzer for banentry_deserialize.
  addition_overflow: succeeded against 848 files in 0s.
  asmap: succeeded against 981 files in 0s.
  checkqueue: succeeded against 211 files in 0s.
  ...
  ```

  (`-P8` says run 8 of the tasks in parallel)

  If there are failures, the first one will be reported and the program will abort with output like:

  ```
  fuzz: test/fuzz/versionbits.cpp:336: void (anonymous namespace)::versionbits_fuzz_target(FuzzBufferType): Assertion `exp_state != ThresholdState::FAILED' failed.
  Error processing seed "corpus/versionbits/35345ae8e722234095810b1117a29b63af7621af"
  ```

  Rebase of bitcoin#22763, which was a rebase of bitcoin#21496, but also reports the name of the fuzzer and the time taken.

  Fixes bitcoin#21461

Top commit has no ACKs.

Tree-SHA512: d8d046d4a309652eb13de42116276bf992480bc887ad3535a8ff18b354cb24826bc562b06af63802ec945c637f046563b6a5601d6321b46a5543127daafea09b
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 5, 2024
f59bee3 fuzz: execute each file in dir without fuzz engine (Anthony Towns)

Pull request description:

  Phony fuzzing (phuzzing)! Run the fuzz testing code against known inputs to detect errors. Advantage is you can easily test using the existing qa-assets datasets without having to compile with fuzzing enabled; disadvantage is that it doesn't do any actual fuzzing.

  Example usage:

  ```
  $ for a in ${QA_ASSETS}/fuzz_seed_corpus/*; do echo ${a##*/}; done | xargs -P8 -I {} /bin/sh -c "FUZZ={} test/fuzz/fuzz ${QA_ASSETS}/fuzz_seed_corpus/{}"
  No fuzzer for address_deserialize.
  No fuzzer for addrdb.
  No fuzzer for banentry_deserialize.
  addition_overflow: succeeded against 848 files in 0s.
  asmap: succeeded against 981 files in 0s.
  checkqueue: succeeded against 211 files in 0s.
  ...
  ```

  (`-P8` says run 8 of the tasks in parallel)

  If there are failures, the first one will be reported and the program will abort with output like:

  ```
  fuzz: test/fuzz/versionbits.cpp:336: void (anonymous namespace)::versionbits_fuzz_target(FuzzBufferType): Assertion `exp_state != ThresholdState::FAILED' failed.
  Error processing seed "corpus/versionbits/35345ae8e722234095810b1117a29b63af7621af"
  ```

  Rebase of bitcoin#22763, which was a rebase of bitcoin#21496, but also reports the name of the fuzzer and the time taken.

  Fixes bitcoin#21461

Top commit has no ACKs.

Tree-SHA512: d8d046d4a309652eb13de42116276bf992480bc887ad3535a8ff18b354cb24826bc562b06af63802ec945c637f046563b6a5601d6321b46a5543127daafea09b
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 7, 2024
f59bee3 fuzz: execute each file in dir without fuzz engine (Anthony Towns)

Pull request description:

  Phony fuzzing (phuzzing)! Run the fuzz testing code against known inputs to detect errors. Advantage is you can easily test using the existing qa-assets datasets without having to compile with fuzzing enabled; disadvantage is that it doesn't do any actual fuzzing.

  Example usage:

  ```
  $ for a in ${QA_ASSETS}/fuzz_seed_corpus/*; do echo ${a##*/}; done | xargs -P8 -I {} /bin/sh -c "FUZZ={} test/fuzz/fuzz ${QA_ASSETS}/fuzz_seed_corpus/{}"
  No fuzzer for address_deserialize.
  No fuzzer for addrdb.
  No fuzzer for banentry_deserialize.
  addition_overflow: succeeded against 848 files in 0s.
  asmap: succeeded against 981 files in 0s.
  checkqueue: succeeded against 211 files in 0s.
  ...
  ```

  (`-P8` says run 8 of the tasks in parallel)

  If there are failures, the first one will be reported and the program will abort with output like:

  ```
  fuzz: test/fuzz/versionbits.cpp:336: void (anonymous namespace)::versionbits_fuzz_target(FuzzBufferType): Assertion `exp_state != ThresholdState::FAILED' failed.
  Error processing seed "corpus/versionbits/35345ae8e722234095810b1117a29b63af7621af"
  ```

  Rebase of bitcoin#22763, which was a rebase of bitcoin#21496, but also reports the name of the fuzzer and the time taken.

  Fixes bitcoin#21461

Top commit has no ACKs.

Tree-SHA512: d8d046d4a309652eb13de42116276bf992480bc887ad3535a8ff18b354cb24826bc562b06af63802ec945c637f046563b6a5601d6321b46a5543127daafea09b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fuzz: Make it possible to execute each file in a directory without fuzz engine
6 participants