Skip to content

Conversation

Crypt-iQ
Copy link
Contributor

@Crypt-iQ Crypt-iQ commented Sep 3, 2025

Posting up to get feedback, there are some design flaws with the approach in this PR. Coverage is here (look in src/blockencodings.cpp, relevant compact block bits in src/net_processing.cpp).

This harness can make (in)valid blocks, reconstruct blocks with in-mempool txns, mark peers as HB, and has high stability in AFL++ (~98-99%).

The main downside is that there are filesystem operations. In the .init function initialize_cmpctblock, a chain of 200 blocks is created. Each fuzzing iteration then copies this statically-named, "cached" data directory to a temporary directory that gets deleted at the end of the iteration. If each fuzzing iteration instead mines its own chain, the execs/s slows down to a crawl (~0.5/s or less, which would also make CI runs really slow).

This allows other fuzz tests to use the function to create valid block headers.
@DrahtBot DrahtBot added the Tests label Sep 3, 2025
@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 3, 2025

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/33300.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #33191 (net: Provide block templates to peers on request by ajtowns)
  • #29415 (Broadcast own transactions only via short-lived Tor or I2P connections by vasild)
  • #26812 (test: add end-to-end tests for CConnman and PeerManager by vasild)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@Crypt-iQ Crypt-iQ force-pushed the cmpctblock-fuzz-0807-fs branch 2 times, most recently from ea293e3 to c426b80 Compare September 3, 2025 22:45
@maflcko
Copy link
Member

maflcko commented Sep 4, 2025

There does not seem to be a way without signal handlers to delete the static, cached datadir when fuzzing is complete. Another issue is that multi-core fuzzing via AFL++ does not work because each worker will try to wipe the cached datadir (created via -testdatadir which wipes if the directory exists) which will cause other workers to crash if they are trying to copy it. I could not figure out a way for each worker to have their own cached datadir.

Is my understanding correct that AFL will run init (create the static dir), then fork into several fuzz processes, which run for N iterations and then shut down and delete the static dir? (On the next re-fork, it will fail to find the static dir)?

If yes, a solution could be to lazy-init the static dir after init, after the fork. Also, you may have to use the pid (or os-rand) to name the static folder to avoid collisions.

As for deleting the dirs, maybe it is possible to spin up several testing setups at the same time:

  • One dummy testing setup to create a root dir, which is cleaned up
  • One testing setup living inside that root dir for the static folder (in a subfolder)
  • One testing setup living inside the root dir with a copy of the static folder

(Not sure if this is possible, though)

@dergoegge
Copy link
Member

Is my understanding correct that AFL will run init (create the static dir), then fork into several fuzz processes, which run for N iterations and then shut down and delete the static dir? (On the next re-fork, it will fail to find the static dir)?

We removed the __AFL_INIT call, so it will actually fork prior to init, so I think all that's needed is to randomize the static dir name as you suggested as well.

@Crypt-iQ
Copy link
Contributor Author

Crypt-iQ commented Sep 4, 2025

Is my understanding correct that AFL will run init (create the static dir), then fork into several fuzz processes, which run for N iterations and then shut down and delete the static dir? (On the next re-fork, it will fail to find the static dir)?

My description was a bit vague. Since the harness uses -testdatadir in init to create the static datadir, it does not get deleted in the TestingSetup destructor (by design of -testdatadir). Any time init is called, it will wipe the static datadir if it already exists. This can happen after N iterations (100,000 currently) or with a newly spawned worker after forking since the fork point is prior to init like @dergoegge mentioned. If the datadir is deleted while a fuzz iteration tries to copy the non-existent directory, it will crash.

a solution could be to lazy-init the static dir after init, after the fork

I considered this, but I think this introduces some (slight) non-determinism as the directory may/may not exist?

Also, you may have to use the pid (or os-rand) to name the static folder to avoid collisions.

Yup, I agree. I think this randomization can only be done if the static datadir can be deleted at the end of fuzzing, otherwise there will be lots of these lying around?

As for deleting the dirs, maybe it is possible to spin up several testing setups at the same time:

Why is the dummy testing setup needed in this example? Could it instead just be two testing setups (one for the static datadir, one for the copied datadir)? I considered something similar to this as well where there is a static TestingSetup that gets destructed at the end of fuzzing (and wipes the static datadir) and one for each fuzz iteration (that gets destructed at the end of the iteration), but I was worried that the static TestingSetup would introduce more non-determinism? For example, what if this static TestingSetup has scheduling going on or is using the static datadir while we try to copy from it?

@Crypt-iQ
Copy link
Contributor Author

Crypt-iQ commented Sep 5, 2025

I was able to get AFL++ multi-core fuzzing to work by adding a static FuzzedDirectoryWrapper instance that deletes the passed path in its destructor as well as using a random element to each statically named datadir. Each instance is in the high 98-99% stability and does not crash after hitting 100,000 iterations (yay). I was not able to use multiple TestingSetups at the same time without causing several panics due to assertions about globals. I will fix the lint, tidy jobs and also see if the deterministic-fuzz-coverage issues still pop up.

Disable implicit string conversion.
This harness does one of the following operations in a loop:
- send a cmpctblock message to the test node
- send a blocktxn message to the test node
- send a headers message to the test node
- send a sendcmpct message to the test node
- send a tx message to the test node
- mine a block
- set mock time

The initialize function creates a TestingSetup and mines 200 blocks. Each fuzz
iteration will then create a TestingSetup and copy the cached data directory via
-fuzzcopydatadir.
@Crypt-iQ Crypt-iQ force-pushed the cmpctblock-fuzz-0807-fs branch from c76f475 to 0de9143 Compare September 9, 2025 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants