Skip to content

Conversation

darosior
Copy link
Member

We don't have a fuzzing harness for our main consensus engine [0]. This PR introduces two new targets which respectively fuzz the BlockManager and ChainstateManager (process headers, blocks, reorgs and assert some invariants in doing so).

There is two main obstacles to achieving this: PoW and io. The blocks and chainstate databases can be stored in memory but blocks still need a valid proof of work and to be stored on disk. Niklas solved the first issue in #28043 by simply introducing a global which makes it possible to mock the PoW check (his commit is cherry-picked here). After considering other approaches, i also used globals to mock disk io.

I'm interested with this PR in getting feedback on the concept and the approach, but also in suggestions of more invariants to be asserting in the chainstate fuzz target.

Regarding other approaches i tried the most potentially promising was to leverage ld's --wrap option to mock the syscalls without having to modify non-test code. But i didn't try too hard to make it work: better to have a demo of what can be achieved first with a more trivial way of mocking filesystem calls. If there is interest in these fuzz targets, i can give this approach another look.

Regarding efficiency, the chainstate fuzz target is quite slow at the moment but i've at least 2x its performance by rebasing on #28960 and making CheckBlockIndex callable externally even if !ShouldCheckBlockIndex(). Suggestions for performance improvements welcome too.


[0] Well there is utxo_total_supply but it's very specialized toward exercising a specific past bug.

@DrahtBot
Copy link
Contributor

DrahtBot commented Dec 30, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29158.

Reviews

See the guideline for information on the review process.

Type Reviewers
Concept ACK TheCharlatan, jamesob

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30664 (build: Remove Autotools-based build system by hebasto)
  • #30661 (fuzz: Test headers pre-sync through p2p by marcofleon)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jamesob
Copy link
Contributor

jamesob commented Jan 2, 2024

Cool, this is a great thing to investigate. I'll be giving the approach a look this week.

@dergoegge
Copy link
Member

Thanks for working on this!

One alternative that I have considered before (for chainstate fuzzing) is to abstract and further modularize BlockManager, which would allow us to have an InMemoryBlockManager for tests (especially useful for fuzzing but also nice in unit tests).

This would require a bunch of work:

  • Breaking up the friendship between BlockManager, Chainstate & ChainstateManager
  • Abstracting BlockManager's interface away from being file based
  • Hiding access to BlockManager's internal fields
  • Probably more...

This approach would avoid filesystem syscalls entirely, as well as the large block file allocations.


The coinbase maturity also seems relevant because you can't spend any coins in the test until you've mined 100 blocks. Mining 100 blocks every fuzz iteration ends up being pretty slow. Maybe we can use assumeutxo to avoid that? (or snapshot fuzzing)

@brunoerg
Copy link
Contributor

brunoerg commented Jan 3, 2024

Nice one!

@TheCharlatan
Copy link
Contributor

Concept ACK

Copy link
Contributor

@jamesob jamesob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK; midway through review and trying to resolve some of the CI issues.

FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()};
const auto& chainparams{Params()};
const fs::path datadir{""};
std::unordered_map<COutPoint, CTxOut, SaltedOutpointHasher> utxos;
Copy link
Contributor

@jamesob jamesob Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type comes up often enough in this file that it might be worth an alias.

@jamesob
Copy link
Contributor

jamesob commented Jan 10, 2024

Pushed three additional commits to my branch that may make a dent in the CI issues:

@darosior
Copy link
Member Author

darosior commented Jul 8, 2024

Rebased this, taking advantage of #28960. I've also been investigating alternative approaches.

I first tried to move from fmemopen toward the more flexible memfd_create. It avoided the need for some of the filesystem mocks (which were necessary before because you can't call fileno on a FILE* created with fmemopen). This allowed to drop one commit. Further, this removed the need from creating 128MiB in-memory blk files. This is in turn makes it potentially more reasonable to sometimes reindex in CallOneOf.

With that implemented i tried to "cache" an initial chainstate to re-use on each fuzzing round, to avoid having to connect the 110 blocks initial chain on every single iteration. Unsuccessfully.

Finally, i wanted to compare the performances of mocking the filesystem to simply using a ramdisk. I realized if i were to use a ramdisk i could simply use the filesystem directly to "cache" an initial chainstate: create two datadirs, one for the initial chainstate, one for the fuzzing iteration. At initialization of the target create a fresh chainstate and connect the 110 blocks. Upon each iteration wipe the working datadir and copy over the initial datadir.

So i implemented that, which besides being more efficient also has the advantage of removing the modifications of non-test code and the platform-dependent syscalls. The target is still pretty slow, but at this point it's just because of the code it calls: we are doing a bunch of block/header connections and reorg upon every iteration, and each of those can take around a hundred milliseconds.

I've pushed a WIP commit which implements what i described above for the chainstate target. Do people think this effort is worth pursuing in this form? If so i'll clean up this PR to remove the filesystem mocking commits and also use a ramdisk for the blockstorage target.

@darosior darosior force-pushed the 2309_fuzz_chainstate branch from 1059ca3 to 040af0e Compare July 9, 2024 09:42
@darosior
Copy link
Member Author

darosior commented Jul 28, 2024

Alright so i cleaned up this PR locally to only use a RAM disk. I'm now in a middle of a significant refactoring of the chainstate fuzz target which i hope to be able to push shortly. If you intended to read through this PoC, maybe wait for the next push.

darosior added 2 commits July 31, 2024 17:06
Exercise (most of) the public interface of the BlockManager and assert
some invariants. Notably, try to mimick block arrival whether its header
was announced first or not.
@darosior darosior force-pushed the 2309_fuzz_chainstate branch from 040af0e to e92c9dd Compare July 31, 2024 15:20
@darosior
Copy link
Member Author

Cleaned up this PR to always use a ramdisk instead of trying to mock the filesystem. Also significantly reworked the chainstate target.

Comment on lines +189 to +203
//! To generate a random tmp datadir per process (necessary to fuzz with multiple cores).
static FastRandomContext g_insecure_rand_ctx_temp_path;

struct TestData {
fs::path m_tmp_dir;
fs::path m_datadir;
fs::path m_init_datadir;
const CChainParams m_chain_params{*CChainParams::RegTest({})};
KernelNotifications m_notifs;
util::SignalInterrupt m_interrupt;

void Init() {
SeedRandomForTest(SeedRand::SEED);
const auto rand_str{g_insecure_rand_ctx_temp_path.rand256().ToString()};
m_tmp_dir = fs::temp_directory_path() / "fuzz_chainstate_" PACKAGE_NAME / rand_str;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just inline it, if it is only used once? Also, the build system overhead seems not worth it to place PACKAGE_NAME here? (See lint failure)

Suggested change
//! To generate a random tmp datadir per process (necessary to fuzz with multiple cores).
static FastRandomContext g_insecure_rand_ctx_temp_path;
struct TestData {
fs::path m_tmp_dir;
fs::path m_datadir;
fs::path m_init_datadir;
const CChainParams m_chain_params{*CChainParams::RegTest({})};
KernelNotifications m_notifs;
util::SignalInterrupt m_interrupt;
void Init() {
SeedRandomForTest(SeedRand::SEED);
const auto rand_str{g_insecure_rand_ctx_temp_path.rand256().ToString()};
m_tmp_dir = fs::temp_directory_path() / "fuzz_chainstate_" PACKAGE_NAME / rand_str;
struct TestData {
fs::path m_tmp_dir;
fs::path m_datadir;
fs::path m_init_datadir;
const CChainParams m_chain_params{*CChainParams::RegTest({})};
KernelNotifications m_notifs;
util::SignalInterrupt m_interrupt;
void Init() {
SeedRandomForTest(SeedRand::SEED);
const auto rand_str{FastRandomContext{}.rand256().ToString()}; //! To generate a random tmp datadir per process (necessary to fuzz with multiple cores).
m_tmp_dir = fs::temp_directory_path() / "fuzz_chainstate" / rand_str;

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 2, 2024

🐙 This pull request conflicts with the target branch and needs rebase.

@DrahtBot
Copy link
Contributor

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

  • Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
  • Is it no longer relevant? ➡️ Please close.
  • Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

@maflcko
Copy link
Member

maflcko commented Dec 2, 2024

cc @darosior ?

@TheCharlatan
Copy link
Contributor

One alternative that I have considered before (for chainstate fuzzing) is to abstract and further modularize BlockManager, which would allow us to have an InMemoryBlockManager for tests (especially useful for fuzzing but also nice in unit tests).

Forgot to post this here at the time, but I was working on an abstract block store for a different context some months ago: TheCharlatan@5f35e50. I'm not sure how useful this actually is though, since mounting a ramdisk just is very easy, but maybe it could be useful to verify that certain functions, e.g. allocate, are called at the expected moment.

@maflcko
Copy link
Member

maflcko commented Dec 2, 2024

since mounting a ramdisk just is very easy

It is indeed easy, but for some reason I found mixed results when doing it by default in the CI: #31182 . I couldn't find a real improvement there, except for the utxo_total_supply fuzz target. (See also bitcoin-core/qa-assets#158 (comment))

So forcing the blockstore into ram for this target (and possibly others) could still be useful.

@darosior
Copy link
Member Author

darosior commented Dec 2, 2024

This has unfortunately gotten lower in my list of priorities. I'll close this PR for now to clarify its status. I hope to get back to this at some point. Anyone willing to work on this, or parts of this feel free to grab or/and reach out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants