PoC: fuzz chainstate and block managers #29158

darosior · 2023-12-30T10:05:20Z

We don't have a fuzzing harness for our main consensus engine [0]. This PR introduces two new targets which respectively fuzz the BlockManager and ChainstateManager (process headers, blocks, reorgs and assert some invariants in doing so).

There are two main obstacles to achieving this: PoW and I/O. The block and chainstate databases can be kept in memory, but blocks still need a valid proof of work and must be stored on disk. Niklas solved the first issue in #28043 by introducing a global that makes it possible to mock the PoW check (his commit is cherry-picked here). After considering other approaches, I also used globals to mock disk I/O.
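A minimal sketch of the "mock via a global" pattern, assuming a simplified header type. `Header`, `g_mock_check_pow`, `RealCheckPow` and `CheckPow` are hypothetical names chosen for illustration; this is not the code from #28043, only the general shape of the technique: production code gains a single early dispatch, and test code installs a replacement.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a block header; only the field the check needs.
struct Header {
    uint32_t nonce;
};

// Hypothetical global hook. When empty, the real check runs; test code can
// install a replacement that, for example, accepts every header.
std::function<bool(const Header&)> g_mock_check_pow;

// Stand-in for the expensive real check: pretend only nonces divisible by
// 1000 satisfy the target.
bool RealCheckPow(const Header& h) { return h.nonce % 1000 == 0; }

// The function non-test code calls. The only change to it is the early
// dispatch to the mock when one is installed.
bool CheckPow(const Header& h)
{
    if (g_mock_check_pow) return g_mock_check_pow(h);
    return RealCheckPow(h);
}
```

A fuzz target would set `g_mock_check_pow = [](const Header&) { return true; };` once at initialization, so fuzzed headers never need to be mined. The same shape (a global that is empty in production and set in tests) can wrap disk I/O calls.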

With this PR I'm interested in getting feedback on the concept and the approach, but also in suggestions for more invariants to assert in the chainstate fuzz target.

Regarding other approaches, the most promising one I tried was to leverage ld's --wrap option to mock the syscalls without having to modify non-test code. But I didn't try too hard to make it work: it seemed better to first have a demo of what can be achieved with a more trivial way of mocking filesystem calls. If there is interest in these fuzz targets, I can give this approach another look.

Regarding efficiency, the chainstate fuzz target is quite slow at the moment, but I've at least doubled its performance by rebasing on #28960 and making CheckBlockIndex callable externally even if !ShouldCheckBlockIndex(). Suggestions for performance improvements are welcome too.

[0] Well, there is utxo_total_supply, but it's very specialized toward exercising a specific past bug.

DrahtBot · 2023-12-30T10:05:22Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29158.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	TheCharlatan, jamesob

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#30664 (build: Remove Autotools-based build system by hebasto)
#30661 (fuzz: Test headers pre-sync through p2p by marcofleon)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

jamesob · 2024-01-02T15:46:43Z

Cool, this is a great thing to investigate. I'll be giving the approach a look this week.

dergoegge · 2024-01-02T17:22:14Z

Thanks for working on this!

One alternative that I have considered before (for chainstate fuzzing) is to abstract and further modularize BlockManager, which would allow us to have an InMemoryBlockManager for tests (especially useful for fuzzing but also nice in unit tests).

This would require a bunch of work:

Breaking up the friendship between BlockManager, Chainstate & ChainstateManager
Abstracting BlockManager's interface away from being file based
Hiding access to BlockManager's internal fields
Probably more...

This approach would avoid filesystem syscalls entirely, as well as the large block file allocations.
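To make the idea concrete, here is a minimal sketch of what a non-file-based block storage interface with an in-memory test implementation could look like. `BlockPos`, `BlockBytes`, `BlockStore`, `InMemoryBlockStore`, `WriteBlock` and `ReadBlock` are all hypothetical names; the real BlockManager interface is much larger and file based, and this only illustrates the abstraction being proposed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical position of a stored block. A file-based store would map this
// to a (blk file number, offset) pair; the in-memory one just uses an index.
struct BlockPos {
    uint64_t id;
};

using BlockBytes = std::vector<unsigned char>;

// Hypothetical abstract interface: callers no longer know about files.
class BlockStore {
public:
    virtual ~BlockStore() = default;
    virtual BlockPos WriteBlock(const BlockBytes& block) = 0;
    virtual std::optional<BlockBytes> ReadBlock(BlockPos pos) const = 0;
};

// Test implementation: no syscalls and no preallocated blk files.
class InMemoryBlockStore final : public BlockStore {
public:
    BlockPos WriteBlock(const BlockBytes& block) override
    {
        const BlockPos pos{m_next_id++};
        m_blocks.emplace(pos.id, block);
        return pos;
    }

    std::optional<BlockBytes> ReadBlock(BlockPos pos) const override
    {
        const auto it{m_blocks.find(pos.id)};
        if (it == m_blocks.end()) return std::nullopt;
        return it->second;
    }

private:
    uint64_t m_next_id{0};
    std::map<uint64_t, BlockBytes> m_blocks;
};
```

Validation code would hold a `BlockStore&`, so unit and fuzz tests could pass an `InMemoryBlockStore` while production keeps a file-backed implementation.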

The coinbase maturity also seems relevant because you can't spend any coins in the test until you've mined 100 blocks. Mining 100 blocks every fuzz iteration ends up being pretty slow. Maybe we can use assumeutxo to avoid that? (or snapshot fuzzing)
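A small sketch of why 100 blocks are needed, assuming the rule works as in Bitcoin Core's `Consensus::CheckTxInputs`, which (to my understanding) rejects a coinbase spend when `nSpendHeight - coin.nHeight < COINBASE_MATURITY`, with `COINBASE_MATURITY` equal to 100. The helper names `IsCoinbaseSpendable` and `FirstSpendHeight` are hypothetical.

```cpp
#include <cassert>

// Same value as COINBASE_MATURITY in Bitcoin Core's consensus rules.
constexpr int COINBASE_MATURITY{100};

// Simplified maturity rule: an output created by the coinbase of the block at
// `coin_height` may be spent by a transaction in a block at `spend_height`
// only if at least COINBASE_MATURITY blocks separate them.
bool IsCoinbaseSpendable(int coin_height, int spend_height)
{
    return spend_height - coin_height >= COINBASE_MATURITY;
}

// Height of the first block that may spend the coinbase at `coin_height`.
int FirstSpendHeight(int coin_height) { return coin_height + COINBASE_MATURITY; }
```

On regtest the first mined block is at height 1, so its coinbase can first be spent in the block at height 101, meaning the tip must already be at height 100. That is the cost every fuzz iteration pays if it builds its chain from scratch.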

brunoerg · 2024-01-03T17:21:25Z

Nice one!

TheCharlatan · 2024-01-03T17:54:38Z

Concept ACK

jamesob

Concept ACK; midway through review and trying to resolve some of the CI issues.

jamesob · 2024-01-09T21:05:16Z

src/test/fuzz/chainstate.cpp

+    FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()};
+    const auto& chainparams{Params()};
+    const fs::path datadir{""};
+    std::unordered_map<COutPoint, CTxOut, SaltedOutpointHasher> utxos;


This type comes up often enough in this file that it might be worth an alias.
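For illustration, the alias could be something like `using UtxoMap = std::unordered_map<COutPoint, CTxOut, SaltedOutpointHasher>;` (the name `UtxoMap` is only a suggestion). The sketch below shows the same idea with hypothetical stand-ins for `COutPoint`, `CTxOut` and `SaltedOutpointHasher` so that it compiles on its own; the hash combination is illustrative, not the salted hasher Bitcoin Core uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical stand-ins for COutPoint, CTxOut and SaltedOutpointHasher.
struct OutPoint {
    uint64_t txid;
    uint32_t n;
    bool operator==(const OutPoint& o) const { return txid == o.txid && n == o.n; }
};
struct TxOut {
    int64_t value;
};
struct OutPointHasher {
    size_t operator()(const OutPoint& o) const
    {
        return std::hash<uint64_t>{}(o.txid) ^ (std::hash<uint32_t>{}(o.n) << 1);
    }
};

// The alias: spell the long map type once, use the short name everywhere.
using UtxoMap = std::unordered_map<OutPoint, TxOut, OutPointHasher>;

// Example helper taking the alias instead of the full type.
int64_t TotalValue(const UtxoMap& utxos)
{
    int64_t total{0};
    for (const auto& entry : utxos) total += entry.second.value;
    return total;
}
```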

jamesob · 2024-01-10T17:38:48Z

Pushed three additional commits to my branch that may make a dent in the CI issues:

make fs::path hashable: jamesob@8f5fdf8
avoid use of std::filesystem::path where possible: jamesob@4ef7857
update linter for mockable filesystem ops: jamesob@face876

darosior · 2024-07-08T17:04:07Z

Rebased this, taking advantage of #28960. I've also been investigating alternative approaches.

I first tried to move from fmemopen to the more flexible memfd_create. It avoided the need for some of the filesystem mocks (which were necessary before because you can't call fileno on a FILE* created with fmemopen). This allowed me to drop one commit. Further, it removed the need to create 128MiB in-memory blk files. This in turn makes it potentially more reasonable to sometimes reindex in CallOneOf.
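A minimal, Linux-only sketch of the memfd_create approach, assuming glibc 2.27 or later (where `memfd_create` is declared in `<sys/mman.h>`). The helper name `OpenMemoryFile` is hypothetical. Unlike an fmemopen stream, the resulting `FILE*` wraps a real file descriptor, so `fileno()` works and fd-based calls such as `ftruncate` apply; the file also starts empty rather than needing a fixed-size buffer up front.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <sys/mman.h> // memfd_create (Linux, glibc >= 2.27)
#include <unistd.h>   // close, ftruncate

// Returns a FILE* backed by an anonymous in-memory file, or nullptr on error.
// The file lives only in memory and disappears once all references to it
// are closed.
FILE* OpenMemoryFile(const char* name)
{
    const int fd{memfd_create(name, 0)};
    if (fd < 0) return nullptr;
    FILE* file{fdopen(fd, "w+b")};
    if (file == nullptr) close(fd);
    return file;
}
```

Because the stream is fd-backed, code that calls `fileno()` on it (for example to flush or truncate) keeps working without a dedicated mock.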

With that implemented, I tried (unsuccessfully) to "cache" an initial chainstate to reuse on each fuzzing round, to avoid having to connect the initial 110-block chain on every single iteration.

Finally, I wanted to compare the performance of mocking the filesystem against simply using a ramdisk. I realized that with a ramdisk I could use the filesystem directly to "cache" an initial chainstate: create two datadirs, one for the initial chainstate and one for the fuzzing iteration. At target initialization, create a fresh chainstate and connect the 110 blocks. On each iteration, wipe the working datadir and copy the initial datadir over.

So I implemented that, which, besides being more efficient, also has the advantage of removing the modifications to non-test code and the platform-dependent syscalls. The target is still pretty slow, but at this point that is just because of the code it calls: we do a bunch of block/header connections and reorgs on every iteration, and each of those can take around a hundred milliseconds.

I've pushed a WIP commit which implements what I described above for the chainstate target. Do people think this effort is worth pursuing in this form? If so, I'll clean up this PR to remove the filesystem mocking commits and also use a ramdisk for the blockstorage target.
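A minimal sketch of the two-datadir caching scheme using `std::filesystem` (C++17). `PrepareInitialDatadir` and `ResetWorkingDatadir` are hypothetical names, and writing a single placeholder file stands in for the real, slow work of creating a chainstate and connecting the 110 blocks. On a ramdisk, the copy in `ResetWorkingDatadir` is a memory-to-memory operation.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Called once at target initialization: build the expensive initial state.
void PrepareInitialDatadir(const fs::path& init_dir)
{
    fs::remove_all(init_dir);
    fs::create_directories(init_dir / "blocks");
    // Placeholder for creating a fresh chainstate and connecting the blocks.
    std::ofstream out{init_dir / "blocks" / "blk00000.dat"};
    out << "initial chain";
}

// Called at the start of every fuzz iteration: discard whatever the previous
// iteration wrote and restore a pristine copy of the initial datadir.
void ResetWorkingDatadir(const fs::path& init_dir, const fs::path& work_dir)
{
    fs::remove_all(work_dir);
    fs::copy(init_dir, work_dir, fs::copy_options::recursive);
}
```

Each iteration then opens the chainstate from the working datadir, which always starts from the same initial chain without re-connecting it.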

darosior · 2024-07-28T21:15:30Z

Alright, so I cleaned up this PR locally to only use a RAM disk. I'm now in the middle of a significant refactoring of the chainstate fuzz target, which I hope to push shortly. If you intended to read through this PoC, maybe wait for the next push.

darosior · 2024-07-31T15:29:12Z

Cleaned up this PR to always use a ramdisk instead of trying to mock the filesystem. Also significantly reworked the chainstate target.

maflcko · 2024-08-01T08:43:35Z

src/test/fuzz/chainstate.cpp

+//! To generate a random tmp datadir per process (necessary to fuzz with multiple cores).
+static FastRandomContext g_insecure_rand_ctx_temp_path;
+
+struct TestData {
+    fs::path m_tmp_dir;
+    fs::path m_datadir;
+    fs::path m_init_datadir;
+    const CChainParams m_chain_params{*CChainParams::RegTest({})};
+    KernelNotifications m_notifs;
+    util::SignalInterrupt m_interrupt;
+
+    void Init() {
+        SeedRandomForTest(SeedRand::SEED);
+        const auto rand_str{g_insecure_rand_ctx_temp_path.rand256().ToString()};
+        m_tmp_dir = fs::temp_directory_path() / "fuzz_chainstate_" PACKAGE_NAME / rand_str;


Maybe just inline it, if it is only used once? Also, the build system overhead of using PACKAGE_NAME here doesn't seem worth it (see lint failure).

Suggested change

-//! To generate a random tmp datadir per process (necessary to fuzz with multiple cores).
-static FastRandomContext g_insecure_rand_ctx_temp_path;
-
 struct TestData {
     fs::path m_tmp_dir;
     fs::path m_datadir;
     fs::path m_init_datadir;
     const CChainParams m_chain_params{*CChainParams::RegTest({})};
     KernelNotifications m_notifs;
     util::SignalInterrupt m_interrupt;
 
     void Init() {
         SeedRandomForTest(SeedRand::SEED);
-        const auto rand_str{g_insecure_rand_ctx_temp_path.rand256().ToString()};
-        m_tmp_dir = fs::temp_directory_path() / "fuzz_chainstate_" PACKAGE_NAME / rand_str;
+        const auto rand_str{FastRandomContext{}.rand256().ToString()}; //! To generate a random tmp datadir per process (necessary to fuzz with multiple cores).
+        m_tmp_dir = fs::temp_directory_path() / "fuzz_chainstate" / rand_str;

DrahtBot · 2024-09-02T22:45:35Z

🐙 This pull request conflicts with the target branch and needs rebase.

DrahtBot · 2024-11-30T01:11:00Z

⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?

Is it still relevant? ➡️ Please solve the conflicts to make it ready for review and to ensure the CI passes.
Is it no longer relevant? ➡️ Please close.
Did the author lose interest or time to work on this? ➡️ Please close it and mark it 'Up for grabs' with the label, so that it can be picked up in the future.

maflcko · 2024-12-02T09:19:28Z

cc @darosior ?

TheCharlatan · 2024-12-02T09:39:39Z

One alternative that I have considered before (for chainstate fuzzing) is to abstract and further modularize BlockManager, which would allow us to have an InMemoryBlockManager for tests (especially useful for fuzzing but also nice in unit tests).

Forgot to post this here at the time, but I was working on an abstract block store for a different context some months ago: TheCharlatan@5f35e50. I'm not sure how useful this actually is though, since mounting a ramdisk just is very easy, but maybe it could be useful to verify that certain functions, e.g. allocate, are called at the expected moment.

maflcko · 2024-12-02T09:45:24Z

since mounting a ramdisk just is very easy

It is indeed easy, but for some reason I found mixed results when doing it by default in the CI: #31182. I couldn't find a real improvement there, except for the utxo_total_supply fuzz target. (See also bitcoin-core/qa-assets#158 (comment))

So forcing the blockstore into ram for this target (and possibly others) could still be useful.

darosior · 2024-12-02T15:33:51Z

This has unfortunately dropped lower on my list of priorities. I'll close this PR for now to clarify its status. I hope to get back to this at some point. Anyone willing to work on this, or on parts of it, feel free to pick it up and/or reach out.

DrahtBot added the CI failed label Dec 30, 2023

dergoegge reviewed Jan 2, 2024

src/test/fuzz/chainstate.cpp (outdated)

jamesob reviewed Jan 10, 2024

DrahtBot mentioned this pull request Jan 11, 2024

log: Nuke error(...) #29236

Merged

jamesob mentioned this pull request Jan 12, 2024

kernel: Remove dependency on CScheduler #28960

Merged

DrahtBot mentioned this pull request Jan 24, 2024

util: explicitly close all AutoFiles that have been written #29307

Merged

DrahtBot added the Needs rebase label Mar 12, 2024

darosior force-pushed the 2309_fuzz_chainstate branch from ea36af8 to 1059ca3 on July 8, 2024 17:00

DrahtBot removed the Needs rebase label Jul 8, 2024

DrahtBot mentioned this pull request Jul 9, 2024

logging: Replace LogError and LogWarning with LogAlert #30364

Closed

dergoegge reviewed Jul 9, 2024

src/test/fuzz/chainstate.cpp (outdated)

darosior force-pushed the 2309_fuzz_chainstate branch from 1059ca3 to 040af0e on July 9, 2024 09:42

dergoegge reviewed Jul 9, 2024

src/test/fuzz/chainstate.cpp (outdated)

src/test/fuzz/chainstate.cpp (outdated)

DrahtBot mentioned this pull request Jul 12, 2024

Remove the legacy wallet and BDB dependency #28710

Merged

Allow mocking CheckProofOfWork

c94673a

darosior added 2 commits July 31, 2024 17:06

fuzz: add a target for the BlockManager

a724d1d

Exercise (most of) the public interface of the BlockManager and assert some invariants. Notably, try to mimic block arrival whether or not its header was announced first.

fuzz: add a target for the ChainstateManager

e92c9dd

darosior force-pushed the 2309_fuzz_chainstate branch from 040af0e to e92c9dd on July 31, 2024 15:20

maflcko reviewed Aug 1, 2024

darosior mentioned this pull request Aug 7, 2024

fuzz: a target for the block index database #28209

Merged

DrahtBot mentioned this pull request Aug 15, 2024

fuzz: Test headers pre-sync through p2p #30661

Merged

DrahtBot mentioned this pull request Aug 16, 2024

build: Remove Autotools-based build system #30664

Merged

hebasto added the Needs CMake port label Aug 16, 2024

maflcko removed the Needs CMake port label Aug 29, 2024

DrahtBot added the Needs rebase label Sep 2, 2024

darosior closed this Dec 2, 2024

mzumsande mentioned this pull request Dec 18, 2024

fuzz: Add fuzz target for block index tree and related validation events #31533

Draft
