-
Notifications
You must be signed in to change notification settings - Fork 37.7k
PoC: fuzz chainstate and block managers #29158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/29158. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Cool, this is a great thing to investigate. I'll be giving the approach a look this week. |
Thanks for working on this! One alternative that I have considered before (for chainstate fuzzing) is to abstract and further modularize This would require a bunch of work:
This approach would avoid filesystem syscalls entirely, as well as the large block file allocations. The coinbase maturity also seems relevant because you can't spend any coins in the test until you've mined 100 blocks. Mining 100 blocks every fuzz iteration ends up being pretty slow. Maybe we can use assumeutxo to avoid that? (or snapshot fuzzing) |
Nice one! |
Concept ACK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK; midway through review and trying to resolve some of the CI issues.
src/test/fuzz/chainstate.cpp
Outdated
FuzzedDataProvider fuzzed_data_provider{buffer.data(), buffer.size()}; | ||
const auto& chainparams{Params()}; | ||
const fs::path datadir{""}; | ||
std::unordered_map<COutPoint, CTxOut, SaltedOutpointHasher> utxos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type comes up often enough in this file that it might be worth an alias.
Pushed three additional commits to my branch that may make a dent in the CI issues:
|
ea36af8
to
1059ca3
Compare
Rebased this, taking advantage of #28960. I've also been investigating alternative approaches. I first tried to move from With that implemented i tried to "cache" an initial chainstate to re-use on each fuzzing round, to avoid having to connect the 110 blocks initial chain on every single iteration. Unsuccessfully. Finally, i wanted to compare the performances of mocking the filesystem to simply using a ramdisk. I realized if i were to use a ramdisk i could simply use the filesystem directly to "cache" an initial chainstate: create two datadirs, one for the initial chainstate, one for the fuzzing iteration. At initialization of the target create a fresh chainstate and connect the 110 blocks. Upon each iteration wipe the working datadir and copy over the initial datadir. So i implemented that, which besides being more efficient also has the advantage of removing the modifications of non-test code and the platform-dependent syscalls. The target is still pretty slow, but at this point it's just because of the code it calls: we are doing a bunch of block/header connections and reorg upon every iteration, and each of those can take around a hundred milliseconds. I've pushed a WIP commit which implements what i described above for the |
1059ca3
to
040af0e
Compare
Alright so i cleaned up this PR locally to only use a RAM disk. I'm now in a middle of a significant refactoring of the chainstate fuzz target which i hope to be able to push shortly. If you intended to read through this PoC, maybe wait for the next push. |
Exercise (most of) the public interface of the BlockManager and assert some invariants. Notably, try to mimick block arrival whether its header was announced first or not.
040af0e
to
e92c9dd
Compare
Cleaned up this PR to always use a ramdisk instead of trying to mock the filesystem. Also significantly reworked the chainstate target. |
//! To generate a random tmp datadir per process (necessary to fuzz with multiple cores). | ||
static FastRandomContext g_insecure_rand_ctx_temp_path; | ||
|
||
struct TestData { | ||
fs::path m_tmp_dir; | ||
fs::path m_datadir; | ||
fs::path m_init_datadir; | ||
const CChainParams m_chain_params{*CChainParams::RegTest({})}; | ||
KernelNotifications m_notifs; | ||
util::SignalInterrupt m_interrupt; | ||
|
||
void Init() { | ||
SeedRandomForTest(SeedRand::SEED); | ||
const auto rand_str{g_insecure_rand_ctx_temp_path.rand256().ToString()}; | ||
m_tmp_dir = fs::temp_directory_path() / "fuzz_chainstate_" PACKAGE_NAME / rand_str; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just inline it, if it is only used once? Also, the build system overhead seems not worth it to place PACKAGE_NAME
here? (See lint failure)
//! To generate a random tmp datadir per process (necessary to fuzz with multiple cores). | |
static FastRandomContext g_insecure_rand_ctx_temp_path; | |
struct TestData { | |
fs::path m_tmp_dir; | |
fs::path m_datadir; | |
fs::path m_init_datadir; | |
const CChainParams m_chain_params{*CChainParams::RegTest({})}; | |
KernelNotifications m_notifs; | |
util::SignalInterrupt m_interrupt; | |
void Init() { | |
SeedRandomForTest(SeedRand::SEED); | |
const auto rand_str{g_insecure_rand_ctx_temp_path.rand256().ToString()}; | |
m_tmp_dir = fs::temp_directory_path() / "fuzz_chainstate_" PACKAGE_NAME / rand_str; | |
struct TestData { | |
fs::path m_tmp_dir; | |
fs::path m_datadir; | |
fs::path m_init_datadir; | |
const CChainParams m_chain_params{*CChainParams::RegTest({})}; | |
KernelNotifications m_notifs; | |
util::SignalInterrupt m_interrupt; | |
void Init() { | |
SeedRandomForTest(SeedRand::SEED); | |
const auto rand_str{FastRandomContext{}.rand256().ToString()}; //! To generate a random tmp datadir per process (necessary to fuzz with multiple cores). | |
m_tmp_dir = fs::temp_directory_path() / "fuzz_chainstate" / rand_str; |
🐙 This pull request conflicts with the target branch and needs rebase. |
⌛ There hasn't been much activity lately and the patch still needs rebase. What is the status here?
|
cc @darosior ? |
Forgot to post this here at the time, but I was working on an abstract block store for a different context some months ago: TheCharlatan@5f35e50. I'm not sure how useful this actually is though, since mounting a ramdisk just is very easy, but maybe it could be useful to verify that certain functions, e.g. allocate, are called at the expected moment. |
It is indeed easy, but for some reason I found mixed results when doing it by default in the CI: #31182 . I couldn't find a real improvement there, except for the So forcing the blockstore into ram for this target (and possibly others) could still be useful. |
This has unfortunately gotten lower in my list of priorities. I'll close this PR for now to clarify its status. I hope to get back to this at some point. Anyone willing to work on this, or parts of this feel free to grab or/and reach out. |
We don't have a fuzzing harness for our main consensus engine [0]. This PR introduces two new targets which respectively fuzz the
BlockManager
andChainstateManager
(process headers, blocks, reorgs and assert some invariants in doing so).There is two main obstacles to achieving this: PoW and io. The blocks and chainstate databases can be stored in memory but blocks still need a valid proof of work and to be stored on disk. Niklas solved the first issue in #28043 by simply introducing a global which makes it possible to mock the PoW check (his commit is cherry-picked here). After considering other approaches, i also used globals to mock disk io.
I'm interested with this PR in getting feedback on the concept and the approach, but also in suggestions of more invariants to be asserting in the
chainstate
fuzz target.Regarding other approaches i tried the most potentially promising was to leverage
ld
's--wrap
option to mock the syscalls without having to modify non-test code. But i didn't try too hard to make it work: better to have a demo of what can be achieved first with a more trivial way of mocking filesystem calls. If there is interest in these fuzz targets, i can give this approach another look.Regarding efficiency, the chainstate fuzz target is quite slow at the moment but i've at least 2x its performance by rebasing on #28960 and making
CheckBlockIndex
callable externally even if!ShouldCheckBlockIndex()
. Suggestions for performance improvements welcome too.[0] Well there is
utxo_total_supply
but it's very specialized toward exercising a specific past bug.