-
Notifications
You must be signed in to change notification settings - Fork 37.7k
addrman: Make consistency checks a runtime option #20233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK Should we run with |
|
609a338
to
aca3bb8
Compare
The consistency check fails on one of the unit tests. I've converted this PR to draft status while I work out why it's failing. |
Oh, I missed the "by default on regtest" in the PR description. Sorry about that :) Defaulting to sanity checking in regtest makes perfect sense! Even more Concept ACK :) |
aca3bb8
to
3260674
Compare
I've fixed the failing unit tests. |
This makes some of the functional tests run very slowly. I'll do some comparisons and maybe disable consistency checks on the slowest tests. |
Marking as draft until #20228 is merged. |
This seems to make rpc_net and p2p_getaddr_caching tests run extremely slowly:
I don't think there's any need for this to be based on #20228 -- passing a "check_addrman" from init to CConnman seems to work fine, see https://github.com/ajtowns/bitcoin/commits/202101-addrman-check eg. More generally, I'm not sure that these consistency checks are that useful to enable at runtime/CI rather than just using them as an aid when making changes to addrman internals. |
Yes, see my comment at #20233 (comment). I was planning on addressing that after 20228 was merged, probably by disabling the checks for those tests, or perhaps by changing the consistency check configuration to be a ratio, like the mempool consistency checks: Line 491 in 68196a8
Sure, we could continue to pass function calls and (now) ctor parameters through CConnman, but I don't see why that is desirable. Connman uses addrman, but there's no reason for it to have an addrman. Addrman is also used by peerman and the RPCs. Connman's resposibility is to open and maintain connections. Storing and maintaining our map of accessible nodes on the network is a separate responsibility. There was also some discussion here: http://www.erisian.com.au/bitcoin-core-dev/log-2020-12-16.html#l-585 about being able to manage addrman more directly. Again, in that case it'd be easier if RPC directly held a handle to the addrman instead of passing messages through connman.
I think somewhere that they're definitely useful would be in fuzz testing, to ensure that there's no way to violate addrman's invariants. See the usage of txrequest's sanity check here: bitcoin/src/test/fuzz/txrequest.cpp Line 373 in 68196a8
which is called after every loop in the fuzz tester. |
There's no point having addrman without a connman outside of unit tests (in which case the test can create an addrman directly), and connman uses addrman. That's enough reason for connman to be the thing that owns it, and for anything that wants it to access it via connman. I don't think there's anything you could do with But more importantly, it's easy to separate the two questions (should connman own the addrman instance and should debug_addrman be used more often) and judge them independently, so we should.
Yeah; I agree in principle. But while I think txrequest's checks are reasonably optimised to be run frequently with modest data sets, I think addrman's is very heavy weight and only really works for pretty small ones. Maybe it would make sense to have the automatic internal checks ( Any idea how much this PR currently slows down the fuzzer? Doing the checks irregularly might be enough either way though -- perhaps especially if how irregular it is adjust dynamically (100% for the first x checks, 50% for the next x checks, etc?) Hmm, actually, at least as it stands, perhaps addrman checks should be a regtest-only option, rather than a regtest-by-default option? Aren't those checks always too heavyweight to run when addrman is populated from mainnet/testnet? |
Rebased and addressed review comments from @jonatack and @amitiuttarwar |
@jonatack - I think it's still true. Try enabling The first commit fixes the build so it's possible to build with Without the Make deterministic addrman use nKey = 1 commit, then when we make addrman checks a runtime option and add the assert, then the tests would fail. |
Thanks @jnewbery, I'll have a look (once the guix build finishes ⏰) |
reACK 00fd089 |
ACK 00fd089 per reproduced the CI addrman fuzzer crash locally
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior serialize.h:205:93 in
fuzz: ./addrman.h:754: void CAddrMan::Check() const: Assertion `false' failed.
==62677== ERROR: libFuzzer: deadly signal
#0 0x55f7ef82d671 in __sanitizer_print_stack_trace (/home/jon/projects/bitcoin/bitcoin/src/test/fuzz/fuzz+0x1000671)
#1 0x55f7ef775008 in fuzzer::PrintStackTrace() fuzzer.o
#2 0x55f7ef758ef3 in fuzzer::Fuzzer::CrashCallback() fuzzer.o
#3 0x7fe5896ab13f (/lib/x86_64-linux-gnu/libpthread.so.0+0x1413f)
#4 0x7fe589317ce0 in __libc_signal_restore_set signal/../sysdeps/unix/sysv/linux/internal-signals.h:86:3
#5 0x7fe589317ce0 in raise signal/../sysdeps/unix/sysv/linux/raise.c:48:3
#6 0x7fe589301536 in abort stdlib/abort.c:79:7
#7 0x7fe58930140e in __assert_fail_base assert/assert.c:92:3
#8 0x7fe589310661 in __assert_fail assert/assert.c:101:3
#9 0x55f7ef86ed17 in CAddrMan::Check() const ./addrman.h:754:13
#10 0x55f7ef866d2c in CAddrMan::Select(bool) const ./addrman.h:601:9
#11 0x55f7ef8638e1 in addrman_fuzz_target(Span<unsigned char const>) test/fuzz/addrman.cpp:306:26
#12 0x55f7ef85b9d5 in void std::__invoke_impl<void, void (*&)(Span<unsigned char const>), Span<unsigned char const> >(std::__invoke_other, void (*&)(Span<unsigned char const>), Span<unsigned char const>&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:60:14
#13 0x55f7ef85b89d in std::enable_if<is_invocable_r_v<void, void (*&)(Span<unsigned char const>), Span<unsigned char const> >, void>::type std::__invoke_r<void, void (*&)(Span<unsigned char const>), Span<unsigned char const> >(void (*&)(Span<unsigned char const>), Span<unsigned char const>&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/invoke.h:110:2
#14 0x55f7ef85b608 in std::_Function_handler<void (Span<unsigned char const>), void (*)(Span<unsigned char const>)>::_M_invoke(std::_Any_data const&, Span<unsigned char const>&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:291:9
#15 0x55f7f078934d in std::function<void (Span<unsigned char const>)>::operator()(Span<unsigned char const>) const /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/std_function.h:622:14
#16 0x55f7f0788ff8 in LLVMFuzzerTestOneInput test/fuzz/fuzz.cpp:91:5
#17 0x55f7ef75a783 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) fuzzer.o
#18 0x55f7ef759c8a in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) fuzzer.o
#19 0x55f7ef75c084 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) fuzzer.o
#20 0x55f7ef75c299 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) fuzzer.o
#21 0x55f7ef74bae7 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) fuzzer.o
#22 0x55f7ef7757c2 in main (/home/jon/projects/bitcoin/bitcoin/src/test/fuzz/fuzz+0xf487c2)
#23 0x7fe589302d09 in __libc_start_main csu/../csu/libc-start.c:308:16
#24 0x55f7ef722969 in _start (/home/jon/projects/bitcoin/bitcoin/src/test/fuzz/fuzz+0xef5969)
NOTE: libFuzzer has rudimentary signal handlers.
Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 0 ; base unit: 0000000000000000000000000000000000000000
0x0,0x0,0x0,0x0,0x0,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0xe1,0xdc,0xdc,0xdc,0xdc,0xdc,0xdc,0xdc,0x23,0x23,0x23,0xbb,0x1,0x0,0x0,0x23,0x27,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x40,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x0,0x0,0x0,0x0,0x1,0x0,0x0,0x0,0x0,0x0,0xf,0xda,0x3f,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x0,0x23,0x23,0x0,0x0,0x0,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0xff,0xff,0xff,0x1f,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0x23,0xf4,0xf4,0xf4,0xf4,0xf4,0xf4,0xf4,0xf4,0xf4,0xb4,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x23,0x23,0x23,0x23,0x23,0x23,0xe1,0xdc,0xdc,0xdc,0xdc,0x32,0xa3,0x1,0x0,0x0,0x0,0x0,0x0,0x0,0x41,0x0,0xff,0x0,0x0,0xff,0x0,0x0,0x2b,
\x00\x00\x00\x00\x00#############\xe1\xdc\xdc\xdc\xdc\xdc\xdc\xdc###\xbb\x01\x00\x00#'#############@#################\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x0f\xda?\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00##\x00\x00\x00########\xff\xff\xff\x1f####################\xf4\xf4\xf4\xf4\xf4\xf4\xf4\xf4\xf4\xb4\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff######\xe1\xdc\xdc\xdc\xdc2\xa3\x01\x00\x00\x00\x00\x00\x00A\x00\xff\x00\x00\xff\x00\x00+
artifact_prefix='./'; Test unit written to ./crash-d20c926b04d9e4e2657d97d3c5e77bde0e9a9daa
Base64: AAAAAAAjIyMjIyMjIyMjIyMj4dzc3Nzc3NwjIyO7AQAAIycjIyMjIyMjIyMjIyMjQCMjIyMjIyMjIyMjIyMjIyMjAAAAAAEAAAAAAA/aPwAAAAAAAAAAAAAAIyMAAAAjIyMjIyMjI////x8jIyMjIyMjIyMjIyMjIyMjIyMjI/T09PT09PT09LT///////////////////8jIyMjIyPh3Nzc3DKjAQAAAAAAAEEA/wAA/wAAKw== Don't hesitate to reconsider taking the suggestion at #20233 (comment).
Got it. You are right--I dropped that commit and the addrman unit tests fail. |
00fd089
to
3d90d1b
Compare
I've disabled the consistency checks again in the addrman fuzz test. I think the problem is that the fuzzer will call I have a branch that removes |
Currently addrman consistency checks are a compile time option, and are not enabled in our CI. It's unlikely anyone is running these consistency checks. Make them a runtime option instead, where users can enable addrman consistency checks every n operations (similar to mempool tests). Update the addrman unit tests to do internal consistency checks every 100 operations (checking on every operations causes the test runtime to increase by several seconds). Also assert on a failed addrman consistency check to terminate program execution.
3d90d1b
to
a4d7854
Compare
ACK a4d7854 per src/addrman.cpp
@@ -438,6 +438,7 @@ int CAddrMan::Check_() const
// Run consistency checks 1 in m_consistency_check_ratio times if enabled
if (m_consistency_check_ratio == 0) return 0;
if (insecure_rand.randrange(m_consistency_check_ratio) >= 1) return 0;
+ LogPrintf("Check addrman\n");
|
Code-review ACK a4d7854 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code-review ACK a4d7854
Agree with vasild's idea of introducing constants for the consistency_checks
CAddrMan ctor param in unit tests, which can be done in a follow-up: #20233 (comment)
Also left two yocto-nits below regarding unit test assertion style, probably also best picked up in a follow-up.
|
||
BOOST_CHECK(addrman.size() == 23); | ||
BOOST_CHECK(addrman.SelectTriedCollision().ToString() == "250.1.1.19:0"); | ||
BOOST_CHECK(addrman.size() == 36); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BOOST_CHECK(addrman.size() == 36); | |
BOOST_CHECK_EQUAL(addrman.size(), 36); |
BOOST_CHECK(addrman.size() == 33); | ||
BOOST_CHECK(!addrman.Add(CAddress(addr36, NODE_NONE), source)); | ||
addrman.Good(addr36); | ||
BOOST_CHECK(addrman.size() == 59); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BOOST_CHECK(addrman.size() == 59); | |
BOOST_CHECK_EQUAL(addrman.size(), 59); |
The various nits/changes mentioned can be done in a followup. Either separately, or potentially as part of the |
…me option a4d7854 [addrman] Make addrman consistency checks a runtime option (John Newbery) 10aac24 [tests] Make deterministic addrman use nKey = 1 (John Newbery) fa9710f [addrman] Add deterministic argument to CAddrMan ctor (John Newbery) ee458d8 Add missing const to CAddrMan::Check_() (MarcoFalke) Pull request description: CAddrMan has internal consistency checks. Currently, these are only run when the program is compiled with the `DEBUG_ADDRMAN` option. This option is not enabled on any of our CI builds, and it's likely that no-one is running them at all. This PR makes consistency checks a (hidden) runtime option that can be enabled with `-checkaddrman`, where `-checkaddrman=n` will result in the consistency checks running every n operations (similar to `-checkmempool=n`). We set the ratio to 1/100 for our unit tests, and leave it disabled by default for all networks. Additionally, a consistency check failure now asserts, rather than logging and continuing. This matches the behavior of CTxMemPool and TxRequestTracker, where a failed consistency check asserts. ACKs for top commit: jonatack: ACK a4d7854 per `git diff 00fd089 a4d7854`, tested by adding logging similar to #22479 and running with `-checkaddrman=<n>` for various values 0/1/10/100 etc, tested the updated docs with `bitcoind -help-debug | grep -A2 "checkaddrman\|checkmempool"` and verified rebased on master that compiling with `CPPFLAGS="-DDEBUG_ADDRMAN"` no longer causes the build to error. mzumsande: Code-review ACK a4d7854 theStack: Code-review ACK a4d7854 Tree-SHA512: eaee003f7a99154822c5b5efbc62008d32c1efbecc6fec6e183427f6b2ae5d30b3be7924e3a7271b1a1de91517f5bd2a70011d45358c3105c6a0702f12b70f7c
Not sure if this is sufficient. Since #22493, we have a fuzz test that deserializes from random data, and |
…ests, make addrman consistency check ratio easier to change 7f122a4 fuzz: non-addrman fuzz tests: override-able check ratio (Vasil Dimov) 3bd83e2 fuzz: addrman fuzz tests: override-able check ratio (Vasil Dimov) 46b0fe7 test: non-addrman unit tests: override-able check ratio (Vasil Dimov) 81e4d54 test: addrman unit tests: override-able check ratio (Vasil Dimov) 6dff621 bench: put addrman check ratio in a variable (Vasil Dimov) 6f7c756 fuzz: parse the command line arguments in fuzz tests (Vasil Dimov) 92a0f7e test: parse the command line arguments in unit tests (Vasil Dimov) Pull request description: Previously command line arguments passed to unit and fuzz tests would be ignored by the tests themselves. They would be used by the boost test framework (e.g. `--run_test="addrman_tests/*"`) or by the fuzzer (e.g. `-runs=1`). However both provide ways to pass down the extra arguments to the test itself. Use that, parse the arguments and make them available to the tests via `gArgs`. This makes the tests more flexible as they can be run with any bitcoind config option specified on the command line. When creating `AddrMan` objects in tests, use `-checkaddrman=` (if provided) instead of hardcoding the check ratio in many different places. See #20233 (comment) for further motivation for this. ACKs for top commit: mzumsande: re-ACK 7f122a4 josibake: reACK 7f122a4 Tree-SHA512: 3a05e61e4d70a0569bb67594bcce3aad8fdef63cdcc54e2823a3bc9f18679571985004412b6c332a210f67849bab32d8467b4115fbff8f5fac9834982e60dcf3
Summary: ``` Also: Always compile the function signature to avoid similar issues in the future. ``` Partial backport of [[bitcoin/bitcoin#20233 | core#20233]]: bitcoin/bitcoin@ee458d8 Test Plan: With DEBUG_ADDRMAN: ninja all check Reviewers: #bitcoin_abc, PiRK Reviewed By: #bitcoin_abc, PiRK Differential Revision: https://reviews.bitcoinabc.org/D12270
Summary: ``` addrman_tests fail when consistency checks are enabled, since the tests set the deterministic test addrman's nKey value to zero, which is an invalid value. Change this so that deterministic addrman's nKey value is set to 1. This requires updating a few tests that are using magic values derived from nKey being set to 0. ``` Partial backport of [[bitcoin/bitcoin#20233 | core#20233]]: bitcoin/bitcoin@10aac24 Depends on D12270. Test Plan: With DEBUG_ADDRMAN: ninja all check Reviewers: #bitcoin_abc, PiRK Reviewed By: #bitcoin_abc, PiRK Differential Revision: https://reviews.bitcoinabc.org/D12271
Summary: ``` Currently addrman consistency checks are a compile time option, and are not enabled in our CI. It's unlikely anyone is running these consistency checks. Make them a runtime option instead, where users can enable addrman consistency checks every n operations (similar to mempool tests). Update the addrman unit tests to do internal consistency checks every 100 operations (checking on every operations causes the test runtime to increase by several seconds). ``` Completes backport of [[bitcoin/bitcoin#20233 | core#20233]]: bitcoin/bitcoin@a4d7854#diff-259675a17ed61244b5f1f5857285bdab63cc81f41334ed64c74ba623b61cf2a2 Includes a fix to prevent the build from failing on 32 bits platforms. Depends on D12271. Test Plan: ninja all check ninja bitcoin-fuzzers Reviewers: #bitcoin_abc, PiRK Reviewed By: #bitcoin_abc, PiRK Differential Revision: https://reviews.bitcoinabc.org/D12272
CAddrMan has internal consistency checks. Currently, these are only run when the program is compiled with the
DEBUG_ADDRMAN
option. This option is not enabled on any of our CI builds, and it's likely that no-one is running them at all.This PR makes consistency checks a (hidden) runtime option that can be enabled with
-checkaddrman
, where-checkaddrman=n
will result in the consistency checks running every n operations (similar to-checkmempool=n
). We set the ratio to 1/100 for our unit tests, and leave it disabled by default for all networks. Additionally, a consistency check failure now asserts, rather than logging and continuing. This matches the behavior of CTxMemPool and TxRequestTracker, where a failed consistency check asserts.