-
Notifications
You must be signed in to change notification settings - Fork 37.7k
miner: Avoid stack-use-after-return in validationinterface #18742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Oh, great find! How did you find this one and in what commit was it introduced? :) |
Good question. This is the traceback:
|
I might write a test tomorrow with steps to reproduce |
faadd38
to
fab402a
Compare
fab402a
to
bbbbf38
Compare
@practicalswift This is a race, so I couldn't find a test that is reproducible. Though, I have attached a unit test, that fails when run long enough in a loop. All you need to do is
|
If someone has issues reproducing the crash in the unit test, the following diff might help: diff --git a/src/validationinterface.cpp b/src/validationinterface.cpp
index 11000774c0..6a311eeb44 100644
--- a/src/validationinterface.cpp
+++ b/src/validationinterface.cpp
@@ -11,6 +11,7 @@
#include <primitives/block.h>
#include <primitives/transaction.h>
#include <scheduler.h>
+#include <util/time.h>
#include <future>
#include <unordered_map>
@@ -80,6 +81,7 @@ public:
++it->count;
{
REVERSE_LOCK(lock);
+ UninterruptibleSleep(std::chrono::microseconds(500));
f(*it->callbacks);
}
it = --it->count ? std::next(it) : m_list.erase(it); |
Concept ACK. Running just fa9f3a5 (no additional patch), I generally see a crash between 5 - 20 iterations: /bitcoin# while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done
Running 1 test case...
*** No errors detected
Running 1 test case...
*** No errors detected
Running 1 test case...
*** No errors detected
Running 1 test case...
*** No errors detected
Running 1 test case...
=================================================================
==95792==ERROR: AddressSanitizer: unknown-crash on address 0x7f8f04f60dc0 at pc 0x55abc9388623 bp 0x7f8f057777f0 sp 0x7f8f057777e8
READ of size 8 at 0x7f8f04f60dc0 thread T4
#0 0x55abc9388622 in CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14::operator()(CValidationInterface&) const /bitcoin/src/validationinterface.cpp:241:75
#1 0x55abc9388622 in void MainSignalsInstance::Iterate<CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14>(CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14&&) /bitcoin/src/validationinterface.cpp:83
#2 0x55abc9388622 in CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&) /bitcoin/src/validationinterface.cpp:241
#3 0x55abc8b9c94c in validationinterface_tests::unregister_validation_interface_race::test_method()::$_0::operator()() const /bitcoin/src/test/validationinterface_tests.cpp:28:30
#4 0x55abc8b9c94c in void std::__invoke_impl<void, validationinterface_tests::unregister_validation_interface_race::test_method()::$_0>(std::__invoke_other, validationinterface_tests::unregister_validation_interface_race::test_method()::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:60
#5 0x55abc8b9c94c in std::__invoke_result<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0>::type std::__invoke<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0>(validationinterface_tests::unregister_validation_interface_race::test_method()::$_0&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/invoke.h:95
#6 0x55abc8b9c94c in decltype(std::__invoke(_S_declval<0ul>())) std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:244
#7 0x55abc8b9c94c in std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0> >::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:253
#8 0x55abc8b9c94c in std::thread::_State_impl<std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_0> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196
#9 0x7f8f0c6d4b2e (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbbb2e)
#10 0x7f8f0cccefa2 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7fa2)
#11 0x7f8f0c3a34ce in clone (/lib/x86_64-linux-gnu/libc.so.6+0xf94ce)
Address 0x7f8f04f60dc0 is located in stack of thread T5 at offset 32 in frame
#0 0x55abc8b9d1af in std::thread::_State_impl<std::thread::_Invoker<std::tuple<validationinterface_tests::unregister_validation_interface_race::test_method()::$_1> > >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/thread:196
This frame has 1 object(s):
[32, 40) 'sub.i.i.i.i.i' <== Memory access at offset 32 is inside this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
Thread T5 created by T0 here:
#0 0x55abc7d1002d in pthread_create (/bitcoin/src/test/test_bitcoin+0x37702d)
#1 0x7f8f0c6d4db4 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbbdb4)
#2 0x55abc8b97b1e in validationinterface_tests::unregister_validation_interface_race_invoker() /bitcoin/src/test/validationinterface_tests.cpp:19:1
#3 0x55abc7ded77f in boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:118:11
SUMMARY: AddressSanitizer: unknown-crash /bitcoin/src/validationinterface.cpp:241:75 in CMainSignals::BlockChecked(CBlock const&, BlockValidationState const&)::$_14::operator()(CValidationInterface&) const
Shadow bytes around the buggy address:
0x0ff2609e4160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e4170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e4180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e4190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e41a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0ff2609e41b0: 00 00 00 00 00 00 00 00[00]00 00 00 00 00 00 00
0x0ff2609e41c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e41d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e41e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e41f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff2609e4200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
Thread T4 created by T0 here:
#0 0x55abc7d1002d in pthread_create (/bitcoin/src/test/test_bitcoin+0x37702d)
#1 0x7f8f0c6d4db4 in std::thread::_M_start_thread(std::unique_ptr<std::thread::_State, std::default_delete<std::thread::_State> >, void (*)()) (/usr/lib/x86_64-linux-gnu/libstdc++.so.6+0xbbdb4)
#2 0x55abc8b97b1e in validationinterface_tests::unregister_validation_interface_race_invoker() /bitcoin/src/test/validationinterface_tests.cpp:19:1
#3 0x55abc7ded77f in boost::detail::function::void_function_invoker0<void (*)(), void>::invoke(boost::detail::function::function_buffer&) /usr/include/boost/function/function_template.hpp:118:11
==95792==ABORTING |
cc @promag , @ryanofsky You seem qualified to review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK bbbbf38.
Like @ryanofsky suggested, we should only have RegisterValidationInterface(shared_ptr)
so this is a step in that direction.
bool accepted = ProcessNewBlock(Params(), blockptr, /* fForceProcessing */ true, /* fNewBlock */ &new_block); | ||
UnregisterValidationInterface(&sc); | ||
UnregisterSharedValidationInterface(sc); | ||
if (!new_block && accepted) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is to call SyncWithValidationInterfaceQueue
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative to what? The notification we are listening to is never put in the queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that after UnregisterValidationInterface + SyncWithValidationInterfaceQueue
there is no pointer to sc
in validation queue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does SyncWithValidationInterfaceQueue
prevent new notifications from being scheduled for sc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New notifications won't be scheduled because the UnregisterValidationInterface
above. Only problem is a concurrent notification being called, so SyncWithValidationInterfaceQueue
ensures that case doesn't happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I understand. Yeah, correct.
Could you revert or move formatting changes in validationinterface.h/cpp files to a separate commit? I'm starting at that looking for actual changes but is it all reformatting? |
Concept ACK. This is a good find and all changes look ok, but there's so much going on it's a headache to reverse engineer everything that's happening. Suggest making one change per commit or perhaps dropping some of the changes |
It is also interesting if the bug described #18742 (comment) started happening recently. Unless I missed something, it seems like the bug has been present a very long time, and I wonder if #18524 made it more likely to happen. |
bbbbf38
to
fa33a63
Compare
Addressed feedback by @ryanofsky. No code changes, just more commits. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK fa33a63. Great find and changes are very clear now!
@jnewbery in this case it's the test that's fixed so what you suggest is to just add the |
I can put the test in
|
This commit is (intentionally) adding a broken test. The test is broken because it registering a subscriber object that can go out of scope while events are still being sent. To run the broken test and reproduce the bug: - Remove comment /** and */ - ./configure --with-sanitizers=address - export ASAN_OPTIONS=detect_leaks=0 - make - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done
…hValidationInterfaceQueue For the purpose of this test the two have the same outcome, but this one is shorter and avoids a sleep for 0.1 seconds.
This is achieved by switching to a shared_ptr. Also, switch the validationinterfaces in the tests to use shared_ptrs for the same reason.
fa6d1a0
to
7777f2a
Compare
Pushed an empty diff to defuse the test. Added instructions to the commit message on how to add the fuse back in.
|
@MarcoFalke did you see #18742 (comment)? |
@promag Yes, but that causes a deadlock on shutdown. Also the shutdown sequence should be bug free right now because the message handler is stopped before any unregisters should happen. I still like your idea, but I think it should be prepared, reviewed and merged for 0.21, not as a backport. For the backport we should aim at a fix that has little chance of breaking stuff. The fix here should satisfy this because the same fix has already been applied to the wallet in #18338, which is also in 0.20 |
Code review ACK 7777f2a |
Code review ACK 7777f2a. |
For backport, only the first and last commit are needed. |
Thanks. Will have this up in #18973 shortly. |
post merge code review ACK 7777f2a |
This commit is (intentionally) adding a broken test. The test is broken because it registering a subscriber object that can go out of scope while events are still being sent. To run the broken test and reproduce the bug: - Remove comment /** and */ - ./configure --with-sanitizers=address - export ASAN_OPTIONS=detect_leaks=0 - make - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done Github-Pull: bitcoin#18742 Rebased-From: fab6d06
This is achieved by switching to a shared_ptr. Also, switch the validationinterfaces in the tests to use shared_ptrs for the same reason. Github-Pull: bitcoin#18742 Rebased-From: 7777f2a
This commit is (intentionally) adding a broken test. The test is broken because it registering a subscriber object that can go out of scope while events are still being sent. To run the broken test and reproduce the bug: - Remove comment /** and */ - ./configure --with-sanitizers=address - export ASAN_OPTIONS=detect_leaks=0 - make - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done Github-Pull: bitcoin#18742 Rebased-From: fab6d06
This is achieved by switching to a shared_ptr. Also, switch the validationinterfaces in the tests to use shared_ptrs for the same reason. Github-Pull: bitcoin#18742 Rebased-From: 7777f2a
245c862 test: disable script fuzz tests (fanquake) 9a8fb4c fuzz: Remove enumeration of expected deserialization exceptions in ProcessMessage(...) fuzzer (practicalswift) 6161c94 [net processing] Only send a getheaders for one block in an INV (John Newbery) cf2a6e2 test: Remove const to work around compiler error on xenial (Wladimir J. van der Laan) cc7d344 miner: Avoid stack-use-after-return in validationinterface (MarcoFalke) 37a6207 test: Add unregister_validation_interface_race test (MarcoFalke) ff4dc20 gui: Fix manual coin control with multiple wallets loaded (João Barbosa) ed0afe8 test: Add test for conflicted wallet tx notifications (Russell Yanofsky) 251e321 rpc: Relock wallet only if most recent callback (João Barbosa) ca4dac4 rpc: Add mutex to guard deadlineTimers (João Barbosa) a3fe458 [docs] Improve commenting in ProcessGetData() (John Newbery) 011532e [test] test that an invalid GETDATA doesn't prevent processing of future messages (Amiti Uttarwar) 1e73d72 [net processing] ignore unknown INV types in GETDATA messages (Amiti Uttarwar) fb82173 [net processing] ignore tx GETDATA from blocks-only peers (Amiti Uttarwar) 315ae14 gui: Fix itemWalletAddress leak when not tree mode (João Barbosa) Pull request description: Backports the following PRs to the 0.20 branch: * #18578: gui: Fix leak in CoinControlDialog::updateView * #18808: [net processing] Drop unknown types in getdata * #18814: rpc: Relock wallet only if most recent callback * #18878: test: Add test for conflicted wallet tx notifications * #18894: gui: Fix manual coin control with multiple wallets loaded * #18742: miner: Avoid stack-use-after-return in validationinterface * #18962: net processing: Only send a getheaders for one block in an INV * #18975: test: Remove const to work around compiler error on xenial ACKs for top commit: promag: Tested ACK 245c862 coin control with multiple wallets. laanwj: ACK 245c862 MarcoFalke: ACK 245c862 solved the conflicts myself as a sanity check. Did not re-review 🍷 Tree-SHA512: 285e5a5fad5bbeba6032742c65dc68836e8eccfcceda9e69fec4ddd162a3f61679a96f9bbe3d434267835af67c21ac4c05accf6f63e827c2eb47203c6daafe31
Summary: This is a backport of Core [[bitcoin/bitcoin#18742 | PR18742]] [1/3] bitcoin/bitcoin@fa770ce Test Plan: `ninja` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D9103
…hValidationInterfaceQueue Summary: For the purpose of this test the two have the same outcome, but this one is shorter and avoids a sleep for 0.1 seconds. This is a backport of Core [[bitcoin/bitcoin#18742 | PR18742]] [2/3] bitcoin/bitcoin@fa5ceb2 Depends on D9103 Test Plan: `ninja check` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D9104
Summary: This is a squash of 2 commits, to add the unbroken test and the solution at the same time, and a revert of D5140 (different solution to the same problem) > test: Add unregister_validation_interface_race test bitcoin/bitcoin@fab6d06 > miner: Avoid stack-use-after-return in validationinterface > > This is achieved by switching to a shared_ptr. > > Also, switch the validationinterfaces in the tests to use shared_ptrs > for the same reason. bitcoin/bitcoin@7777f2a This is a backport of Core [[bitcoin/bitcoin#18742 | PR18742]] [3/3] This replaces the solution in D5140 See comment https://reviews.bitcoinabc.org/D5140#124338 Depends on D9104 Test Plan: Test suggested by this PR: ``` export ASAN_OPTIONS=detect_leaks=0 ninja && ninja check while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done ``` Test from D5140: ``` CC=clang CXX=clang++ cmake -GNinja .. -DCMAKE_BUILD_TYPE=Debug ninja check-extended ``` Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Subscribers: Fabien Differential Revision: https://reviews.bitcoinabc.org/D9105
This commit is (intentionally) adding a broken test. The test is broken because it registering a subscriber object that can go out of scope while events are still being sent. To run the broken test and reproduce the bug: - Remove comment /** and */ - ./configure --with-sanitizers=address - export ASAN_OPTIONS=detect_leaks=0 - make - while ./src/test/test_bitcoin -t validationinterface_tests/unregister_validation_interface_race --catch_system_errors=no ; do true; done Github-Pull: #18742 Rebased-From: fab6d06
This is achieved by switching to a shared_ptr. Also, switch the validationinterfaces in the tests to use shared_ptrs for the same reason. Github-Pull: #18742 Rebased-From: 7777f2a
When a validationinterface has itself unregistered in one thread, but is about to get executed in another thread [1], there is a race:
[1]
bitcoin/src/validationinterface.cpp
Lines 82 to 83 in 6413980
This happens in the miner. More generally it happens everywhere where at least one thread is generating notifications and another one is unregistering a validationinterface.
This issue has been fixed in commit ab31b9d, but the fix has not been applied to the miner.
Example where this happened in practice: https://travis-ci.org/github/bitcoin/bitcoin/jobs/675322230#L4414