-
Notifications
You must be signed in to change notification settings - Fork 37.7k
Drop some TSan suppressions #19983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop some TSan suppressions #19983
Conversation
ec17cfc
to
425c981
Compare
Concept ACK Great to see |
I must confess that, unlike #19982, I'm not as much confident in this PR removals as all of them are found empirically. |
Some more CI TSan builds: |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsNo conflicts as of last run. |
425c981
to
49c3a52
Compare
Rebased 49c3a52 -> 730e5ff (pr19983.02 -> pr19983.03) due to the conflict with #20285. |
cf1ac36
to
27752af
Compare
Rebased 71518e3 -> 27752af (pr19983.04 -> pr19983.06) due to the conflict with #19337. |
Rebased 3c2e969 -> 89952ae (pr19983.07 -> pr19983.08) due to the conflict with #19425. |
Rebased 89952ae -> b0b6092 (pr19983.08 -> pr19983.09) due to the conflict with #19982. |
cr ACK b0b6092: patch looks correct and doesn't touch any non-test code |
Rebased b0b6092 -> a17f69f (pr19983.09 -> pr19983.10) due to the conflict with #20683. |
Updated a17f69f -> 354d2cb (pr19983.10 -> pr19983.11, diff):
|
cr ACK 354d2cb: patch looks correct and doesn't touch any non-test code |
@@ -161,6 +161,9 @@ def run_test(self): | |||
assert_equal(len(peer_info), 1) # node is still connected | |||
assert_equal(peer_info[0]['permissions'], ['download']) | |||
|
|||
# Ignore non-critical TSan warnings. | |||
self.stop_node(0, ignored_stderr="WARNING: too long mutex cycle found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't reproduce this locally. How often did you have to run this test?
I am running with:
diff --git a/test/sanitizer_suppressions/tsan b/test/sanitizer_suppressions/tsan
index 986e09605..020f3da9b 100644
--- a/test/sanitizer_suppressions/tsan
+++ b/test/sanitizer_suppressions/tsan
@@ -29,12 +29,7 @@ race:zmq::*
race:bitcoin-qt
# deadlock (TODO fix)
deadlock:CConnman::ForNode
-deadlock:CConnman::GetNodeStats
deadlock:CChainState::ConnectTip
-deadlock:UpdateTip
-
-# WalletBatch (unidentified deadlock)
-deadlock:WalletBatch
# Intentional deadlock in tests
deadlock:TestPotentialDeadLockDetected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't reproduce this locally.
What is -stdlib
compiled against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used ./ci/test/00_setup_env_native_tsan.sh
to compile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. On master (df127ec) with your patch I have no "WARNING: too long mutex cycle found" from the thread sanitizer when running test/functional/feature_maxuploadtarget.py
. Maybe the reason is the updated clang?
Instead, the skipping -DDEBUG_LOCKORDER
from CPPFLAGS
causes another warning even on master w/o your patch:
$ test/functional/feature_maxuploadtarget.py
2020-12-22T15:02:39.792000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_731auno_
2020-12-22T15:02:57.821000Z TestFramework (INFO): Peer 0 disconnected after downloading old block too many times
2020-12-22T15:03:42.603000Z TestFramework (INFO): Peer 1 able to repeatedly download new block
2020-12-22T15:03:42.712000Z TestFramework (INFO): Peer 1 disconnected after trying to download old block
2020-12-22T15:03:42.712000Z TestFramework (INFO): Advancing system time on node to clear counters...
2020-12-22T15:03:42.867000Z TestFramework (INFO): Peer 2 able to download old block
2020-12-22T15:03:42.873000Z TestFramework (INFO): Restarting node 0 with download permission and 1MB maxuploadtarget
2020-12-22T15:03:45.050000Z TestFramework (INFO): Peer still connected after trying to download old block (download permission)
2020-12-22T15:03:45.103000Z TestFramework (INFO): Stopping nodes
Traceback (most recent call last):
File "test/functional/feature_maxuploadtarget.py", line 166, in <module>
MaxUploadTest().main()
File "/home/hebasto/code/gh/bitcoin/test/functional/test_framework/test_framework.py", line 149, in main
exit_code = self.shutdown()
File "/home/hebasto/code/gh/bitcoin/test/functional/test_framework/test_framework.py", line 278, in shutdown
self.stop_nodes()
File "/home/hebasto/code/gh/bitcoin/test/functional/test_framework/test_framework.py", line 525, in stop_nodes
node.stop_node(wait=wait, wait_until_stopped=False)
File "/home/hebasto/code/gh/bitcoin/test/functional/test_framework/test_node.py", line 333, in stop_node
raise AssertionError("Unexpected stderr {} != {}".format(stderr, expected_stderr))
AssertionError: Unexpected stderr ==================
WARNING: ThreadSanitizer: double lock of a mutex (pid=365627)
#0 pthread_mutex_lock <null> (bitcoind+0xa8cf6)
#1 std::__1::mutex::lock() <null> (libc++.so.1+0x83505)
#2 CConnman::ThreadSocketHandler() /home/hebasto/code/gh/bitcoin/src/net.cpp:1542:9 (bitcoind+0x1780d4)
#3 decltype(*(std::__1::forward<CConnman*&>(fp0)).*fp()) std::__1::__invoke<void (CConnman::*&)(), CConnman*&, void>(void (CConnman::*&)(), CConnman*&) /usr/lib/llvm-10/bin/../include/c++/v1/type_traits:3480:1 (bitcoind+0x1a325a)
#4 std::__1::__bind_return<void (CConnman::*)(), std::__1::tuple<CConnman*>, std::__1::tuple<>, __is_valid_bind_return<void (CConnman::*)(), std::__1::tuple<CConnman*>, std::__1::tuple<> >::value>::type std::__1::__apply_functor<void (CConnman::*)(), std::__1::tuple<CConnman*>, 0ul, std::__1::tuple<> >(void (CConnman::*&)(), std::__1::tuple<CConnman*>&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) /usr/lib/llvm-10/bin/../include/c++/v1/functional:2770:12 (bitcoind+0x1a325a)
#5 std::__1::__bind_return<void (CConnman::*)(), std::__1::tuple<CConnman*>, std::__1::tuple<>, __is_valid_bind_return<void (CConnman::*)(), std::__1::tuple<CConnman*>, std::__1::tuple<> >::value>::type std::__1::__bind<void (CConnman::*)(), CConnman*>::operator()<>() /usr/lib/llvm-10/bin/../include/c++/v1/functional:2803:20 (bitcoind+0x1a325a)
#6 decltype(std::__1::forward<std::__1::__bind<void (CConnman::*)(), CConnman*>&>(fp)()) std::__1::__invoke<std::__1::__bind<void (CConnman::*)(), CConnman*>&>(std::__1::__bind<void (CConnman::*)(), CConnman*>&) /usr/lib/llvm-10/bin/../include/c++/v1/type_traits:3539:1 (bitcoind+0x1a325a)
#7 void std::__1::__invoke_void_return_wrapper<void>::__call<std::__1::__bind<void (CConnman::*)(), CConnman*>&>(std::__1::__bind<void (CConnman::*)(), CConnman*>&) /usr/lib/llvm-10/bin/../include/c++/v1/__functional_base:348:9 (bitcoind+0x1a325a)
#8 std::__1::__function::__alloc_func<std::__1::__bind<void (CConnman::*)(), CConnman*>, std::__1::allocator<std::__1::__bind<void (CConnman::*)(), CConnman*> >, void ()>::operator()() /usr/lib/llvm-10/bin/../include/c++/v1/functional:1540:16 (bitcoind+0x1a325a)
#9 std::__1::__function::__func<std::__1::__bind<void (CConnman::*)(), CConnman*>, std::__1::allocator<std::__1::__bind<void (CConnman::*)(), CConnman*> >, void ()>::operator()() /usr/lib/llvm-10/bin/../include/c++/v1/functional:1714:12 (bitcoind+0x1a325a)
#10 std::__1::__function::__value_func<void ()>::operator()() const /usr/lib/llvm-10/bin/../include/c++/v1/functional:1867:16 (bitcoind+0x148cfe)
#11 std::__1::function<void ()>::operator()() const /usr/lib/llvm-10/bin/../include/c++/v1/functional:2473:12 (bitcoind+0x148cfe)
#12 void TraceThread<std::__1::function<void ()> >(char const*, std::__1::function<void ()>) /home/hebasto/code/gh/bitcoin/src/./util/system.h:438:9 (bitcoind+0x148cfe)
#13 decltype(std::__1::forward<void (*)(char const*, std::__1::function<void ()>)>(fp)(std::__1::forward<char const*>(fp0), std::__1::forward<std::__1::function<void ()> >(fp0))) std::__1::__invoke<void (*)(char const*, std::__1::function<void ()>), char const*, std::__1::function<void ()> >(void (*&&)(char const*, std::__1::function<void ()>), char const*&&, std::__1::function<void ()>&&) /usr/lib/llvm-10/bin/../include/c++/v1/type_traits:3539:1 (bitcoind+0x1a342c)
#14 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(char const*, std::__1::function<void ()>), char const*, std::__1::function<void ()>, 2ul, 3ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(char const*, std::__1::function<void ()>), char const*, std::__1::function<void ()> >&, std::__1::__tuple_indices<2ul, 3ul>) /usr/lib/llvm-10/bin/../include/c++/v1/thread:273:5 (bitcoind+0x1a342c)
#15 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(char const*, std::__1::function<void ()>), char const*, std::__1::function<void ()> > >(void*) /usr/lib/llvm-10/bin/../include/c++/v1/thread:284:5 (bitcoind+0x1a342c)
Location is heap block of size 328920 at 0x7f3445476000 allocated by main thread:
#0 operator new(unsigned long) <null> (bitcoind+0x11883b)
#1 std::__1::__unique_if<CConnman>::__unique_single std::__1::make_unique<CConnman, unsigned long, unsigned long, bool>(unsigned long&&, unsigned long&&, bool&&) /usr/lib/llvm-10/bin/../include/c++/v1/memory:3028:28 (bitcoind+0x134039)
#2 std::__1::unique_ptr<CConnman, std::__1::default_delete<CConnman> > MakeUnique<CConnman, unsigned long, unsigned long, bool>(unsigned long&&, unsigned long&&, bool&&) /home/hebasto/code/gh/bitcoin/src/./util/memory.h:17:12 (bitcoind+0x134039)
#3 AppInitMain(util::Ref const&, NodeContext&, interfaces::BlockAndHeaderTipInfo*) /home/hebasto/code/gh/bitcoin/src/init.cpp:1386:20 (bitcoind+0x134039)
#4 AppInit(int, char**) /home/hebasto/code/gh/bitcoin/src/bitcoind.cpp:133:43 (bitcoind+0x11b773)
#5 main /home/hebasto/code/gh/bitcoin/src/bitcoind.cpp:161:13 (bitcoind+0x11b773)
Mutex M132842 (0x7f34454c63f8) created at:
#0 pthread_mutex_lock <null> (bitcoind+0xa8cf6)
#1 std::__1::mutex::lock() <null> (libc++.so.1+0x83505)
#2 AppInitMain(util::Ref const&, NodeContext&, interfaces::BlockAndHeaderTipInfo*) /home/hebasto/code/gh/bitcoin/src/init.cpp:1981:24 (bitcoind+0x13d0a3)
#3 AppInit(int, char**) /home/hebasto/code/gh/bitcoin/src/bitcoind.cpp:133:43 (bitcoind+0x11b773)
#4 main /home/hebasto/code/gh/bitcoin/src/bitcoind.cpp:161:13 (bitcoind+0x11b773)
SUMMARY: ThreadSanitizer: double lock of a mutex (/home/hebasto/code/gh/bitcoin/src/bitcoind+0xa8cf6) in pthread_mutex_lock
================== !=
[node 0] Cleaning up leftover process
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to update this PR after more tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated 354d2cb -> 3e15712 (pr19983.11 -> pr19983.12):
|
deadlock:UpdateTip | ||
|
||
# WalletBatch (unidentified deadlock) | ||
deadlock:WalletBatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the steps to verify this patch? Note that the deadlocks are intermittent, so there is no way to run the tests and reproduce them or not reproduce them reliably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TSan searches for potential deadlocks, i.e., it verifies consistency of lock ordering in the code, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock order graph is generated at runtime and is not deterministic. It depends on the code paths that are executed, and our tests exercise different code paths in each run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the steps to verify this patch?
I don't know (besides tracking code changes since these suppressions were introduced).
Having a smaller set of suppressions is better, though. If this patch appears wrong in the next year, it could be reverted :)
Ok, fair enough. Let's give it a try. |
3e15712 Update TSan suppressions (Hennadii Stepanov) Pull request description: It seems possible now to drop some TSan suppressions. Top commit has no ACKs. Tree-SHA512: 94518fd2f3a7168b2989424de0696e42c8f509b833aafbc7e75f4c1180a0b8d9a47f43c50d06b03b26a924643afe86274b2062c9d456c17a68576d19566ed66f
It seems possible now to drop some TSan suppressions.