Skip to content

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Sep 20, 2020

It seems possible now to drop some TSan suppressions.

@practicalswift
Copy link
Contributor

Concept ACK

Great to see deadlock:UpdateTip go :)

@hebasto hebasto marked this pull request as ready for review September 20, 2020 16:11
@hebasto
Copy link
Member Author

hebasto commented Sep 20, 2020

@practicalswift

Great to see deadlock:UpdateTip go :)

I must confess that, unlike #19982, I'm not as much confident in this PR removals as all of them are found empirically.

@hebasto
Copy link
Member Author

hebasto commented Sep 20, 2020

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 20, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@hebasto
Copy link
Member Author

hebasto commented Nov 14, 2020

Rebased 49c3a52 -> 730e5ff (pr19983.02 -> pr19983.03) due to the conflict with #20285.

@hebasto
Copy link
Member Author

hebasto commented Nov 28, 2020

Rebased 71518e3 -> 27752af (pr19983.04 -> pr19983.06) due to the conflict with #19337.

@hebasto
Copy link
Member Author

hebasto commented Dec 10, 2020

Rebased 3c2e969 -> 89952ae (pr19983.07 -> pr19983.08) due to the conflict with #19425.

@hebasto
Copy link
Member Author

hebasto commented Dec 11, 2020

Rebased 89952ae -> b0b6092 (pr19983.08 -> pr19983.09) due to the conflict with #19982.

@practicalswift
Copy link
Contributor

practicalswift commented Dec 11, 2020

cr ACK b0b6092: patch looks correct and doesn't touch any non-test code

@hebasto
Copy link
Member Author

hebasto commented Dec 22, 2020

Rebased b0b6092 -> a17f69f (pr19983.09 -> pr19983.10) due to the conflict with #20683.

@hebasto
Copy link
Member Author

hebasto commented Dec 22, 2020

Updated a17f69f -> 354d2cb (pr19983.10 -> pr19983.11, diff):

@practicalswift
Copy link
Contributor

cr ACK 354d2cb: patch looks correct and doesn't touch any non-test code

@@ -161,6 +161,9 @@ def run_test(self):
assert_equal(len(peer_info), 1) # node is still connected
assert_equal(peer_info[0]['permissions'], ['download'])

# Ignore non-critical TSan warnings.
self.stop_node(0, ignored_stderr="WARNING: too long mutex cycle found")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't reproduce this locally. How often did you have to run this test?

I am running with:

diff --git a/test/sanitizer_suppressions/tsan b/test/sanitizer_suppressions/tsan
index 986e09605..020f3da9b 100644
--- a/test/sanitizer_suppressions/tsan
+++ b/test/sanitizer_suppressions/tsan
@@ -29,12 +29,7 @@ race:zmq::*
 race:bitcoin-qt
 # deadlock (TODO fix)
 deadlock:CConnman::ForNode
-deadlock:CConnman::GetNodeStats
 deadlock:CChainState::ConnectTip
-deadlock:UpdateTip
-
-# WalletBatch (unidentified deadlock)
-deadlock:WalletBatch
 
 # Intentional deadlock in tests
 deadlock:TestPotentialDeadLockDetected

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't reproduce this locally.

What is -stdlib compiled against?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used ./ci/test/00_setup_env_native_tsan.sh to compile

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. On master (df127ec) with your patch I have no "WARNING: too long mutex cycle found" from the thread sanitizer when running test/functional/feature_maxuploadtarget.py . Maybe the reason is the updated clang?

Instead, the skipping -DDEBUG_LOCKORDER from CPPFLAGS causes another warning even on master w/o your patch:

$ test/functional/feature_maxuploadtarget.py 
2020-12-22T15:02:39.792000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_731auno_
2020-12-22T15:02:57.821000Z TestFramework (INFO): Peer 0 disconnected after downloading old block too many times
2020-12-22T15:03:42.603000Z TestFramework (INFO): Peer 1 able to repeatedly download new block
2020-12-22T15:03:42.712000Z TestFramework (INFO): Peer 1 disconnected after trying to download old block
2020-12-22T15:03:42.712000Z TestFramework (INFO): Advancing system time on node to clear counters...
2020-12-22T15:03:42.867000Z TestFramework (INFO): Peer 2 able to download old block
2020-12-22T15:03:42.873000Z TestFramework (INFO): Restarting node 0 with download permission and 1MB maxuploadtarget
2020-12-22T15:03:45.050000Z TestFramework (INFO): Peer still connected after trying to download old block (download permission)
2020-12-22T15:03:45.103000Z TestFramework (INFO): Stopping nodes
Traceback (most recent call last):
  File "test/functional/feature_maxuploadtarget.py", line 166, in <module>
    MaxUploadTest().main()
  File "/home/hebasto/code/gh/bitcoin/test/functional/test_framework/test_framework.py", line 149, in main
    exit_code = self.shutdown()
  File "/home/hebasto/code/gh/bitcoin/test/functional/test_framework/test_framework.py", line 278, in shutdown
    self.stop_nodes()
  File "/home/hebasto/code/gh/bitcoin/test/functional/test_framework/test_framework.py", line 525, in stop_nodes
    node.stop_node(wait=wait, wait_until_stopped=False)
  File "/home/hebasto/code/gh/bitcoin/test/functional/test_framework/test_node.py", line 333, in stop_node
    raise AssertionError("Unexpected stderr {} != {}".format(stderr, expected_stderr))
AssertionError: Unexpected stderr ==================
WARNING: ThreadSanitizer: double lock of a mutex (pid=365627)
    #0 pthread_mutex_lock <null> (bitcoind+0xa8cf6)
    #1 std::__1::mutex::lock() <null> (libc++.so.1+0x83505)
    #2 CConnman::ThreadSocketHandler() /home/hebasto/code/gh/bitcoin/src/net.cpp:1542:9 (bitcoind+0x1780d4)
    #3 decltype(*(std::__1::forward<CConnman*&>(fp0)).*fp()) std::__1::__invoke<void (CConnman::*&)(), CConnman*&, void>(void (CConnman::*&)(), CConnman*&) /usr/lib/llvm-10/bin/../include/c++/v1/type_traits:3480:1 (bitcoind+0x1a325a)
    #4 std::__1::__bind_return<void (CConnman::*)(), std::__1::tuple<CConnman*>, std::__1::tuple<>, __is_valid_bind_return<void (CConnman::*)(), std::__1::tuple<CConnman*>, std::__1::tuple<> >::value>::type std::__1::__apply_functor<void (CConnman::*)(), std::__1::tuple<CConnman*>, 0ul, std::__1::tuple<> >(void (CConnman::*&)(), std::__1::tuple<CConnman*>&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) /usr/lib/llvm-10/bin/../include/c++/v1/functional:2770:12 (bitcoind+0x1a325a)
    #5 std::__1::__bind_return<void (CConnman::*)(), std::__1::tuple<CConnman*>, std::__1::tuple<>, __is_valid_bind_return<void (CConnman::*)(), std::__1::tuple<CConnman*>, std::__1::tuple<> >::value>::type std::__1::__bind<void (CConnman::*)(), CConnman*>::operator()<>() /usr/lib/llvm-10/bin/../include/c++/v1/functional:2803:20 (bitcoind+0x1a325a)
    #6 decltype(std::__1::forward<std::__1::__bind<void (CConnman::*)(), CConnman*>&>(fp)()) std::__1::__invoke<std::__1::__bind<void (CConnman::*)(), CConnman*>&>(std::__1::__bind<void (CConnman::*)(), CConnman*>&) /usr/lib/llvm-10/bin/../include/c++/v1/type_traits:3539:1 (bitcoind+0x1a325a)
    #7 void std::__1::__invoke_void_return_wrapper<void>::__call<std::__1::__bind<void (CConnman::*)(), CConnman*>&>(std::__1::__bind<void (CConnman::*)(), CConnman*>&) /usr/lib/llvm-10/bin/../include/c++/v1/__functional_base:348:9 (bitcoind+0x1a325a)
    #8 std::__1::__function::__alloc_func<std::__1::__bind<void (CConnman::*)(), CConnman*>, std::__1::allocator<std::__1::__bind<void (CConnman::*)(), CConnman*> >, void ()>::operator()() /usr/lib/llvm-10/bin/../include/c++/v1/functional:1540:16 (bitcoind+0x1a325a)
    #9 std::__1::__function::__func<std::__1::__bind<void (CConnman::*)(), CConnman*>, std::__1::allocator<std::__1::__bind<void (CConnman::*)(), CConnman*> >, void ()>::operator()() /usr/lib/llvm-10/bin/../include/c++/v1/functional:1714:12 (bitcoind+0x1a325a)
    #10 std::__1::__function::__value_func<void ()>::operator()() const /usr/lib/llvm-10/bin/../include/c++/v1/functional:1867:16 (bitcoind+0x148cfe)
    #11 std::__1::function<void ()>::operator()() const /usr/lib/llvm-10/bin/../include/c++/v1/functional:2473:12 (bitcoind+0x148cfe)
    #12 void TraceThread<std::__1::function<void ()> >(char const*, std::__1::function<void ()>) /home/hebasto/code/gh/bitcoin/src/./util/system.h:438:9 (bitcoind+0x148cfe)
    #13 decltype(std::__1::forward<void (*)(char const*, std::__1::function<void ()>)>(fp)(std::__1::forward<char const*>(fp0), std::__1::forward<std::__1::function<void ()> >(fp0))) std::__1::__invoke<void (*)(char const*, std::__1::function<void ()>), char const*, std::__1::function<void ()> >(void (*&&)(char const*, std::__1::function<void ()>), char const*&&, std::__1::function<void ()>&&) /usr/lib/llvm-10/bin/../include/c++/v1/type_traits:3539:1 (bitcoind+0x1a342c)
    #14 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(char const*, std::__1::function<void ()>), char const*, std::__1::function<void ()>, 2ul, 3ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(char const*, std::__1::function<void ()>), char const*, std::__1::function<void ()> >&, std::__1::__tuple_indices<2ul, 3ul>) /usr/lib/llvm-10/bin/../include/c++/v1/thread:273:5 (bitcoind+0x1a342c)
    #15 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(char const*, std::__1::function<void ()>), char const*, std::__1::function<void ()> > >(void*) /usr/lib/llvm-10/bin/../include/c++/v1/thread:284:5 (bitcoind+0x1a342c)

  Location is heap block of size 328920 at 0x7f3445476000 allocated by main thread:
    #0 operator new(unsigned long) <null> (bitcoind+0x11883b)
    #1 std::__1::__unique_if<CConnman>::__unique_single std::__1::make_unique<CConnman, unsigned long, unsigned long, bool>(unsigned long&&, unsigned long&&, bool&&) /usr/lib/llvm-10/bin/../include/c++/v1/memory:3028:28 (bitcoind+0x134039)
    #2 std::__1::unique_ptr<CConnman, std::__1::default_delete<CConnman> > MakeUnique<CConnman, unsigned long, unsigned long, bool>(unsigned long&&, unsigned long&&, bool&&) /home/hebasto/code/gh/bitcoin/src/./util/memory.h:17:12 (bitcoind+0x134039)
    #3 AppInitMain(util::Ref const&, NodeContext&, interfaces::BlockAndHeaderTipInfo*) /home/hebasto/code/gh/bitcoin/src/init.cpp:1386:20 (bitcoind+0x134039)
    #4 AppInit(int, char**) /home/hebasto/code/gh/bitcoin/src/bitcoind.cpp:133:43 (bitcoind+0x11b773)
    #5 main /home/hebasto/code/gh/bitcoin/src/bitcoind.cpp:161:13 (bitcoind+0x11b773)

  Mutex M132842 (0x7f34454c63f8) created at:
    #0 pthread_mutex_lock <null> (bitcoind+0xa8cf6)
    #1 std::__1::mutex::lock() <null> (libc++.so.1+0x83505)
    #2 AppInitMain(util::Ref const&, NodeContext&, interfaces::BlockAndHeaderTipInfo*) /home/hebasto/code/gh/bitcoin/src/init.cpp:1981:24 (bitcoind+0x13d0a3)
    #3 AppInit(int, char**) /home/hebasto/code/gh/bitcoin/src/bitcoind.cpp:133:43 (bitcoind+0x11b773)
    #4 main /home/hebasto/code/gh/bitcoin/src/bitcoind.cpp:161:13 (bitcoind+0x11b773)

SUMMARY: ThreadSanitizer: double lock of a mutex (/home/hebasto/code/gh/bitcoin/src/bitcoind+0xa8cf6) in pthread_mutex_lock
================== != 
[node 0] Cleaning up leftover process

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to update this PR after more tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hebasto
Copy link
Member Author

hebasto commented Dec 22, 2020

Updated 354d2cb -> 3e15712 (pr19983.11 -> pr19983.12):

deadlock:UpdateTip

# WalletBatch (unidentified deadlock)
deadlock:WalletBatch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the steps to verify this patch? Note that the deadlocks are intermittent, so there is no way to run the tests and reproduce them or not reproduce them reliably.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TSan searches for potential deadlocks, i.e., it verifies consistency of lock ordering in the code, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock order graph is generated at runtime and is not deterministic. It depends on the code paths that are executed, and our tests exercise different code paths in each run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the steps to verify this patch?

I don't know (besides tracking code changes since these suppressions were introduced).

Having a smaller set of suppressions is better, though. If this patch appears wrong in the next year, it could be reverted :)

@maflcko
Copy link
Member

maflcko commented Dec 22, 2020

If this patch appears wrong in the next year, it could be reverted :)

Ok, fair enough. Let's give it a try.

@maflcko maflcko merged commit 98de9eb into bitcoin:master Dec 22, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 23, 2020
3e15712 Update TSan suppressions (Hennadii Stepanov)

Pull request description:

  It seems possible now to drop some TSan suppressions.

Top commit has no ACKs.

Tree-SHA512: 94518fd2f3a7168b2989424de0696e42c8f509b833aafbc7e75f4c1180a0b8d9a47f43c50d06b03b26a924643afe86274b2062c9d456c17a68576d19566ed66f
@hebasto hebasto deleted the 200920-supp branch December 23, 2020 08:08
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants