Skip to content

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Jun 22, 2020

In master (8ef15e8) the "previous" and "current" lock orders are mistakenly swapped.

This PR:

  • fixes printed lock orders
  • improves the sync_tests unit test
  • makes the "detected inconsistent lock order" error message pointing to the lock location rather tfm::format() location.

Debugger output example with this PR (with modified code, of course):

2020-06-22T15:46:56Z [msghand] POTENTIAL DEADLOCK DETECTED
2020-06-22T15:46:56Z [msghand] Previous lock order was:
2020-06-22T15:46:56Z [msghand]  (2) 'cs_main' in net_processing.cpp:2545 (in thread 'msghand')
2020-06-22T15:46:56Z [msghand]  (1) 'g_cs_orphans' in net_processing.cpp:1400 (in thread 'msghand')
2020-06-22T15:46:56Z [msghand] Current lock order is:
2020-06-22T15:46:56Z [msghand]  (1) 'g_cs_orphans' in net_processing.cpp:2816 (in thread 'msghand')
2020-06-22T15:46:56Z [msghand]  (2) 'cs_main' in net_processing.cpp:2816 (in thread 'msghand')
Assertion failed: detected inconsistent lock order for 'cs_main' in net_processing.cpp:2816 (in thread 'msghand'), details in debug log.
Process 131393 stopped
* thread #15, name = 'b-msghand', stop reason = signal SIGABRT
    frame #0: 0x00007ffff775c18b libc.so.6`__GI_raise(sig=2) at raise.c:51:1
(lldb) bt
* thread #15, name = 'b-msghand', stop reason = signal SIGABRT
  * frame #0: 0x00007ffff775c18b libc.so.6`__GI_raise(sig=2) at raise.c:51:1
    frame #1: 0x00007ffff773b859 libc.so.6`__GI_abort at abort.c:79:7
    frame #2: 0x0000555555e5b196 bitcoind`(anonymous namespace)::potential_deadlock_detected(mismatch=0x00007fff99ff6f30, s1=size=2, s2=size=2, lock_location=0x00007fff99ff7010) at sync.cpp:134:9
    frame #3: 0x0000555555e5a1b1 bitcoind`(anonymous namespace)::push_lock(c=0x0000555556379220, locklocation=0x00007fff99ff7010) at sync.cpp:158:13
    frame #4: 0x0000555555e59e8a bitcoind`EnterCritical(pszName="cs_main", pszFile="net_processing.cpp", nLine=2816, cs=0x0000555556379220, fTry=false) at sync.cpp:177:5
    frame #5: 0x00005555555b0500 bitcoind`UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::Enter(this=0x00007fff99ff8c20, pszName="cs_main", pszFile="net_processing.cpp", nLine=2816) at sync.h:134:9
    frame #6: 0x00005555555b017f bitcoind`UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::UniqueLock(this=0x00007fff99ff8c20, mutexIn=0x0000555556379220, pszName="cs_main", pszFile="net_processing.cpp", nLine=2816, fTry=false) at sync.h:160:13
    frame #7: 0x00005555556aa57e bitcoind`ProcessMessage(pfrom=0x00007fff90001180, msg_type=error: summary string parsing error, vRecv=0x00007fff9c005ac0, nTimeReceived=1592840815980751, chainparams=0x00005555564b7110, chainman=0x0000555556380880, mempool=0x0000555556380ae0, connman=0x000055555657aa20, banman=0x00005555565167b0, interruptMsgProc=0x00005555565cae90) at net_processing.cpp:2816:9

@laanwj
Copy link
Member

laanwj commented Jun 22, 2020

So to be clear: this fixes reporting of lock orders (with DEBUG_LOCKORDER), not the lock orders themselves?

@hebasto
Copy link
Member Author

hebasto commented Jun 22, 2020

@laanwj

So to be clear: this fixes reporting of lock orders (with DEBUG_LOCKORDER), not the lock orders themselves?

Yes. You are correct.

@laanwj laanwj added the Tests label Jun 22, 2020
@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 22, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@vasild
Copy link
Contributor

vasild commented Jun 25, 2020

ACK fdd1d77

I tested it with the following changes, then ran ./src/test/test_bitcoin --run_test="*/potential_deadlock_detected" and read its output.

lock mutexes on separate lines to make the output less ambiguous; print to console
diff --git i/src/test/sync_tests.cpp w/src/test/sync_tests.cpp
index 3ea8714f3..4d6b81c3e 100644
--- i/src/test/sync_tests.cpp
+++ w/src/test/sync_tests.cpp
@@ -9,17 +9,19 @@
 
 namespace {
 template <typename MutexType>
 void TestPotentialDeadLockDetected(MutexType& mutex1, MutexType& mutex2)
 {
     {
-        LOCK2(mutex1, mutex2);
+        LOCK(mutex1);
+        LOCK(mutex2);
     }
     bool error_thrown = false;
     try {
-        LOCK2(mutex2, mutex1);
+        LOCK(mutex2);
+        LOCK(mutex1);
     } catch (const std::logic_error& e) {
         BOOST_CHECK_EQUAL(e.what(), "potential deadlock detected: mutex1 -> mutex2 -> mutex1");
         error_thrown = true;
     }
     #ifdef DEBUG_LOCKORDER
     BOOST_CHECK(error_thrown);
@@ -32,13 +34,13 @@ void TestPotentialDeadLockDetected(MutexType& mutex1, MutexType& mutex2)
 BOOST_FIXTURE_TEST_SUITE(sync_tests, BasicTestingSetup)
 
 BOOST_AUTO_TEST_CASE(potential_deadlock_detected)
 {
     #ifdef DEBUG_LOCKORDER
     bool prev = g_debug_lockorder_abort;
-    g_debug_lockorder_abort = false;
+    g_debug_lockorder_abort = true;
     #endif
 
     RecursiveMutex rmutex1, rmutex2;
     TestPotentialDeadLockDetected(rmutex1, rmutex2);
 
     Mutex mutex1, mutex2;
diff --git i/src/test/util/setup_common.cpp w/src/test/util/setup_common.cpp
index 3b7a7c8d1..e80597d40 100644
--- i/src/test/util/setup_common.cpp
+++ w/src/test/util/setup_common.cpp
@@ -69,13 +69,13 @@ std::ostream& operator<<(std::ostream& os, const uint256& num)
 BasicTestingSetup::BasicTestingSetup(const std::string& chainName, const std::vector<const char*>& extra_args)
     : m_path_root{fs::temp_directory_path() / "test_common_" PACKAGE_NAME / g_insecure_rand_ctx_temp_path.rand256().ToString()}
 {
     const std::vector<const char*> arguments = Cat(
         {
             "dummy",
-            "-printtoconsole=0",
+            "-printtoconsole=1",
             "-logtimemicros",
             "-debug",
             "-debugexclude=libevent",
             "-debugexclude=leveldb",
         },
         extra_args);

Feel free to simplify as follows, there is no need to pass the lock location as an additional argument to potential_deadlock_detected() because it is always the last element in the second stack s2:

simplify
diff --git i/src/sync.cpp w/src/sync.cpp
index 260e9994d..b2a48fb0f 100644
--- i/src/sync.cpp
+++ w/src/sync.cpp
@@ -99,13 +99,13 @@ LockData& GetLockData() {
     // its subclasses must have implicitly-defined destructors.
     static LockData& lock_data = *new LockData();
     return lock_data;
 }
 
 namespace {
-void potential_deadlock_detected(const LockPair& mismatch, const LockStack& s1, const LockStack& s2, const CLockLocation& lock_location)
+void potential_deadlock_detected(const LockPair& mismatch, const LockStack& s1, const LockStack& s2)
 {
     LogPrintf("POTENTIAL DEADLOCK DETECTED\n");
     LogPrintf("Previous lock order was:\n");
     for (const LockStackItem& i : s1) {
         if (i.first == mismatch.first) {
             LogPrintf(" (1)"); /* Continued */
@@ -127,13 +127,13 @@ void potential_deadlock_detected(const LockPair& mismatch, const LockStack& s1,
             LogPrintf(" (2)"); /* Continued */
             mutex_b = i.second.Name();
         }
         LogPrintf(" %s\n", i.second.ToString());
     }
     if (g_debug_lockorder_abort) {
-        tfm::format(std::cerr, "Assertion failed: detected inconsistent lock order for %s, details in debug log.\n", lock_location.ToString());
+        tfm::format(std::cerr, "Assertion failed: detected inconsistent lock order for %s, details in debug log.\n", s2.back().second.ToString());
         abort();
     }
     throw std::logic_error(strprintf("potential deadlock detected: %s -> %s -> %s", mutex_b, mutex_a, mutex_b));
 }
 
 void push_lock(void* c, const CLockLocation& locklocation)
@@ -152,13 +152,13 @@ void push_lock(void* c, const CLockLocation& locklocation)
             continue;
         lockdata.lockorders.emplace(p1, lock_stack);
 
         const LockPair p2 = std::make_pair(c, i.first);
         lockdata.invlockorders.insert(p2);
         if (lockdata.lockorders.count(p2))
-            potential_deadlock_detected(p1, lockdata.lockorders[p2], lockdata.lockorders[p1], locklocation);
+            potential_deadlock_detected(p1, lockdata.lockorders[p2], lockdata.lockorders[p1]);
     }
 }
 
 void pop_lock()
 {
     LockData& lockdata = GetLockData();

@hebasto
Copy link
Member Author

hebasto commented Jun 25, 2020

Updated fdd1d77 -> 0ecff9d (pr19353.01 -> pr19353.02, diff):

Feel free to simplify as follows, there is no need to pass the lock location as an additional argument to potential_deadlock_detected() because it is always the last element in the second stack s2

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 0ecff9d

@hebasto
Copy link
Member Author

hebasto commented Jul 5, 2020

@fanquake Mind looking into this PR?

@laanwj
Copy link
Member

laanwj commented Jul 15, 2020

ACK 0ecff9d

@laanwj laanwj merged commit 31bdd86 into bitcoin:master Jul 15, 2020
@hebasto hebasto deleted the 200622-deadlock branch July 15, 2020 21:29
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 16, 2020
…lock orders

0ecff9d Improve "detected inconsistent lock order" error message (Hennadii Stepanov)
bbe9cf4 test: Improve "potential deadlock detected" exception message (Hennadii Stepanov)
3559934 Fix mistakenly swapped "previous" and "current" lock orders (Hennadii Stepanov)

Pull request description:

  In master (8ef15e8) the "previous" and "current" lock orders are mistakenly swapped.

  This PR:
  - fixes printed lock orders
  - improves the `sync_tests` unit test
  - makes the "detected inconsistent lock order" error message pointing to the lock location rather `tfm::format()` location.

  Debugger output example with this PR (with modified code, of course):
  ```
  2020-06-22T15:46:56Z [msghand] POTENTIAL DEADLOCK DETECTED
  2020-06-22T15:46:56Z [msghand] Previous lock order was:
  2020-06-22T15:46:56Z [msghand]  (2) 'cs_main' in net_processing.cpp:2545 (in thread 'msghand')
  2020-06-22T15:46:56Z [msghand]  (1) 'g_cs_orphans' in net_processing.cpp:1400 (in thread 'msghand')
  2020-06-22T15:46:56Z [msghand] Current lock order is:
  2020-06-22T15:46:56Z [msghand]  (1) 'g_cs_orphans' in net_processing.cpp:2816 (in thread 'msghand')
  2020-06-22T15:46:56Z [msghand]  (2) 'cs_main' in net_processing.cpp:2816 (in thread 'msghand')
  Assertion failed: detected inconsistent lock order for 'cs_main' in net_processing.cpp:2816 (in thread 'msghand'), details in debug log.
  Process 131393 stopped
  * thread #15, name = 'b-msghand', stop reason = signal SIGABRT
      frame #0: 0x00007ffff775c18b libc.so.6`__GI_raise(sig=2) at raise.c:51:1
  (lldb) bt
  * thread #15, name = 'b-msghand', stop reason = signal SIGABRT
    * frame #0: 0x00007ffff775c18b libc.so.6`__GI_raise(sig=2) at raise.c:51:1
      frame #1: 0x00007ffff773b859 libc.so.6`__GI_abort at abort.c:79:7
      frame #2: 0x0000555555e5b196 bitcoind`(anonymous namespace)::potential_deadlock_detected(mismatch=0x00007fff99ff6f30, s1=size=2, s2=size=2, lock_location=0x00007fff99ff7010) at sync.cpp:134:9
      frame #3: 0x0000555555e5a1b1 bitcoind`(anonymous namespace)::push_lock(c=0x0000555556379220, locklocation=0x00007fff99ff7010) at sync.cpp:158:13
      frame #4: 0x0000555555e59e8a bitcoind`EnterCritical(pszName="cs_main", pszFile="net_processing.cpp", nLine=2816, cs=0x0000555556379220, fTry=false) at sync.cpp:177:5
      frame #5: 0x00005555555b0500 bitcoind`UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::Enter(this=0x00007fff99ff8c20, pszName="cs_main", pszFile="net_processing.cpp", nLine=2816) at sync.h:134:9
      frame #6: 0x00005555555b017f bitcoind`UniqueLock<AnnotatedMixin<std::recursive_mutex>, std::unique_lock<std::recursive_mutex> >::UniqueLock(this=0x00007fff99ff8c20, mutexIn=0x0000555556379220, pszName="cs_main", pszFile="net_processing.cpp", nLine=2816, fTry=false) at sync.h:160:13
      frame #7: 0x00005555556aa57e bitcoind`ProcessMessage(pfrom=0x00007fff90001180, msg_type=error: summary string parsing error, vRecv=0x00007fff9c005ac0, nTimeReceived=1592840815980751, chainparams=0x00005555564b7110, chainman=0x0000555556380880, mempool=0x0000555556380ae0, connman=0x000055555657aa20, banman=0x00005555565167b0, interruptMsgProc=0x00005555565cae90) at net_processing.cpp:2816:9
  ```

ACKs for top commit:
  laanwj:
    ACK 0ecff9d
  vasild:
    ACK 0ecff9d

Tree-SHA512: ff285de8dd3198b5b33c4bfbdadf9b1448189c96143b9696bc4f41c07e784c00851ec169cf3ed45cc325f3617ba6783620803234f57fcce28bf6bc3d6a7234fb
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jul 2, 2021
Summary:
> This PR:
>
>  - fixes printed lock orders
>  - improves the sync_tests unit test
>  - makes the "detected inconsistent lock order" error message pointing to the lock location rather tfm::format() location.

This is a backport of [[bitcoin/bitcoin#19353 | core#19353]]

Test Plan: `ninja all check-all`

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D9731
kwvg added a commit to kwvg/dash that referenced this pull request Oct 8, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Oct 12, 2021
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Oct 12, 2021
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Oct 13, 2021
pravblockc pushed a commit to pravblockc/dash that referenced this pull request Nov 18, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants