Skip to content

Conversation

vasild
Copy link
Contributor

@vasild vasild commented Jun 20, 2020

Double lock of the same (non-recursive) mutex from the same thread would produce an undefined behavior. Detect this from DEBUG_LOCKORDER and react similarly to the deadlock detection.

This came up during discussion in another, related PR: #19238 (comment).

@hebasto
Copy link
Member

hebasto commented Jun 20, 2020

Concept ACK.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

template void EnterCritical(const char*, const char*, int, RecursiveMutex*, bool);
template void EnterCritical(const char*, const char*, int, std::mutex*, bool);
template void EnterCritical(const char*, const char*, int, std::recursive_mutex*, bool);
template void EnterCritical(const char*, const char*, int, boost::mutex*, bool);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to add dead code or code that is only used in tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These instantiations are required even if tests do not exist. Without them it wouldn't link with errors like:

ld: error: undefined symbol: void EnterCritical<boost::mutex>(char const*, char const*, int, boost::mutex*, bool)
>>> referenced by checkqueue.h:184 (src/checkqueue.h:184)
>>>               libbitcoin_server_a-validation.o:(CCheckQueueControl<CScriptCheck>::CCheckQueueControl(CCheckQueue<CScriptCheck>*)) in archive libbitcoin_server.a

Copy link
Member

@hebasto hebasto Jun 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is solved in #18710 :)

EDIT: I mean boost::mutex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So when these two PRs meet and if that was the last usage of boost::mutex, then this line can be removed: template void EnterCritical(const char*, const char*, int, boost::mutex*, bool);.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried compiling locally without it and it worked #19337 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for reference - the two PRs met in master and afterwards that line was removed in #21010.

Thanks, @fanquake!

@hebasto
Copy link
Member

hebasto commented Jun 20, 2020

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested 401d3b7 with the following diff:

--- a/src/addrman.h
+++ b/src/addrman.h
@@ -178,7 +178,7 @@ class CAddrMan
 friend class CAddrManTest;
 protected:
     //! critical section to protect the inner data structures
-    mutable RecursiveMutex cs;
+    mutable Mutex cs;
 
 private:
     //! last used nId
$ ./src/qt/bitcoin-qt -regtest
Assertion failed: detected double lock at sync.cpp:153, details in debug log.
Aborted (core dumped)

Nice!

Some non-blocking nits:

  • could place boost headers into a separate group
  • in commit 401d3b7 message "... would produce an undefined behavior" -- mind rewording "would" -> "is"
  • could use unnamed namespace instead of static

@hebasto
Copy link
Member

hebasto commented Jun 20, 2020

Travis error:

It is reliably reproducible locally.

@hebasto
Copy link
Member

hebasto commented Jun 20, 2020

Travis error:

It is reliably reproducible locally.

@vasild
idk exactly why but moving the potential_deadlock_detected test case to the end fixes the problem.

UPDATE: at the end of the potential_deadlock_detected test case the lock stack is not empty.

@hebasto
Copy link
Member

hebasto commented Jun 20, 2020

Travis error:

* https://travis-ci.org/github/bitcoin/bitcoin/jobs/700351407#L4430

Fixed in #19340.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 20, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jnewbery
Copy link
Contributor

Concept ACK

@practicalswift
Copy link
Contributor

Concept ACK

Nice!

@vasild vasild force-pushed the detect_double_lock branch from 401d3b7 to d32d855 Compare June 22, 2020 10:11
@vasild
Copy link
Contributor Author

vasild commented Jun 22, 2020

* could place boost headers into a separate group

Done.

* in commit [401d3b7](https://github.com/bitcoin/bitcoin/commit/401d3b759e14a96332a033485257d2e993c2b944) message "... would produce an undefined behavior" -- mind rewording "would" -> "is"

Done.

* could use unnamed namespace instead of `static`

The rest of the functions in the file use static, so for consistency I also used static in the newly introduced function. Maybe replace all static with an unnamed namespace in sync.cpp? If yes, then that would go as a separate commit, or maybe even a separate PR.

Reviewers: this PR exposed an existent deficiency which is fixed in #19340.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d32d855

But #19340 should be considering for reviewing before this PR merging.

@vasild vasild marked this pull request as draft June 22, 2020 11:02
@vasild
Copy link
Contributor Author

vasild commented Jun 22, 2020

I converted this PR to "draft" so that it does not get merged accidentally. CI tests passed this time! I guess the order in which the tests are executed matters. #19340 will fix that and should be merged before this PR.

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, change LGTM.

@hebasto
Copy link
Member

hebasto commented Aug 4, 2020

@vasild

I converted this PR to "draft" so that it does not get merged accidentally. CI tests passed this time! I guess the order in which the tests are executed matters. #19340 will fix that and should be merged before this PR.

#19340 is merged :)

@vasild vasild marked this pull request as ready for review August 5, 2020 07:36
The functions `EnterCritical()` and `push_lock()` take a pointer to a
mutex, but that pointer used to be of type `void*` because we use a few
different types for mutexes. This `void*` argument was not type safe
because somebody could have send a pointer to anything that is not a
mutex. Furthermore it wouldn't allow to check whether the passed mutex
is recursive or not.

Thus, change the functions to templated ones so that we can implement
stricter checks for non-recursive mutexes. This also simplifies the
callers of `EnterCritical()`.
@vasild vasild force-pushed the detect_double_lock branch from d32d855 to c269e0a Compare August 5, 2020 07:49
@vasild
Copy link
Contributor Author

vasild commented Aug 5, 2020

Reopened this PR now that #19340 is merged and rebased to resolve a conflict.

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c269e0a, tested on Linux Mint 20 (x86_64).

lock_stack.pop_back();
if (g_debug_lockorder_abort) {
tfm::format(std::cerr, "Assertion failed: detected double lock at %s:%i, details in debug log.\n", __FILE__, __LINE__);
abort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more cases of s/abort()/std::abort so a follow up to to do this is also fine.

src/sync.cpp Outdated
// at position `j` and at the end (which we added just before this loop).
// Can't allow locking the same (non-recursive) mutex two times from the
// same thread as that results in an undefined behavior.
double_lock_detected(c, lock_stack);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add assert(false) after this? Or some other "unreachable" assertion. Below, after potential_deadlock_detected() there's a comment, could also add the assertion there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment, like done for potential_deadlock_detected().

@vasild
Copy link
Contributor Author

vasild commented Aug 5, 2020

Addressed some review suggestions.

@vasild vasild force-pushed the detect_double_lock branch from b1f9407 to f6a1a91 Compare August 5, 2020 13:01
@maflcko
Copy link
Member

maflcko commented Aug 5, 2020

a diff
diff --git a/src/sync.cpp b/src/sync.cpp
index d020b4e334..55cc47191f 100644
--- a/src/sync.cpp
+++ b/src/sync.cpp
@@ -220,9 +220,6 @@ void EnterCritical(const char* pszName, const char* pszFile, int nLine, MutexTyp
 }
 template void EnterCritical(const char*, const char*, int, Mutex*, bool);
 template void EnterCritical(const char*, const char*, int, RecursiveMutex*, bool);
-template void EnterCritical(const char*, const char*, int, std::mutex*, bool);
-template void EnterCritical(const char*, const char*, int, std::recursive_mutex*, bool);
-template void EnterCritical(const char*, const char*, int, boost::mutex*, bool);
 
 void CheckLastCritical(void* cs, std::string& lockname, const char* guardname, const char* file, int line)
 {
diff --git a/src/test/sync_tests.cpp b/src/test/sync_tests.cpp
index 2ecc78bbed..9a3c276ed2 100644
--- a/src/test/sync_tests.cpp
+++ b/src/test/sync_tests.cpp
@@ -33,37 +33,44 @@ void TestPotentialDeadLockDetected(MutexType& mutex1, MutexType& mutex2)
     #endif
 }
 
-#ifdef DEBUG_LOCKORDER
 template <typename MutexType>
 void TestDoubleLock2(MutexType& m)
 {
-    ENTER_CRITICAL_SECTION(m);
-    LEAVE_CRITICAL_SECTION(m);
+    LOCK(m);
 }
 
 template <typename MutexType>
 void TestDoubleLock(bool should_throw)
 {
+#ifdef DEBUG_LOCKORDER
     const bool prev = g_debug_lockorder_abort;
     g_debug_lockorder_abort = false;
+#endif /* DEBUG_LOCKORDER */
 
     MutexType m;
-    ENTER_CRITICAL_SECTION(m);
-    if (should_throw) {
-        BOOST_CHECK_EXCEPTION(
-            TestDoubleLock2(m), std::logic_error, [](const std::logic_error& e) {
-                return strcmp(e.what(), "double lock detected") == 0;
-            });
-    } else {
-        BOOST_CHECK_NO_THROW(TestDoubleLock2(m));
+    {
+        LOCK(m);
+        if (should_throw) {
+#ifdef DEBUG_LOCKORDER
+            /* Double lock would produce an undefined behavior. Thus, we only do that if
+         * DEBUG_LOCKORDER is activated to detect it. We don't want non-DEBUG_LOCKORDER
+         * build to produce tests that exhibit known undefined behavior. */
+            BOOST_CHECK_EXCEPTION(
+                TestDoubleLock2(m), std::logic_error, [](const std::logic_error& e) {
+                    return strcmp(e.what(), "double lock detected") == 0;
+                });
+#endif /* DEBUG_LOCKORDER */
+        } else {
+            BOOST_CHECK_NO_THROW(TestDoubleLock2(m));
+        }
     }
-    LEAVE_CRITICAL_SECTION(m);
 
     BOOST_CHECK(LockStackEmpty());
 
+#ifdef DEBUG_LOCKORDER
     g_debug_lockorder_abort = prev;
-}
 #endif /* DEBUG_LOCKORDER */
+}
 } // namespace
 
 BOOST_FIXTURE_TEST_SUITE(sync_tests, BasicTestingSetup)
@@ -90,34 +97,14 @@ BOOST_AUTO_TEST_CASE(potential_deadlock_detected)
     #endif
 }
 
-/* Double lock would produce an undefined behavior. Thus, we only do that if
- * DEBUG_LOCKORDER is activated to detect it. We don't want non-DEBUG_LOCKORDER
- * build to produce tests that exhibit known undefined behavior. */
-#ifdef DEBUG_LOCKORDER
 BOOST_AUTO_TEST_CASE(double_lock_mutex)
 {
     TestDoubleLock<Mutex>(true /* should throw */);
 }
 
-BOOST_AUTO_TEST_CASE(double_lock_std_mutex)
-{
-    TestDoubleLock<std::mutex>(true /* should throw */);
-}
-
-BOOST_AUTO_TEST_CASE(double_lock_boost_mutex)
-{
-    TestDoubleLock<boost::mutex>(true /* should throw */);
-}
-
 BOOST_AUTO_TEST_CASE(double_lock_recursive_mutex)
 {
     TestDoubleLock<RecursiveMutex>(false /* should not throw */);
 }
 
-BOOST_AUTO_TEST_CASE(double_lock_std_recursive_mutex)
-{
-    TestDoubleLock<std::recursive_mutex>(false /* should not throw */);
-}
-#endif /* DEBUG_LOCKORDER */
-
 BOOST_AUTO_TEST_SUITE_END()

@vasild
Copy link
Contributor Author

vasild commented Aug 6, 2020

@MarcoFalke the above patch does not compile for me (neither with clang 11 nor with gcc 10) with an error like

ld: error: undefined symbol: void EnterCritical<std::mutex>(char const*, char const*, int, std::mutex*, bool)

I think you should also get it for ./configure --enable-debug when the explicit instantiation of EnterCritical<std::mutex>() is removed. Here is how we end up needing that instantiation even though we never use std::mutex directly:

typedef AnnotatedMixin<std::mutex> Mutex;

template <typename PARENT>
class LOCKABLE AnnotatedMixin : public PARENT
{
...
    using UniqueLock = std::unique_lock<PARENT>;

So when one defines Mutex m; this ends up being AnnotatedMixin<std::mutex> m; and AnnotatedMixin<std::mutex>::UniqueLock is std::unique_lock<std::mutex>.

LOCK(m) expands to
DebugLock<AnnotatedMixin<std::mutex>> criticalblock123(m, ...) which expands to
UniqueLock<AnnotatedMixin<std::mutex>> criticalblock123(m, ...) which expands to (with default 2nd template arg):
UniqueLock<AnnotatedMixin<std::mutex>, AnnotatedMixin<std::mutex>::UniqueLock> criticalblock123(m, ...) which expands to
UniqueLock<AnnotatedMixin<std::mutex>, std::unique_lock<std::mutex>> criticalblock123(m, ...).

Then the constructor of the above calls the Enter() method which calls
EnterCritical(name, file, line, Base::mutex()) which expands to
EnterCritical(name, file, line, std::unique_lock<std::mutex>::mutex()).
The mutex() method (4th argument) returns a pointer to std::mutex.

This is how we end up calling EnterCritical() with a 4th argument of type pointer to std::mutex.

Thus we need the explicit instantiation template void EnterCritical(const char*, const char*, int, std::mutex*, bool);. Same applies to std::recursive_mutex and boost::mutex.

vasild added a commit to vasild/bitcoin that referenced this pull request Aug 6, 2020
@vasild
Copy link
Contributor Author

vasild commented Aug 6, 2020

I pushed the patch to my travis instance and it breaks the build, similarly to what I observe locally: https://travis-ci.org/github/vasild/bitcoin/jobs/715483824#L3800

@maflcko
Copy link
Member

maflcko commented Aug 7, 2020

Obviously I didn't compile with --enable-debug (shame!). Please disregard the sync.cpp patch, but the other hunks should compile?

Double lock of the same (non-recursive) mutex from the same thread
is producing an undefined behavior. Detect this from DEBUG_LOCKORDER
and react similarly to the deadlock detection.
@vasild vasild force-pushed the detect_double_lock branch from f6a1a91 to 95975dd Compare August 10, 2020 16:43
@vasild
Copy link
Contributor Author

vasild commented Aug 10, 2020

@MarcoFalke, I looked carefully at the diff above and took some of it but not all. Here are the reasons:

  • Indeed we never do LOCK() or ENTER_CRITICAL_SECTION() on std::mutex or std::recursive_mutex directly, so no need to test for that. Thus, I removed the tests double_lock_std_mutex and double_lock_std_recursive_mutex.

  • However we do call ENTER_CRITICAL_SECTION() on boost::mutex. So, I left the test double_lock_boost_mutex. So, TestDoubleLock2() cannot use LOCK(m) because it does not compile.

  • I do not think it makes sense to do these tests for non-DEBUG_LOCKORDER build because they will decay to:

    • if should_throw == true
    mutex lock
    if (should_throw) {
        // nothing
    }
    mutex unlock
    • if should_throw == false
    recursive mutex lock
    recursive mutex lock
    recursive mutex unlock
    recursive mutex unlock

    (with none of the code from sync.cpp exercised)

Copy link
Member

@hebasto hebasto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-ACK 95975dd

@hebasto
Copy link
Member

hebasto commented Aug 15, 2020

@ajtowns @ryanofsky @practicalswift Mind looking into this?

@laanwj
Copy link
Member

laanwj commented Nov 23, 2020

Given how much of the current work on mutexes is converting recursive mutexes to non-resursive ones, it would be nice to get this in.

@laanwj
Copy link
Member

laanwj commented Nov 25, 2020

code review ACK 95975dd

@laanwj laanwj merged commit 5009159 into bitcoin:master Nov 25, 2020
@vasild vasild deleted the detect_double_lock branch November 25, 2020 16:04
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Nov 25, 2020
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review ack

LogPrintf(" %s\n", i.second.ToString());
}
if (g_debug_lockorder_abort) {
tfm::format(std::cerr, "Assertion failed: detected double lock at %s:%i, details in debug log.\n", __FILE__, __LINE__);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will always blame the bug on sync.cpp itself:

detected double lock at sync.cpp:153

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right!

Addressed in #20507

// Can't allow locking the same (non-recursive) mutex two times from the
// same thread as that results in an undefined behavior.
auto lock_stack_copy = lock_stack;
lock_stack.pop_back();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasild why copy and pop here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

            auto lock_stack_copy = lock_stack;
            lock_stack.pop_back();
            double_lock_detected(c, lock_stack_copy);

That's subtle:

  1. double_lock_detected() can abort() in which case it does not matter, but it can also throw and tests catch this exception and continue working and executing other tests. So the lock stack needs to be correct after the throw because it is going to be reused by subsequent tests. Notice that we did not acquire the offending mutex that is at the top of the lock stack - we only detected a request to acquire it, which if executed would cause a double lock. So the topmost element needs to be removed before the throw.
  2. double_lock_detected() needs the full lock stack, including the offending-but-not-actually-acquired-mutex because it prints some of its elements. It will want to print that last one.

fanquake added a commit to fanquake/bitcoin that referenced this pull request Jan 26, 2021
After the merge of bitcoin#18710, the linter is warning:
```bash
A new Boost dependency in the form of "boost/thread/mutex.hpp" appears to have been introduced:
src/sync.cpp:#include <boost/thread/mutex.hpp>
src/test/sync_tests.cpp:#include <boost/thread/mutex.hpp>

^---- failure generated from test/lint/lint-includes.sh
```

the interim bitcoin#19337 was merged, which introduced more `boost::mutex` usage.

Given we no longer use `boost::mutex`, just remove the double lock test
and remaining includes.
maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Jan 26, 2021
f827e15 refactor: remove straggling boost::mutex usage (fanquake)

Pull request description:

  After the merge of bitcoin#18710, the linter is warning:
  ```bash
  A new Boost dependency in the form of "boost/thread/mutex.hpp" appears to have been introduced:
  src/sync.cpp:#include <boost/thread/mutex.hpp>
  src/test/sync_tests.cpp:#include <boost/thread/mutex.hpp>

  ^---- failure generated from test/lint/lint-includes.sh
  ```

  bitcoin#18710 removed `boost/thread/mutex.hpp` from lint-includes, however in the interim bitcoin#19337 was merged, which introduced more `boost::mutex` usage.

  Given we no longer use `boost::mutex`, just remove the double lock test and remaining includes.

ACKs for top commit:
  laanwj:
    Code review ACK f827e15
  hebasto:
    ACK f827e15

Tree-SHA512: f738b12189fe5b39db3e8f8231e9002714413a962eaf98adc84a6614fa474df5616358cfb1c89b92a2b0564efa9b704a774c49d4a25dca18a0ccc3cd9eabfc0a
remyers pushed a commit to remyers/bitcoin that referenced this pull request Jan 26, 2021
After the merge of bitcoin#18710, the linter is warning:
```bash
A new Boost dependency in the form of "boost/thread/mutex.hpp" appears to have been introduced:
src/sync.cpp:#include <boost/thread/mutex.hpp>
src/test/sync_tests.cpp:#include <boost/thread/mutex.hpp>

^---- failure generated from test/lint/lint-includes.sh
```

the interim bitcoin#19337 was merged, which introduced more `boost::mutex` usage.

Given we no longer use `boost::mutex`, just remove the double lock test
and remaining includes.
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants