-
Notifications
You must be signed in to change notification settings - Fork 37.8k
sync: detect double lock from the same thread #19337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Concept ACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
template void EnterCritical(const char*, const char*, int, RecursiveMutex*, bool); | ||
template void EnterCritical(const char*, const char*, int, std::mutex*, bool); | ||
template void EnterCritical(const char*, const char*, int, std::recursive_mutex*, bool); | ||
template void EnterCritical(const char*, const char*, int, boost::mutex*, bool); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to add dead code or code that is only used in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These instantiations are required even if tests do not exist. Without them it wouldn't link with errors like:
ld: error: undefined symbol: void EnterCritical<boost::mutex>(char const*, char const*, int, boost::mutex*, bool)
>>> referenced by checkqueue.h:184 (src/checkqueue.h:184)
>>> libbitcoin_server_a-validation.o:(CCheckQueueControl<CScriptCheck>::CCheckQueueControl(CCheckQueue<CScriptCheck>*)) in archive libbitcoin_server.a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is solved in #18710 :)
EDIT: I mean boost::mutex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when these two PRs meet and if that was the last usage of boost::mutex
, then this line can be removed: template void EnterCritical(const char*, const char*, int, boost::mutex*, bool);
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried compiling locally without it and it worked #19337 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested 401d3b7 with the following diff:
--- a/src/addrman.h
+++ b/src/addrman.h
@@ -178,7 +178,7 @@ class CAddrMan
friend class CAddrManTest;
protected:
//! critical section to protect the inner data structures
- mutable RecursiveMutex cs;
+ mutable Mutex cs;
private:
//! last used nId
$ ./src/qt/bitcoin-qt -regtest
Assertion failed: detected double lock at sync.cpp:153, details in debug log.
Aborted (core dumped)
Nice!
Some non-blocking nits:
- could place boost headers into a separate group
- in commit 401d3b7 message "... would produce an undefined behavior" -- mind rewording "would" -> "is"
- could use unnamed namespace instead of
static
It is reliably reproducible locally. |
@vasild UPDATE: at the end of the |
Fixed in #19340. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
Concept ACK |
Concept ACK Nice! |
401d3b7
to
d32d855
Compare
Done.
Done.
The rest of the functions in the file use Reviewers: this PR exposed an existent deficiency which is fixed in #19340. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I converted this PR to "draft" so that it does not get merged accidentally. CI tests passed this time! I guess the order in which the tests are executed matters. #19340 will fix that and should be merged before this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK, change LGTM.
The functions `EnterCritical()` and `push_lock()` take a pointer to a mutex, but that pointer used to be of type `void*` because we use a few different types for mutexes. This `void*` argument was not type safe because somebody could have send a pointer to anything that is not a mutex. Furthermore it wouldn't allow to check whether the passed mutex is recursive or not. Thus, change the functions to templated ones so that we can implement stricter checks for non-recursive mutexes. This also simplifies the callers of `EnterCritical()`.
d32d855
to
c269e0a
Compare
Reopened this PR now that #19340 is merged and rebased to resolve a conflict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK c269e0a, tested on Linux Mint 20 (x86_64).
lock_stack.pop_back(); | ||
if (g_debug_lockorder_abort) { | ||
tfm::format(std::cerr, "Assertion failed: detected double lock at %s:%i, details in debug log.\n", __FILE__, __LINE__); | ||
abort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's more cases of s/abort()/std::abort so a follow up to to do this is also fine.
src/sync.cpp
Outdated
// at position `j` and at the end (which we added just before this loop). | ||
// Can't allow locking the same (non-recursive) mutex two times from the | ||
// same thread as that results in an undefined behavior. | ||
double_lock_detected(c, lock_stack); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add assert(false)
after this? Or some other "unreachable" assertion. Below, after potential_deadlock_detected()
there's a comment, could also add the assertion there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment, like done for potential_deadlock_detected()
.
Addressed some review suggestions. |
b1f9407
to
f6a1a91
Compare
a diffdiff --git a/src/sync.cpp b/src/sync.cpp
index d020b4e334..55cc47191f 100644
--- a/src/sync.cpp
+++ b/src/sync.cpp
@@ -220,9 +220,6 @@ void EnterCritical(const char* pszName, const char* pszFile, int nLine, MutexTyp
}
template void EnterCritical(const char*, const char*, int, Mutex*, bool);
template void EnterCritical(const char*, const char*, int, RecursiveMutex*, bool);
-template void EnterCritical(const char*, const char*, int, std::mutex*, bool);
-template void EnterCritical(const char*, const char*, int, std::recursive_mutex*, bool);
-template void EnterCritical(const char*, const char*, int, boost::mutex*, bool);
void CheckLastCritical(void* cs, std::string& lockname, const char* guardname, const char* file, int line)
{
diff --git a/src/test/sync_tests.cpp b/src/test/sync_tests.cpp
index 2ecc78bbed..9a3c276ed2 100644
--- a/src/test/sync_tests.cpp
+++ b/src/test/sync_tests.cpp
@@ -33,37 +33,44 @@ void TestPotentialDeadLockDetected(MutexType& mutex1, MutexType& mutex2)
#endif
}
-#ifdef DEBUG_LOCKORDER
template <typename MutexType>
void TestDoubleLock2(MutexType& m)
{
- ENTER_CRITICAL_SECTION(m);
- LEAVE_CRITICAL_SECTION(m);
+ LOCK(m);
}
template <typename MutexType>
void TestDoubleLock(bool should_throw)
{
+#ifdef DEBUG_LOCKORDER
const bool prev = g_debug_lockorder_abort;
g_debug_lockorder_abort = false;
+#endif /* DEBUG_LOCKORDER */
MutexType m;
- ENTER_CRITICAL_SECTION(m);
- if (should_throw) {
- BOOST_CHECK_EXCEPTION(
- TestDoubleLock2(m), std::logic_error, [](const std::logic_error& e) {
- return strcmp(e.what(), "double lock detected") == 0;
- });
- } else {
- BOOST_CHECK_NO_THROW(TestDoubleLock2(m));
+ {
+ LOCK(m);
+ if (should_throw) {
+#ifdef DEBUG_LOCKORDER
+ /* Double lock would produce an undefined behavior. Thus, we only do that if
+ * DEBUG_LOCKORDER is activated to detect it. We don't want non-DEBUG_LOCKORDER
+ * build to produce tests that exhibit known undefined behavior. */
+ BOOST_CHECK_EXCEPTION(
+ TestDoubleLock2(m), std::logic_error, [](const std::logic_error& e) {
+ return strcmp(e.what(), "double lock detected") == 0;
+ });
+#endif /* DEBUG_LOCKORDER */
+ } else {
+ BOOST_CHECK_NO_THROW(TestDoubleLock2(m));
+ }
}
- LEAVE_CRITICAL_SECTION(m);
BOOST_CHECK(LockStackEmpty());
+#ifdef DEBUG_LOCKORDER
g_debug_lockorder_abort = prev;
-}
#endif /* DEBUG_LOCKORDER */
+}
} // namespace
BOOST_FIXTURE_TEST_SUITE(sync_tests, BasicTestingSetup)
@@ -90,34 +97,14 @@ BOOST_AUTO_TEST_CASE(potential_deadlock_detected)
#endif
}
-/* Double lock would produce an undefined behavior. Thus, we only do that if
- * DEBUG_LOCKORDER is activated to detect it. We don't want non-DEBUG_LOCKORDER
- * build to produce tests that exhibit known undefined behavior. */
-#ifdef DEBUG_LOCKORDER
BOOST_AUTO_TEST_CASE(double_lock_mutex)
{
TestDoubleLock<Mutex>(true /* should throw */);
}
-BOOST_AUTO_TEST_CASE(double_lock_std_mutex)
-{
- TestDoubleLock<std::mutex>(true /* should throw */);
-}
-
-BOOST_AUTO_TEST_CASE(double_lock_boost_mutex)
-{
- TestDoubleLock<boost::mutex>(true /* should throw */);
-}
-
BOOST_AUTO_TEST_CASE(double_lock_recursive_mutex)
{
TestDoubleLock<RecursiveMutex>(false /* should not throw */);
}
-BOOST_AUTO_TEST_CASE(double_lock_std_recursive_mutex)
-{
- TestDoubleLock<std::recursive_mutex>(false /* should not throw */);
-}
-#endif /* DEBUG_LOCKORDER */
-
BOOST_AUTO_TEST_SUITE_END() |
@MarcoFalke the above patch does not compile for me (neither with clang 11 nor with gcc 10) with an error like
I think you should also get it for typedef AnnotatedMixin<std::mutex> Mutex;
template <typename PARENT>
class LOCKABLE AnnotatedMixin : public PARENT
{
...
using UniqueLock = std::unique_lock<PARENT>; So when one defines
Then the constructor of the above calls the This is how we end up calling Thus we need the explicit instantiation |
I pushed the patch to my travis instance and it breaks the build, similarly to what I observe locally: https://travis-ci.org/github/vasild/bitcoin/jobs/715483824#L3800 |
Obviously I didn't compile with |
Double lock of the same (non-recursive) mutex from the same thread is producing an undefined behavior. Detect this from DEBUG_LOCKORDER and react similarly to the deadlock detection.
f6a1a91
to
95975dd
Compare
@MarcoFalke, I looked carefully at the diff above and took some of it but not all. Here are the reasons:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 95975dd
@ajtowns @ryanofsky @practicalswift Mind looking into this? |
Given how much of the current work on mutexes is converting recursive mutexes to non-resursive ones, it would be nice to get this in. |
code review ACK 95975dd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review ack
LogPrintf(" %s\n", i.second.ToString()); | ||
} | ||
if (g_debug_lockorder_abort) { | ||
tfm::format(std::cerr, "Assertion failed: detected double lock at %s:%i, details in debug log.\n", __FILE__, __LINE__); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will always blame the bug on sync.cpp itself:
detected double lock at sync.cpp:153
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right!
Addressed in #20507
// Can't allow locking the same (non-recursive) mutex two times from the | ||
// same thread as that results in an undefined behavior. | ||
auto lock_stack_copy = lock_stack; | ||
lock_stack.pop_back(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vasild why copy and pop here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto lock_stack_copy = lock_stack;
lock_stack.pop_back();
double_lock_detected(c, lock_stack_copy);
That's subtle:
double_lock_detected()
canabort()
in which case it does not matter, but it can alsothrow
and tests catch this exception and continue working and executing other tests. So the lock stack needs to be correct after thethrow
because it is going to be reused by subsequent tests. Notice that we did not acquire the offending mutex that is at the top of the lock stack - we only detected a request to acquire it, which if executed would cause a double lock. So the topmost element needs to be removed before thethrow
.double_lock_detected()
needs the full lock stack, including the offending-but-not-actually-acquired-mutex because it prints some of its elements. It will want to print that last one.
After the merge of bitcoin#18710, the linter is warning: ```bash A new Boost dependency in the form of "boost/thread/mutex.hpp" appears to have been introduced: src/sync.cpp:#include <boost/thread/mutex.hpp> src/test/sync_tests.cpp:#include <boost/thread/mutex.hpp> ^---- failure generated from test/lint/lint-includes.sh ``` the interim bitcoin#19337 was merged, which introduced more `boost::mutex` usage. Given we no longer use `boost::mutex`, just remove the double lock test and remaining includes.
f827e15 refactor: remove straggling boost::mutex usage (fanquake) Pull request description: After the merge of bitcoin#18710, the linter is warning: ```bash A new Boost dependency in the form of "boost/thread/mutex.hpp" appears to have been introduced: src/sync.cpp:#include <boost/thread/mutex.hpp> src/test/sync_tests.cpp:#include <boost/thread/mutex.hpp> ^---- failure generated from test/lint/lint-includes.sh ``` bitcoin#18710 removed `boost/thread/mutex.hpp` from lint-includes, however in the interim bitcoin#19337 was merged, which introduced more `boost::mutex` usage. Given we no longer use `boost::mutex`, just remove the double lock test and remaining includes. ACKs for top commit: laanwj: Code review ACK f827e15 hebasto: ACK f827e15 Tree-SHA512: f738b12189fe5b39db3e8f8231e9002714413a962eaf98adc84a6614fa474df5616358cfb1c89b92a2b0564efa9b704a774c49d4a25dca18a0ccc3cd9eabfc0a
After the merge of bitcoin#18710, the linter is warning: ```bash A new Boost dependency in the form of "boost/thread/mutex.hpp" appears to have been introduced: src/sync.cpp:#include <boost/thread/mutex.hpp> src/test/sync_tests.cpp:#include <boost/thread/mutex.hpp> ^---- failure generated from test/lint/lint-includes.sh ``` the interim bitcoin#19337 was merged, which introduced more `boost::mutex` usage. Given we no longer use `boost::mutex`, just remove the double lock test and remaining includes.
Double lock of the same (non-recursive) mutex from the same thread would produce an undefined behavior. Detect this from
DEBUG_LOCKORDER
and react similarly to the deadlock detection.This came up during discussion in another, related PR: #19238 (comment).