-
Notifications
You must be signed in to change notification settings - Fork 37.7k
refactor: Replace boost::mutex,condition_var,chrono with std equivalents in scheduler #18234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cc @theuni. I know you have been working on similar refactors recently. |
ff3821b
to
42a5326
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
}; | ||
|
||
#define REVERSE_LOCK(g) decltype(g)::reverse_lock PASTE2(revlock, __COUNTER__)(g, #g, __FILE__, __LINE__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending comments on my post here: #18088 (comment)
Maybe replace it with PASTE2(revlock,PASTE2(__FILE__,__LINE__))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already use __COUNTER__
for LOCK()
in sync.h, so reusing it here seems fine. Using __FILE__
seems unlikely to generate a usable symbol name too :)
Concept ACK Thanks for working on de-boosting! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 42a5326 I reviewed the changes and surrounding code in detail, built/ran tests for each commit, ran bitcoind for 2 days (debian), and shut down/restarted bitcoind several times. This looks good.
@@ -70,9 +70,6 @@ BOOST_FIXTURE_TEST_CASE(txindex_initial_sync, TestChain100Setup) | |||
// shutdown sequence (c.f. Shutdown() in init.cpp) | |||
txindex.Stop(); | |||
|
|||
threadGroup.interrupt_all(); | |||
threadGroup.join_all(); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 69460c9 could these be also removed in src/test/transaction_tests.cpp
L::457-458 if they are handled with TestingSetup::~TestingSetup()
; the test compiles and runs without them.
--- a/src/test/transaction_tests.cpp
+++ b/src/test/transaction_tests.cpp
@@ -453,9 +453,6 @@ BOOST_AUTO_TEST_CASE(test_big_witness_transaction)
bool controlCheck = control.Wait();
assert(controlCheck);
-
- threadGroup.interrupt_all();
- threadGroup.join_all();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this should be safe to remove. The scheduler needs to be stopped before this test ends, so the interrupt_all
should be replaced by a scheduler.stop()
. I think that otherwise you reintroduce bug #15410
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I was just now looking at src/test/checkqueue_tests.cpp which has similar code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonatack In transaction_tests, that's a locally declared threadGroup
, and it's a BasicTestingSetup
too, so the TestingSetup
destructor wouldn't execute, and even if it did, wouldn't clean up that threadGroup
.
@MarcoFalke Ah, you're quite right. Have undeleted the code and added a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajtowns right, makes sense. Thanks for the explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit 306f71b:
After adding the scheduler.stop(), they are again redundant and confusing. I'd suggest to remove the threadGroup
calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling stop()
doesn't actually cause the thread to stop, you need to wait for it to complete. Could do that by looping and waiting for AreThreadsServicingQueue()
to be false, but that still leaves the possibility you continue on and destruct the object in between --nThreadsServicingQueue;
and newTaskScheduled.notify_one();
at the end of serviceQueue
. Could drop the interrupt_all
but if some other thread in the group needed an interrupt before being joined, that'd cause a hang too... So until we change something else, this doesn't seem redundant to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ups, correct.
Ok, I see that in the description. Re-reviewing. |
I would've expected this to require more work, but it looks like the scheduler was already surprisingly interruptible. Concept ACK. @TheBlueMatt might be aware of more scheduling minefields. |
@@ -207,6 +207,7 @@ void Shutdown(NodeContext& node) | |||
|
|||
// After everything has been shut down, but before things get flushed, stop the | |||
// CScheduler/checkqueue threadGroup | |||
if (node.scheduler) node.scheduler->stop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these guaranteed to fully execute in-order? Can an interruption-point hit while the scheduler is still tearing down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you're asking; node.scheduler->stop()
just sets a boolean to say "exit the loop" and notifies everything waiting on the condition var to prevent it staying asleep, all the tear down happens in the thread. The interruption could certainly happen while it's tearing down, stop(); interrupt_all()
happen in one thread, and only afterwards does the other thread even wake up. I don't think boost interrupt will have an effect anymore though, because serviceQueue doesn't use any boost primitives and boost interrupt only sets a flag which is checked by other boost things.
42a5326
to
9426759
Compare
Should add
to OP? |
@ajtowns Can you rebase on master now that #16117 is merged. Also, feel free to include a commit here to kill the last |
Calling interrupt_all() will immediately stop the scheduler, so it's safe to invoke stop() beforehand, and this removes the reliance on boost to interrupt serviceQueue().
Changes from boost::chrono to std::chrono, boost::condition_var to std::condition_var, boost::mutex to sync.h Mutex, and reverselock.h to sync.h REVERSE_LOCK. Also adds threadsafety annotations to CScheduler members.
9426759
to
70a6b52
Compare
Great work! I'm really glad this is happening, thanks for working on this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 70a6b52, just style-nits 🌩
Show signature and timestamp
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
ACK 70a6b529f306ff72ea1badf25e970a92b2b17ab3, just style-nits 🌩
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUjbywwAhqXP0sWwiqNLLsI0h9fBTK3DQ7qRxX2cxV9RmUoiok2nKutVPGef2v4L
brJk/asu9hCJ5nb9/SMf6NRqnyhJ4lbTbt961Tu1au0LNfloCUfvfb8JxY42yDXD
zxzKYI/vQeUnaNBMINmZ9rOLkLMYqnxXb7fjejNL9l8txmE4jmRp53lvZj34IfDr
0Mhu3rq+FwLFgvwMWPEbPR1jG/AN2XaW0sHMMe1Xpw/intRxJLh3BSpPHZ9+AjIF
kWF6Dqp4Onf2dO4jmfaB5ipQI9IYFFZh3+jD3wePBD9kZuL2iqsfXvfblm+rXGZ4
XMOSXjJoNreRp/hH14gcGuKQrOOiraemT15rH4XiC/DUI6lLMa1FWMNGn1Pob5A0
EwjIUwnHVNbCqpJIUcGXLEHTZ0pE/1cQ6nBKEFoPcTEgk7D1gH5mX4wP3UBdPw57
Vka9bjoTAOLqdy7B2FE8qg+dNiVqaDm9eRxixS1uqDY+lWHfNkeubg9SaE6LkC//
5qT991xc
=GGMb
-----END PGP SIGNATURE-----
Timestamp of file with hash 8b7a37edb1d87523e3df70995e0c51ee4dd4dafc50f78c0b12195a66e49c4ce2 -
@@ -70,9 +70,6 @@ BOOST_FIXTURE_TEST_CASE(txindex_initial_sync, TestChain100Setup) | |||
// shutdown sequence (c.f. Shutdown() in init.cpp) | |||
txindex.Stop(); | |||
|
|||
threadGroup.interrupt_all(); | |||
threadGroup.join_all(); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit 306f71b:
After adding the scheduler.stop(), they are again redundant and confusing. I'd suggest to remove the threadGroup
calls
@@ -155,6 +160,18 @@ void EnterCritical(const char* pszName, const char* pszFile, int nLine, void* cs | |||
push_lock(cs, CLockLocation(pszName, pszFile, nLine, fTry, util::ThreadGetInternalName())); | |||
} | |||
|
|||
void CheckLastCritical(void* cs, std::string& lockname, const char* guardname, const char* file, int line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit b9c4260:
Why is this not called GetLastLockName
, which better describes what this function does, imo.
|
||
UniqueLock& lock; | ||
UniqueLock templock; | ||
std::string lockname; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit b9c4260:
Should be m_lockname
. Also for other members?
reverse_lock& operator=(reverse_lock const&); | ||
|
||
UniqueLock& lock; | ||
UniqueLock templock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit b9c4260:
I think it would help readers of the code to briefly explain why this is needed. Also, m_templock
? ;)
@@ -7,11 +7,12 @@ | |||
|
|||
// | |||
// NOTE: | |||
// boost::thread / boost::chrono should be ported to std::thread / std::chrono | |||
// boost::thread should be ported to std::thread |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in commit d0ebd93:
Actually boost::thread
is no longer needed an can be replaced with std::thread
in this file and the scheduler tests. I know that the thread group is still boost, but the scheduler doesn't care about that.
Summary: Calling interrupt_all() will immediately stop the scheduler, so it's safe to invoke stop() beforehand, and this removes the reliance on boost to interrupt serviceQueue(). This is a partial abckport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@306f71b Test Plan: ninja all check-all Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D6531
Summary: This is a partial backport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@b9c4260 Depends on D6531 Test Plan: ninja all check-all Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D6532
Summary: Changes from boost::chrono to std::chrono, boost::condition_var to std::condition_var, boost::mutex to sync.h Mutex, and reverselock.h to sync.h REVERSE_LOCK. Also adds threadsafety annotations to CScheduler members. This is a partial backport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@d0ebd93 Depends on D6532 Test Plan: ninja all check-all Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D6533
Summary: This is a partial backport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@cea19f6 Depends on D6533 Test Plan: ninja all check-all Reviewers: #bitcoin_abc, Fabien Reviewed By: #bitcoin_abc, Fabien Differential Revision: https://reviews.bitcoinabc.org/D6535
…t's wait_until ed0223e scheduler: Workaround negative nsecs bug in boost's wait_until (Luke Dashjr) Pull request description: Some boost versions have a bug that can cause a time prior to system boot (or wake from sleep) to throw an exception instead of return timeout See boostorg/thread#308 NOTE: This was addressed in master with a refactor (#18234), so this isn't a strict backport and needs full review. Fixes #18227 Cleanly merges to 0.14+ ACKs for top commit: laanwj: ACK ed0223e gruve-p: ACK ed0223e Tree-SHA512: 57edd0a22d7cf8f04b427e23d1ba10746a492638021d4438781b9d313dd0459418f64f0489be72d8e2286bbc8e8762d77e673868c25eb3bf4f0423a8fe8cdffa
…th std equivalents in scheduler
…th std equivalents in scheduler
Summary: > After [[bitcoin/bitcoin#18234 | core#18234]] the scheduler itself no longer cares if the > serviceQueue is run in a std::thread or boost::thread. Change the > documentation to std::thread because we switched to C++11. This is a backport of [[bitcoin/bitcoin#19090 | core#19090]] [3/6] bitcoin/bitcoin@fab2950 Test Plan: Proofreading Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D9957
… std equivalents in scheduler b2ac70f [Cleanup] Drop unused reverselock.h (random-zebra) 55ac6de scheduler: switch from boost to std (Anthony Towns) 15f292b sync.h: add REVERSE_LOCK (Anthony Towns) a8705a1 scheduler: don't rely on boost interrupt on shutdown (random-zebra) 004c064 util: Add UnintrruptibleSleep (MarcoFalke) Pull request description: Backports bitcoin#18234 [Anthony Towns] > Motivated by 18227, but should stand alone. Changing from boost::condition_var to std::condition_var means threadGroup.interrupt_all isn't enough to interrupt serviceQueue anymore, so that means calling stop() before join_all() is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from g_lockstack which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour. Fixes 16027, Fixes 14200, Fixes 18227 First commit coming from bitcoin#16117 This should fix #2515 ACKs for top commit: furszy: code review ACK b2ac70f Fuzzbawls: utACK b2ac70f Tree-SHA512: 5a39f8047948d02f75bb3cce99b8a9ce8d31e30c2e12a8ebfe2a871ed8e0fbe552c6e9deaad5d3e9c0830e7286f2cb627f0605d5887c4407f91cc2f9d667e8b1
…_var,chrono with std equivalents in scheduler This backport does not include changes that depend on bitcoin pr 18037 70a6b52 lint-cppcheck: Remove -DHAVE_WORKING_BOOST_SLEEP_FOR (Anthony Towns) 294937b scheduler_tests: re-enable mockforward test (Anthony Towns) cea19f6 Drop unused reverselock.h (Anthony Towns) d0ebd93 scheduler: switch from boost to std (Anthony Towns) b9c4260 sync.h: add REVERSE_LOCK (Anthony Towns) 306f71b scheduler: don't rely on boost interrupt on shutdown (Anthony Towns) Pull request description: Replacing boost functionality with C++11 stuff. Motivated by bitcoin#18227, but should stand alone. Changing from `boost::condition_var` to `std::condition_var` means `threadGroup.interrupt_all` isn't enough to interrupt `serviceQueue` anymore, so that means calling `stop()` before `join_all()` is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from `g_lockstack` which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour. Fixes bitcoin#16027, Fixes bitcoin#14200, Fixes bitcoin#18227 ACKs for top commit: laanwj: ACK 70a6b52 Tree-SHA512: d1da13adeabcf9186d114e2dad9a4fdbe2e440f7afbccde0c13dfbaf464efcd850b69d3371c5bf8b179d7ceb9d81f4af3cc22960b90834e41eaaf6d52ef7d331 # Conflicts: # src/reverselock.h # src/rpc/misc.cpp # src/scheduler.cpp # src/scheduler.h # src/sync.cpp # src/sync.h # src/test/reverselock_tests.cpp # src/test/scheduler_tests.cpp # src/test/test_dash.cpp # test/lint/extended-lint-cppcheck.sh
…_var,chrono with std equivalents in scheduler This backport does not include changes that depend on bitcoin pr 18037 70a6b52 lint-cppcheck: Remove -DHAVE_WORKING_BOOST_SLEEP_FOR (Anthony Towns) 294937b scheduler_tests: re-enable mockforward test (Anthony Towns) cea19f6 Drop unused reverselock.h (Anthony Towns) d0ebd93 scheduler: switch from boost to std (Anthony Towns) b9c4260 sync.h: add REVERSE_LOCK (Anthony Towns) 306f71b scheduler: don't rely on boost interrupt on shutdown (Anthony Towns) Pull request description: Replacing boost functionality with C++11 stuff. Motivated by bitcoin#18227, but should stand alone. Changing from `boost::condition_var` to `std::condition_var` means `threadGroup.interrupt_all` isn't enough to interrupt `serviceQueue` anymore, so that means calling `stop()` before `join_all()` is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from `g_lockstack` which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour. Fixes bitcoin#16027, Fixes bitcoin#14200, Fixes bitcoin#18227 ACKs for top commit: laanwj: ACK 70a6b52 Tree-SHA512: d1da13adeabcf9186d114e2dad9a4fdbe2e440f7afbccde0c13dfbaf464efcd850b69d3371c5bf8b179d7ceb9d81f4af3cc22960b90834e41eaaf6d52ef7d331 # Conflicts: # src/reverselock.h # src/rpc/misc.cpp # src/scheduler.cpp # src/scheduler.h # src/sync.cpp # src/sync.h # src/test/reverselock_tests.cpp # src/test/scheduler_tests.cpp # src/test/test_dash.cpp # test/lint/extended-lint-cppcheck.sh
…_var,chrono with std equivalents in scheduler This backport does not include changes that depend on bitcoin pr 18037 70a6b52 lint-cppcheck: Remove -DHAVE_WORKING_BOOST_SLEEP_FOR (Anthony Towns) 294937b scheduler_tests: re-enable mockforward test (Anthony Towns) cea19f6 Drop unused reverselock.h (Anthony Towns) d0ebd93 scheduler: switch from boost to std (Anthony Towns) b9c4260 sync.h: add REVERSE_LOCK (Anthony Towns) 306f71b scheduler: don't rely on boost interrupt on shutdown (Anthony Towns) Pull request description: Replacing boost functionality with C++11 stuff. Motivated by bitcoin#18227, but should stand alone. Changing from `boost::condition_var` to `std::condition_var` means `threadGroup.interrupt_all` isn't enough to interrupt `serviceQueue` anymore, so that means calling `stop()` before `join_all()` is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from `g_lockstack` which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour. Fixes bitcoin#16027, Fixes bitcoin#14200, Fixes bitcoin#18227 ACKs for top commit: laanwj: ACK 70a6b52 Tree-SHA512: d1da13adeabcf9186d114e2dad9a4fdbe2e440f7afbccde0c13dfbaf464efcd850b69d3371c5bf8b179d7ceb9d81f4af3cc22960b90834e41eaaf6d52ef7d331 # Conflicts: # src/reverselock.h # src/rpc/misc.cpp # src/scheduler.cpp # src/scheduler.h # src/sync.cpp # src/sync.h # src/test/reverselock_tests.cpp # src/test/scheduler_tests.cpp # src/test/test_dash.cpp # test/lint/extended-lint-cppcheck.sh
Replacing boost functionality with C++11 stuff.
Motivated by #18227, but should stand alone. Changing from
boost::condition_var
tostd::condition_var
meansthreadGroup.interrupt_all
isn't enough to interruptserviceQueue
anymore, so that means callingstop()
beforejoin_all()
is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed fromg_lockstack
which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour.Fixes #16027, Fixes #14200, Fixes #18227