Skip to content

Conversation

ajtowns
Copy link
Contributor

@ajtowns ajtowns commented Mar 1, 2020

Replacing boost functionality with C++11 stuff.

Motivated by #18227, but should stand alone. Changing from boost::condition_var to std::condition_var means threadGroup.interrupt_all isn't enough to interrupt serviceQueue anymore, so that means calling stop() before join_all() is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from g_lockstack which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour.

Fixes #16027, Fixes #14200, Fixes #18227

@fanquake
Copy link
Member

fanquake commented Mar 1, 2020

cc @theuni. I know you have been working on similar refactors recently.

@ajtowns ajtowns force-pushed the 202002-scheduler-deboost branch 2 times, most recently from ff3821b to 42a5326 Compare March 1, 2020 06:03
@DrahtBot
Copy link
Contributor

DrahtBot commented Mar 1, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

};

#define REVERSE_LOCK(g) decltype(g)::reverse_lock PASTE2(revlock, __COUNTER__)(g, #g, __FILE__, __LINE__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending comments on my post here: #18088 (comment)
Maybe replace it with PASTE2(revlock,PASTE2(__FILE__,__LINE__))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already use __COUNTER__ for LOCK() in sync.h, so reusing it here seems fine. Using __FILE__ seems unlikely to generate a usable symbol name too :)

@practicalswift
Copy link
Contributor

Concept ACK

Thanks for working on de-boosting! :)

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 42a5326 I reviewed the changes and surrounding code in detail, built/ran tests for each commit, ran bitcoind for 2 days (debian), and shut down/restarted bitcoind several times. This looks good.

@@ -70,9 +70,6 @@ BOOST_FIXTURE_TEST_CASE(txindex_initial_sync, TestChain100Setup)
// shutdown sequence (c.f. Shutdown() in init.cpp)
txindex.Stop();

threadGroup.interrupt_all();
threadGroup.join_all();

Copy link
Member

@jonatack jonatack Mar 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 69460c9 could these be also removed in src/test/transaction_tests.cpp L::457-458 if they are handled with TestingSetup::~TestingSetup(); the test compiles and runs without them.

--- a/src/test/transaction_tests.cpp
+++ b/src/test/transaction_tests.cpp
@@ -453,9 +453,6 @@ BOOST_AUTO_TEST_CASE(test_big_witness_transaction)

     bool controlCheck = control.Wait();
     assert(controlCheck);
-
-    threadGroup.interrupt_all();
-    threadGroup.join_all();
 }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this should be safe to remove. The scheduler needs to be stopped before this test ends, so the interrupt_all should be replaced by a scheduler.stop(). I think that otherwise you reintroduce bug #15410

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I was just now looking at src/test/checkqueue_tests.cpp which has similar code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonatack In transaction_tests, that's a locally declared threadGroup, and it's a BasicTestingSetup too, so the TestingSetup destructor wouldn't execute, and even if it did, wouldn't clean up that threadGroup.

@MarcoFalke Ah, you're quite right. Have undeleted the code and added a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajtowns right, makes sense. Thanks for the explanation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in commit 306f71b:

After adding the scheduler.stop(), they are again redundant and confusing. I'd suggest to remove the threadGroup calls

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling stop() doesn't actually cause the thread to stop, you need to wait for it to complete. Could do that by looping and waiting for AreThreadsServicingQueue() to be false, but that still leaves the possibility you continue on and destruct the object in between --nThreadsServicingQueue; and newTaskScheduled.notify_one(); at the end of serviceQueue. Could drop the interrupt_all but if some other thread in the group needed an interrupt before being joined, that'd cause a hang too... So until we change something else, this doesn't seem redundant to me?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups, correct.

@theuni
Copy link
Member

theuni commented Mar 2, 2020

This doesn't seem to take into account that boost::condition_variable is interruptible, and can't simply be swapped out. Have I missed some prior work that makes this ok?

Ok, I see that in the description. Re-reviewing.

@theuni
Copy link
Member

theuni commented Mar 2, 2020

I would've expected this to require more work, but it looks like the scheduler was already surprisingly interruptible. Concept ACK.

@TheBlueMatt might be aware of more scheduling minefields.

@@ -207,6 +207,7 @@ void Shutdown(NodeContext& node)

// After everything has been shut down, but before things get flushed, stop the
// CScheduler/checkqueue threadGroup
if (node.scheduler) node.scheduler->stop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these guaranteed to fully execute in-order? Can an interruption-point hit while the scheduler is still tearing down?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you're asking; node.scheduler->stop() just sets a boolean to say "exit the loop" and notifies everything waiting on the condition var to prevent it staying asleep, all the tear down happens in the thread. The interruption could certainly happen while it's tearing down, stop(); interrupt_all() happen in one thread, and only afterwards does the other thread even wake up. I don't think boost interrupt will have an effect anymore though, because serviceQueue doesn't use any boost primitives and boost interrupt only sets a flag which is checked by other boost things.

@maflcko maflcko changed the title scheduler.cpp: Replace boost::mutex,condition_var,chrono with std equivalents refactor: Replace boost::mutex,condition_var,chrono with std equivalents in scheduler Mar 3, 2020
@ajtowns ajtowns force-pushed the 202002-scheduler-deboost branch from 42a5326 to 9426759 Compare March 4, 2020 04:03
@maflcko maflcko added this to the 0.20.0 milestone Mar 5, 2020
@maflcko
Copy link
Member

maflcko commented Mar 5, 2020

Should add

Fixes #16027, Fixes #14200, Fixes #18227

to OP?

@fanquake
Copy link
Member

fanquake commented Mar 6, 2020

@ajtowns Can you rebase on master now that #16117 is merged. Also, feel free to include a commit here to kill the last DHAVE_WORKING_BOOST_SLEEP_FOR.

ajtowns added 5 commits March 6, 2020 23:13
Calling interrupt_all() will immediately stop the scheduler, so it's
safe to invoke stop() beforehand, and this removes the reliance on boost
to interrupt serviceQueue().
Changes from boost::chrono to std::chrono, boost::condition_var to
std::condition_var, boost::mutex to sync.h Mutex, and reverselock.h to
sync.h REVERSE_LOCK. Also adds threadsafety annotations to CScheduler
members.
@ajtowns ajtowns force-pushed the 202002-scheduler-deboost branch from 9426759 to 70a6b52 Compare March 6, 2020 18:21
@laanwj
Copy link
Member

laanwj commented Mar 6, 2020

Great work! I'm really glad this is happening, thanks for working on this.
ACK 70a6b52

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 70a6b52, just style-nits 🌩

Show signature and timestamp

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK 70a6b529f306ff72ea1badf25e970a92b2b17ab3, just style-nits 🌩
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUjbywwAhqXP0sWwiqNLLsI0h9fBTK3DQ7qRxX2cxV9RmUoiok2nKutVPGef2v4L
brJk/asu9hCJ5nb9/SMf6NRqnyhJ4lbTbt961Tu1au0LNfloCUfvfb8JxY42yDXD
zxzKYI/vQeUnaNBMINmZ9rOLkLMYqnxXb7fjejNL9l8txmE4jmRp53lvZj34IfDr
0Mhu3rq+FwLFgvwMWPEbPR1jG/AN2XaW0sHMMe1Xpw/intRxJLh3BSpPHZ9+AjIF
kWF6Dqp4Onf2dO4jmfaB5ipQI9IYFFZh3+jD3wePBD9kZuL2iqsfXvfblm+rXGZ4
XMOSXjJoNreRp/hH14gcGuKQrOOiraemT15rH4XiC/DUI6lLMa1FWMNGn1Pob5A0
EwjIUwnHVNbCqpJIUcGXLEHTZ0pE/1cQ6nBKEFoPcTEgk7D1gH5mX4wP3UBdPw57
Vka9bjoTAOLqdy7B2FE8qg+dNiVqaDm9eRxixS1uqDY+lWHfNkeubg9SaE6LkC//
5qT991xc
=GGMb
-----END PGP SIGNATURE-----

Timestamp of file with hash 8b7a37edb1d87523e3df70995e0c51ee4dd4dafc50f78c0b12195a66e49c4ce2 -

@@ -70,9 +70,6 @@ BOOST_FIXTURE_TEST_CASE(txindex_initial_sync, TestChain100Setup)
// shutdown sequence (c.f. Shutdown() in init.cpp)
txindex.Stop();

threadGroup.interrupt_all();
threadGroup.join_all();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in commit 306f71b:

After adding the scheduler.stop(), they are again redundant and confusing. I'd suggest to remove the threadGroup calls

@@ -155,6 +160,18 @@ void EnterCritical(const char* pszName, const char* pszFile, int nLine, void* cs
push_lock(cs, CLockLocation(pszName, pszFile, nLine, fTry, util::ThreadGetInternalName()));
}

void CheckLastCritical(void* cs, std::string& lockname, const char* guardname, const char* file, int line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in commit b9c4260:

Why is this not called GetLastLockName, which better describes what this function does, imo.


UniqueLock& lock;
UniqueLock templock;
std::string lockname;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in commit b9c4260:

Should be m_lockname. Also for other members?

reverse_lock& operator=(reverse_lock const&);

UniqueLock& lock;
UniqueLock templock;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in commit b9c4260:

I think it would help readers of the code to briefly explain why this is needed. Also, m_templock? ;)

@@ -7,11 +7,12 @@

//
// NOTE:
// boost::thread / boost::chrono should be ported to std::thread / std::chrono
// boost::thread should be ported to std::thread
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in commit d0ebd93:

Actually boost::thread is no longer needed an can be replaced with std::thread in this file and the scheduler tests. I know that the thread group is still boost, but the scheduler doesn't care about that.

deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jun 11, 2020
Summary:
Calling interrupt_all() will immediately stop the scheduler, so it's
safe to invoke stop() beforehand, and this removes the reliance on boost
to interrupt serviceQueue().

This is a partial abckport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@306f71b

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D6531
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jun 11, 2020
Summary:
This is a partial backport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@b9c4260

Depends on D6531

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D6532
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jun 11, 2020
Summary:
Changes from boost::chrono to std::chrono, boost::condition_var to
std::condition_var, boost::mutex to sync.h Mutex, and reverselock.h to
sync.h REVERSE_LOCK. Also adds threadsafety annotations to CScheduler
members.

This is a partial backport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@d0ebd93

Depends on D6532

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D6533
deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jun 11, 2020
Summary:
This is a partial backport of Core [[bitcoin/bitcoin#18234 | PR18234]] : bitcoin/bitcoin@cea19f6

Depends on D6533

Test Plan:
  ninja all check-all

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D6535
maflcko pushed a commit that referenced this pull request Aug 28, 2020
…t's wait_until

ed0223e scheduler: Workaround negative nsecs bug in boost's wait_until (Luke Dashjr)

Pull request description:

  Some boost versions have a bug that can cause a time prior to system boot (or wake from sleep) to throw an exception instead of return timeout
  See boostorg/thread#308

  NOTE: This was addressed in master with a refactor (#18234), so this isn't a strict backport and needs full review.

  Fixes #18227

  Cleanly merges to 0.14+

ACKs for top commit:
  laanwj:
    ACK ed0223e
  gruve-p:
    ACK ed0223e

Tree-SHA512: 57edd0a22d7cf8f04b427e23d1ba10746a492638021d4438781b9d313dd0459418f64f0489be72d8e2286bbc8e8762d77e673868c25eb3bf4f0423a8fe8cdffa
kwvg added a commit to kwvg/dash that referenced this pull request Jul 13, 2021
kwvg added a commit to kwvg/dash that referenced this pull request Jul 13, 2021
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Aug 27, 2021
Summary:
> After [[bitcoin/bitcoin#18234 | core#18234]] the scheduler itself no longer cares if the
> serviceQueue is run in a std::thread or boost::thread. Change the
> documentation to std::thread because we switched to C++11.

This is a backport of [[bitcoin/bitcoin#19090 | core#19090]] [3/6]
bitcoin/bitcoin@fab2950

Test Plan: Proofreading

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Differential Revision: https://reviews.bitcoinabc.org/D9957
furszy added a commit to PIVX-Project/PIVX that referenced this pull request Aug 27, 2021
… std equivalents in scheduler

b2ac70f [Cleanup] Drop unused reverselock.h (random-zebra)
55ac6de scheduler: switch from boost to std (Anthony Towns)
15f292b sync.h: add REVERSE_LOCK (Anthony Towns)
a8705a1 scheduler: don't rely on boost interrupt on shutdown (random-zebra)
004c064 util: Add UnintrruptibleSleep (MarcoFalke)

Pull request description:

  Backports bitcoin#18234 [Anthony Towns]

  > Motivated by 18227, but should stand alone. Changing from boost::condition_var to std::condition_var means threadGroup.interrupt_all isn't enough to interrupt serviceQueue anymore, so that means calling stop() before join_all() is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from g_lockstack which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour.
  Fixes 16027, Fixes 14200, Fixes 18227

  First commit coming from bitcoin#16117

  This should fix #2515

ACKs for top commit:
  furszy:
    code review ACK b2ac70f
  Fuzzbawls:
    utACK b2ac70f

Tree-SHA512: 5a39f8047948d02f75bb3cce99b8a9ce8d31e30c2e12a8ebfe2a871ed8e0fbe552c6e9deaad5d3e9c0830e7286f2cb627f0605d5887c4407f91cc2f9d667e8b1
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Sep 28, 2021
…_var,chrono with std equivalents in scheduler

This backport does not include changes that depend on bitcoin pr 18037

70a6b52 lint-cppcheck: Remove -DHAVE_WORKING_BOOST_SLEEP_FOR (Anthony Towns)
294937b scheduler_tests: re-enable mockforward test (Anthony Towns)
cea19f6 Drop unused reverselock.h (Anthony Towns)
d0ebd93 scheduler: switch from boost to std (Anthony Towns)
b9c4260 sync.h: add REVERSE_LOCK (Anthony Towns)
306f71b scheduler: don't rely on boost interrupt on shutdown (Anthony Towns)

Pull request description:

  Replacing boost functionality with C++11 stuff.

  Motivated by bitcoin#18227, but should stand alone. Changing from `boost::condition_var` to `std::condition_var` means `threadGroup.interrupt_all` isn't enough to interrupt `serviceQueue` anymore, so that means calling `stop()` before `join_all()` is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from `g_lockstack` which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour.

  Fixes bitcoin#16027, Fixes bitcoin#14200, Fixes bitcoin#18227

ACKs for top commit:
  laanwj:
    ACK 70a6b52

Tree-SHA512: d1da13adeabcf9186d114e2dad9a4fdbe2e440f7afbccde0c13dfbaf464efcd850b69d3371c5bf8b179d7ceb9d81f4af3cc22960b90834e41eaaf6d52ef7d331

# Conflicts:
#	src/reverselock.h
#	src/rpc/misc.cpp
#	src/scheduler.cpp
#	src/scheduler.h
#	src/sync.cpp
#	src/sync.h
#	src/test/reverselock_tests.cpp
#	src/test/scheduler_tests.cpp
#	src/test/test_dash.cpp
#	test/lint/extended-lint-cppcheck.sh
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Sep 28, 2021
…_var,chrono with std equivalents in scheduler

This backport does not include changes that depend on bitcoin pr 18037

70a6b52 lint-cppcheck: Remove -DHAVE_WORKING_BOOST_SLEEP_FOR (Anthony Towns)
294937b scheduler_tests: re-enable mockforward test (Anthony Towns)
cea19f6 Drop unused reverselock.h (Anthony Towns)
d0ebd93 scheduler: switch from boost to std (Anthony Towns)
b9c4260 sync.h: add REVERSE_LOCK (Anthony Towns)
306f71b scheduler: don't rely on boost interrupt on shutdown (Anthony Towns)

Pull request description:

  Replacing boost functionality with C++11 stuff.

  Motivated by bitcoin#18227, but should stand alone. Changing from `boost::condition_var` to `std::condition_var` means `threadGroup.interrupt_all` isn't enough to interrupt `serviceQueue` anymore, so that means calling `stop()` before `join_all()` is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from `g_lockstack` which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour.

  Fixes bitcoin#16027, Fixes bitcoin#14200, Fixes bitcoin#18227

ACKs for top commit:
  laanwj:
    ACK 70a6b52

Tree-SHA512: d1da13adeabcf9186d114e2dad9a4fdbe2e440f7afbccde0c13dfbaf464efcd850b69d3371c5bf8b179d7ceb9d81f4af3cc22960b90834e41eaaf6d52ef7d331

# Conflicts:
#	src/reverselock.h
#	src/rpc/misc.cpp
#	src/scheduler.cpp
#	src/scheduler.h
#	src/sync.cpp
#	src/sync.h
#	src/test/reverselock_tests.cpp
#	src/test/scheduler_tests.cpp
#	src/test/test_dash.cpp
#	test/lint/extended-lint-cppcheck.sh
kwvg pushed a commit to kwvg/dash that referenced this pull request Oct 12, 2021
…_var,chrono with std equivalents in scheduler

This backport does not include changes that depend on bitcoin pr 18037

70a6b52 lint-cppcheck: Remove -DHAVE_WORKING_BOOST_SLEEP_FOR (Anthony Towns)
294937b scheduler_tests: re-enable mockforward test (Anthony Towns)
cea19f6 Drop unused reverselock.h (Anthony Towns)
d0ebd93 scheduler: switch from boost to std (Anthony Towns)
b9c4260 sync.h: add REVERSE_LOCK (Anthony Towns)
306f71b scheduler: don't rely on boost interrupt on shutdown (Anthony Towns)

Pull request description:

  Replacing boost functionality with C++11 stuff.

  Motivated by bitcoin#18227, but should stand alone. Changing from `boost::condition_var` to `std::condition_var` means `threadGroup.interrupt_all` isn't enough to interrupt `serviceQueue` anymore, so that means calling `stop()` before `join_all()` is needed. And the existing reverselock.h code doesn't work with sync.h's DebugLock code (because the reversed lock won't be removed from `g_lockstack` which then leads to incorrect potential deadlock warnings), so I've replaced that with a dedicated class and macro that's aware of our debug lock behaviour.

  Fixes bitcoin#16027, Fixes bitcoin#14200, Fixes bitcoin#18227

ACKs for top commit:
  laanwj:
    ACK 70a6b52

Tree-SHA512: d1da13adeabcf9186d114e2dad9a4fdbe2e440f7afbccde0c13dfbaf464efcd850b69d3371c5bf8b179d7ceb9d81f4af3cc22960b90834e41eaaf6d52ef7d331

# Conflicts:
#	src/reverselock.h
#	src/rpc/misc.cpp
#	src/scheduler.cpp
#	src/scheduler.h
#	src/sync.cpp
#	src/sync.h
#	src/test/reverselock_tests.cpp
#	src/test/scheduler_tests.cpp
#	src/test/test_dash.cpp
#	test/lint/extended-lint-cppcheck.sh
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
9 participants