net: drop boost::thread_group #9289

theuni · 2016-12-06T03:02:13Z

(Split out and rewritten chunk of #8631. I'll rebase that on top of this post-merge and reopen.)

This is a prerequisite for async network handling. As-is, we don't have enough control over the shutdown process to be able to deal with async connecting and send/recv.

In particular, we need to be sure that message processing has shut down before forcing all networking down, otherwise we run the risk of trying to process a node's messages during its destruction. This is not a problem now because the current sync model works fine with all threads being interrupted at the same time.

And if that's not a satisfactory reason for the change, it also gets rid of a nice chunk of boost threading (and simplifies the rebase of #8631 :)

To accomplish this, we need to:

Set a flag to interrupt all net threads
notify all blocked condvars
replace MilliSleep (since they're boost interruption points)
teach init to Interrupt/Stop, similar to some of the other subsystems.

The MilliSleeps are replaced with condvars that check for the thread's interrupt flag. Since the flags are atomic, there's no need for real locking, so these condvars use a dummy CNullLock.

With all of that done, we may as well switch away from boost threads, since we're no longer dependent on interruption.

@TheBlueMatt 3f3f0b4 is the awkward change I mentioned on IRC. The global will be cleaned up when we move processing into a class as a next step (something like theuni@3f598db)

laanwj · 2016-12-06T12:24:00Z

Concept ACK

sipa · 2016-12-06T19:35:25Z

Would it make sense to introduce a utility function like MilliSleep which takes a reference to a condition variable and an atomic flag? The pattern that replaces the MilliSleep calls is pretty complicated and repeats several times.

fanquake · 2016-12-07T10:57:56Z

Concept ACK

theuni · 2016-12-08T14:05:22Z

@sipa Sure, will do.

theuni · 2016-12-09T01:07:07Z

@sipa done. I'm not sure if it's generic enough to add to utiltime, happy to move it back to net if that makes more sense.

Also fixed up the TODO that I was mistaken about.

theuni · 2016-12-16T00:51:44Z

Just noticed that this PR would've broken interruptible proxy recvs, so I pushed a commit to fix that. The proxy code will soon be dumped anyway, so I'm ok with the hackish global there for now.

Also squashed down the header fix while I was at it.

TheBlueMatt

Does each thread need its own atomic_flag? We currently dont use it and I'm not sure I envision a scenario where you want to interrupt only some of the threads. Would simplify the code a decent chunk, I think.

I super dont like using atomic_flags and resetting them to dont-wake every time you check them...I think its fine in this PR, but it just seems like a bad idea generally. atomic_bool should be just fine, too :).

JeremyRubin · 2016-12-21T06:51:09Z

@TheBlueMatt I think you might be confused as to how the atomic flags are used in this code. They're used in "clear triggered mode", so there is no need to reset them after every check. The other usage ("test_and_set triggered mode") would introduce race conditions when the checking thread clears the flag as a concurrent interrupt could be cleared.

TheBlueMatt · 2016-12-21T08:50:26Z

@JeremyRubin I see how they're used, am only commenting on the likelihood of this pattern introducing future bugs. Eg someone uses break instead of return and then a loop gets moved around, making the thread no longer exit.

…

On December 20, 2016 10:51:14 PM PST, Jeremy Rubin ***@***.***> wrote: @TheBlueMatt I think you might be confused as to how the atomic flags are used in this code. They're used in "clear triggered mode", so there is no need to reset them after every check. The other usage ("test_and_set triggered mode") would introduce race conditions when the checking thread clears the flag as a concurrent interrupt could be cleared. -- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: #9289 (comment)

JeremyRubin · 2016-12-21T19:19:24Z

@TheBlueMatt ok I was responding to "atomic_flags and resetting them to dont-wake every time you check them.", not sure what you meant by that if not my earlier interpretation.

I agree with you on the semantics being off if your worry is future code breaking it. We should either make it throw an Interruption, or call std::terminate. Changing it to an atomic_bool will not have any effect on reducing likelihood future code changes don't break intended behavior. Implemented here for comparison theuni/bitcoin@connman-threads...JeremyRubin:connman-threads The only negative of this approach is that local cleanup can't be handled as neatly.

TheBlueMatt · 2016-12-21T19:41:19Z

@JeremyRubin well my note was that if you use an atomic_bool instead of a flag (which means you can check without resetting the state to the dont-exit-now state) then even if there is a bug the worst that happens is you will exit the thread at the next check.

TheBlueMatt · 2016-12-21T20:06:50Z

@JeremyRubin sadly I'm not sure about the execution propagation from within condition_variable...cppreference seems to indicate it was cleaned up in C++14, but its unclear if its guaranteed to propagation from the predicate or if it just "may".

JeremyRubin · 2016-12-21T20:59:59Z

Thanks for clarifying the kind of bug you were imagining, I had a different type of bug in mind (there being no code flow which can cause a termination, rather than being catchable on the next iteration).

You're absolutely right on the "may" for exceptions thrown from wait_for. Semantics should be fixed now (moved InterruptibleSleep into interruption_point and made the throwing external to wait_for).

TheBlueMatt · 2016-12-21T21:28:06Z

@JeremyRubin ehh, I mean yes regaring wrappers, no regarding exceptions - I think its generally accepted that using exceptions in C++ is bad, and I'm not sure we want to add any more uses than we already have.

theuni · 2016-12-22T02:43:56Z

@JeremyRubin and @TheBlueMatt Thanks for having a look.

After a debate (probably more than necessary), and running it by a third party, I think I'll concede and just use an atomic bool instead. The convincing argument was that a bool allows us to deviate from the current "got an interrupt, bail from thread!", and lets us do quick on-thread teardown post-interruption. It will need to stay in check though, our usages of Interrupt() assume that it won't block.

theuni · 2016-12-22T07:19:55Z

@TheBlueMatt I pushed up a new attempt to https://github.com/theuni/bitcoin/commits/connman-threads-redo. I think you'll like that much better.

I need to split a few non-obvious things into different commits, but mind having a quick look at the concept?

I borrowed @JeremyRubin's wrapper idea but implemented it a bit differently: theuni@7b4d8f6#diff-09404f8fdd1f69609d329cb0b1f1252e

The usage is (imo) pretty straightforward: theuni@7b4d8f6#diff-9a82240fe7dfe86564178691cc57f2f1L1660

TheBlueMatt · 2016-12-22T18:17:01Z

@theuni Yup, really like that version.

TheBlueMatt

Why did the socket handler startup need to move up? Did I miss a bug that woule be cause by it not or did you just prefer it that way? (dont care, just asking to make sure I'm aware)

There is still a MilliSleep call in ThreadSocketHandler. Is that intentional?

TheBlueMatt · 2016-12-22T23:34:10Z

src/threadinterrupt.h

+#include <condition_variable>
+#include <atomic>
+
+class CThreadInterrupt


Generally, I think it might make sense to just rename this "InterruptibleSleeper" and have it own the mutex and condition_variable. Then you dont have to have its owner own memory it points to and should still be able to keep +/- the same API.

Sometimes you'll want a real interruptible condvar, though. See theuni@da68e3c#diff-9a82240fe7dfe86564178691cc57f2f1R1887.

The alternative would be to add wait_for/wait_until/notify_one/notify_all to CThreadInterrupt, then just forward them to the condvar. But for that, you'd still need to expose the mutex.

I suppose this could be rigged up with a separate condvar/mutex and some additional manual isinterruped checks.

TheBlueMatt · 2016-12-22T23:35:30Z

src/threadinterrupt.h

+ * @returns false if the sleep was interrupted, true otherwise
+ */
+template <typename Duration>
+bool InterruptibleSleep(const Duration& rel_time, CThreadInterrupt& threadInterrupt)


Can we just move this into a member function of the class?

Also, does it need to be a template? I think if you set Duration to something that the various durations can be converted into it should do so intelligently. Would mean less stuff to compile in headers.

This started out as a member function, but interrupt.sleep_for() felt awkward conceptually.

We can probably just force this to milliseconds. If we need microsecond precision later, we can just add another function.

TheBlueMatt · 2016-12-22T23:36:08Z

src/threadinterrupt.h

+{
+public:
+    CThreadInterrupt(std::condition_variable& condIn, std::mutex& mutIn) : cond(condIn), mut(mutIn), val(false) {}
+    void reset()


Please no new implementations-in-headers. I know its mostly single-line stuff but we're fighting to reduce compile-time memory usage, so every little bit helps. :)

Sure. Though some of these need to be inlined. I figured someone would yell if I added a .cpp with 2 functions :p

theuni · 2016-12-22T23:51:08Z

Socket handler moved up just so that there's a consistent start/stop order.

Thanks for pointing out the MilliSleep, definitely not intentional!

TheBlueMatt · 2016-12-23T00:04:55Z

Yes, I guess my conception is different - if we make it a generic wrapper for condition variables then moving sleep into it would make sense. I thought some about what that would mean for net_processing, but missed that you'd need to expose the lock (exposing interrupt seems otherwise reasonable to me). No longer convinced which one is nicer, up to you.

…

On December 22, 2016 3:48:21 PM PST, Cory Fields ***@***.***> wrote: theuni commented on this pull request. > + std::condition_variable& cond; + std::mutex& mut; + std::atomic<bool> val; + + template <typename Duration> + friend bool InterruptibleSleep(const Duration&, CThreadInterrupt&); +}; + +/* + * Sleep for the stated period of time, interruptible by clearing the flag and notifying the condvar. + * @param rel_time maximum time to wait. Should be a std::chrono::duration. + * @param threadInterrupt The interrupt that may wake the sleep + * @returns false if the sleep was interrupted, true otherwise + */ +template <typename Duration> +bool InterruptibleSleep(const Duration& rel_time, CThreadInterrupt& threadInterrupt) This started out as a member function, but interrupt.sleep_for() felt awkward conceptually. We can probably just force this to milliseconds. If we need microsecond precision later, we can just add another function. -- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: #9289

TheBlueMatt · 2016-12-23T00:19:16Z

Actually, I'm not sure you need to expose the lock, you just need to take the lock on the notifying thread prior to the notify but after setting the wakeup condition. If you want to try it this way, I'd say you'd want to have two wakeup conditions, a wake-up and an interrupt condition, and just expose the various operations that you need. Not sure it's worth it, but I think that would be the cleanest API (modulo the too-much-wrapper potential issue).

…

On December 22, 2016 3:53:28 PM PST, Cory Fields ***@***.***> wrote: theuni commented on this pull request. > @@ -0,0 +1,51 @@ +// Copyright (c) 2009-2010 Satoshi Nakamoto +// Copyright (c) 2009-2016 The Bitcoin Core developers +// Distributed under the MIT software license, see the accompanying +// file COPYING or http://www.opensource.org/licenses/mit-license.php. + +#include <mutex> +#include <condition_variable> +#include <atomic> + +class CThreadInterrupt I suppose this could be rigged up with a separate condvar/mutex and some additional manual isinterruped checks. -- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: #9289

theuni · 2016-12-23T01:02:42Z

That would require the wakeup condition to be atomic, no?

TheBlueMatt · 2016-12-23T02:16:38Z

Yes, I believe that implies the wakeup condition must be atomic, but it can be release/acquire since a mutex unlock implies a full memory barrier. In the other case that I proposed (doing it with two atomics - one for wake-up and one for interruption all hidden inside the object they would both be atomic and you'd want ClearWakeupState() and Wake() functions).

…

On December 22, 2016 5:02:46 PM PST, Cory Fields ***@***.***> wrote: That would require the wakeup condition to be atomic, no? -- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: #9289 (comment)

theuni · 2016-12-24T01:49:10Z

@TheBlueMatt Pushed a round of changes after last night's IRC discussion. I hope we can compromise on this approach.

theuni · 2016-12-30T19:32:55Z

Pushed a commit for @sipa's nits. I've been running my dev branches on top of this for several days, killing bitcoind in all kinds of rude ways while debugging, and it has continued to shutdown reliably.

sipa

utACK

sipa · 2016-12-30T19:24:27Z

src/net.cpp

-            messageHandlerCondition.timed_wait(lock, boost::posix_time::microsec_clock::universal_time() + boost::posix_time::milliseconds(100));
+        if (fSleep) {
+            std::unique_lock<std::mutex> lock(mutexMsgProc);
+            condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100));


Ok, I see. Let's think about generalizing this later. No need to hold this PR up.

theuni · 2016-12-30T20:20:11Z

@sipa I figured you'd have more nits :)

Went ahead and squashed.

sipa · 2017-01-03T21:55:00Z

Needs rebase.

- Drop the interruption point directly after the pnode allocation. This would be leaky if hit. - Rearrange thread creation so that the socket handler comes first

Also now that net threads are interruptible, switch them to use std threads/binds/mutexes/condvars.

This is now a std::thread, so there's no hope of catching a boost interruption point.

theuni · 2017-01-04T00:06:50Z

Done.

laanwj · 2017-01-04T11:20:59Z

lightly-tested ACK 67ee4ec

67ee4ec net: misc header cleanups (Cory Fields) 8b3159e net: make proxy receives interruptible (Cory Fields) 5cb0fce net: remove thread_interrupted catch (Cory Fields) d3d7056 net: make net processing interruptible (Cory Fields) 0985052 net: make net interruptible (Cory Fields) 799df91 net: add CThreadInterrupt and InterruptibleSleep (Cory Fields) 7325b15 net: a few small cleanups before replacing boost threads (Cory Fields)

e733939 [Cleanup] Remove CConnman::Copy(Release)NodeVector, now unused (random-zebra) b09da57 [Refactor] Proper CConnman encapsulation of mnsync (random-zebra) 219061e [Refactor] Decouple SyncWithNode from CMasternodeSync::Process() (random-zebra) e1f3620 [Refactor] Proper CConnman encapsulation of proposals / votes sync (random-zebra) f38c9f9 miner: don't try to access connman if it doesn't exist (Fuzzbawls) 92ab619 Address feedback (Fuzzbawls) 9d5346a Do not set an addr time penalty when a peer advertises itself. (Gregory Maxwell) fe12ff9 net: No longer send local address in addrMe (Wladimir J. van der Laan) 14d6917 net: misc header cleanups (Cory Fields) 34ba0de net: make proxy receives interruptible (Cory Fields) 1bd97ef net: make net processing interruptable (Fuzzbawls) e24b4cc net: make net interruptible (Cory Fields) 814d0de net: add CThreadInterrupt and InterruptibleSleep (Cory Fields) 1443f2a net: a few small cleanups before replacing boost threads (Cory Fields) 58eabe4 net: move MAX_FEELER_CONNECTIONS into connman (Cory Fields) 874baba Convert ForEachNode* functions to take a templated function argument rather than a std::function to eliminate std::function overhead (Jeremy Rubin) 9485ab0 Made the ForEachNode* functions in src/net.cpp more pragmatic and self documenting (Jeremy Rubin) c1e59ad minor net cleanups (Fuzzbawls) 07ae004 net: move vNodesDisconnected into CConnman (Cory Fields) 276c946 net: add nSendBufferMaxSize/nReceiveFloodSize to CConnection::Options (Cory Fields) 22a9aff net: Introduce CConnection::Options to avoid passing so many params (Cory Fields) e4891bf net: Drop StartNode/StopNode and use CConnman directly (Cory Fields) 431575c net: pass CClientUIInterface into CConnman (Cory Fields) 48de47e net: Pass best block known height into CConnman (Cory Fields) 15eed91 net: move max/max-outbound to CConnman (Cory Fields) 2bf0921 net: move semOutbound to CConnman (Cory Fields) 481929f net: move nLocalServices/nRelevantServices to CConnman (Cory Fields) bcee6ae net: move SendBufferSize/ReceiveFloodSize to CConnman (Cory Fields) 6865469 net: move send/recv statistics to CConnman (Cory Fields) 1cec418 net: SocketSendData returns written size (Cory Fields) 2bb9dfa net: move messageHandlerCondition to CConnman (Cory Fields) 9c5a0df net: move nLocalHostNonce to CConnman (Cory Fields) a1394ef net: move nLastNodeId to CConnman (Cory Fields) 3804c29 net: move whitelist functions into CConnman (Cory Fields) dbde9be net: create generic functor accessors and move vNodes to CConnman (Cory Fields) 2e02467 net: Add most functions needed for vNodes to CConnman (Cory Fields) 5667e61 net: move added node functions to CConnman (Cory Fields) 37487ed net: Add oneshot functions to CConnman (Cory Fields) facf878 net: move ban and addrman functions into CConnman (Cory Fields) 091aaf2 net: handle nodesignals in CConnman (Cory Fields) 1e9fa0f net: move OpenNetworkConnection into CConnman (Cory Fields) 573200f net: Move socket binding into CConnman (Cory Fields) 7762b97 net: Pass CConnection to wallet rather than using the global (Fuzzbawls) 00591b8 net: Pass CConnman around as needed (Cory Fields) 2cd3d39 net: Add rpc error for missing/disabled p2p functionality (Cory Fields) f08e316 net: Create CConnman to encapsulate p2p connections (Cory Fields) 66337dc net: move CBanDB and CAddrDB out of net.h/cpp (Fuzzbawls) 10969f6 gui: add NodeID to the peer table (Cory Fields) 58044ac Fix some locks (Pieter Wuille) f9f8926 Do not shadow variables in networking code (Pavel Janík) Pull request description: This is the finalization of the upstream PRs being tracked in #1374 to update much of our networking code to more closely follow upstream improvements. The following PRs are included here: - bitcoin#8466 Do not shadow variables in networking code - bitcoin#8606 Fix some locks - bitcoin#8085 Begin encapsulation - bitcoin#9289 drop boost::thread_group - bitcoin#8740 No longer send local address in addrMe - bitcoin#8661 Do not set an addr time penalty when a peer advertises itself Additionally, during conflict resolution and backporting of 8085, the following additional upstream PR was included: - bitcoin#8715 only delete CConnman if it's been created Still TODO in future PRs: - bitcoin#9037 - bitcoin#9441 - bitcoin#9609 - bitcoin#9626 - bitcoin#9708 - bitcoin#12381 - bitcoin#18584 ACKs for top commit: furszy: same here, code ACK e733939. Nice work ☕ . furszy: ACK e733939 . random-zebra: ACK e733939 and merging... Tree-SHA512: 0fc3cca76d9ddc13f75fc9d48c48d215e6c0e0381377c0318436176a0e0ead73b511e0061a4b63f7052874730b6da8b0ffc2c94cba034bcc39aad4212b69ee22

maflcko added the Refactoring label Dec 6, 2016

laanwj mentioned this pull request Dec 14, 2016

Node crashed with assertion failure in network code #9348

Closed

theuni force-pushed the connman-threads branch from 7700fd4 to aae161e Compare December 16, 2016 00:49

TheBlueMatt reviewed Dec 21, 2016

View reviewed changes

laanwj mentioned this pull request Dec 21, 2016

Inline CTxInWitness inside CTxIn #8589

Merged

theuni force-pushed the connman-threads branch 2 times, most recently from 288a779 to e0bdb1d Compare December 22, 2016 20:37

TheBlueMatt reviewed Dec 22, 2016

View reviewed changes

sipa reviewed Dec 30, 2016

View reviewed changes

theuni force-pushed the connman-threads branch from eab5b69 to 29aa22a Compare December 30, 2016 20:19

theuni added 7 commits January 3, 2017 17:53

net: a few small cleanups before replacing boost threads

7325b15

- Drop the interruption point directly after the pnode allocation. This would be leaky if hit. - Rearrange thread creation so that the socket handler comes first

net: add CThreadInterrupt and InterruptibleSleep

799df91

net: make net interruptible

0985052

Also now that net threads are interruptible, switch them to use std threads/binds/mutexes/condvars.

net: make net processing interruptible

d3d7056

net: remove thread_interrupted catch

5cb0fce

This is now a std::thread, so there's no hope of catching a boost interruption point.

net: make proxy receives interruptible

8b3159e

net: misc header cleanups

67ee4ec

theuni force-pushed the connman-threads branch from 29aa22a to 67ee4ec Compare January 3, 2017 22:59

laanwj merged commit 67ee4ec into bitcoin:master Jan 4, 2017

ghost mentioned this pull request Mar 12, 2017

[WIP] Transitory Changes as adapted from BitCoin Core duality-solutions/Sequence#54

Merged

ghost mentioned this pull request Apr 4, 2017

[WIP] Transitory Changes as adapted from BitCoin Core duality-solutions/Sequence#61

Merged

OlegGirko mentioned this pull request Aug 9, 2017

Backport Bitcoin PR#9289: net: drop boost::thread_group dashpay/dash#1568

Merged

OlegGirko mentioned this pull request Aug 22, 2017

Backport Bitcoin PR#9441: Net: Massive speedup. Net locks overhaul dashpay/dash#1586

Merged

This was referenced May 26, 2018

Net cherries bitcoinxt/bitcoinxt#436

Merged

At shutdown, join threads before destroying Connman bitcoinxt/bitcoinxt#435

Closed

This was referenced Jul 16, 2019

Daemon wont completely shutdown navcoin/navcoin-core#569

Closed

Backport changes from upstream net code (p2p) navcoin/navcoin-core#573

Open

This was referenced Aug 19, 2020

[P2P] Begin Network Encapsulation PIVX-Project/PIVX#1812

Merged

Tracking: Backport upstream network encapsulation PIVX-Project/PIVX#1374

Closed

PRCYDev mentioned this pull request Jan 25, 2021

Tracking: Backport upstream network encapsulation PRCYCoin/PRCYCoin#4

Open

21 tasks

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

net: drop boost::thread_group #9289

net: drop boost::thread_group #9289

Uh oh!

Conversation

theuni commented Dec 6, 2016

Uh oh!

laanwj commented Dec 6, 2016

Uh oh!

sipa commented Dec 6, 2016

Uh oh!

fanquake commented Dec 7, 2016

Uh oh!

theuni commented Dec 8, 2016

Uh oh!

theuni commented Dec 9, 2016

Uh oh!

theuni commented Dec 16, 2016

Uh oh!

TheBlueMatt left a comment

Choose a reason for hiding this comment

Uh oh!

JeremyRubin commented Dec 21, 2016

Uh oh!

TheBlueMatt commented Dec 21, 2016 via email

Uh oh!

JeremyRubin commented Dec 21, 2016

Uh oh!

TheBlueMatt commented Dec 21, 2016

Uh oh!

TheBlueMatt commented Dec 21, 2016

Uh oh!

JeremyRubin commented Dec 21, 2016

Uh oh!

TheBlueMatt commented Dec 21, 2016

Uh oh!

theuni commented Dec 22, 2016

Uh oh!

theuni commented Dec 22, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TheBlueMatt commented Dec 22, 2016

Uh oh!

TheBlueMatt left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

theuni commented Dec 22, 2016

Uh oh!

TheBlueMatt commented Dec 23, 2016 via email

Uh oh!

TheBlueMatt commented Dec 23, 2016 via email

Uh oh!

theuni commented Dec 23, 2016

Uh oh!

TheBlueMatt commented Dec 23, 2016 via email

Uh oh!

theuni commented Dec 24, 2016

Uh oh!

theuni commented Dec 30, 2016

Uh oh!

sipa left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

theuni commented Dec 30, 2016

Uh oh!

theuni commented Dec 22, 2016 •

edited

Loading