-
Notifications
You must be signed in to change notification settings - Fork 37.8k
net: drop boost::thread_group #9289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Concept ACK |
Would it make sense to introduce a utility function like MilliSleep which takes a reference to a condition variable and an atomic flag? The pattern that replaces the MilliSleep calls is pretty complicated and repeats several times. |
Concept ACK |
@sipa Sure, will do. |
@sipa done. I'm not sure if it's generic enough to add to utiltime, happy to move it back to net if that makes more sense. Also fixed up the TODO that I was mistaken about. |
7700fd4
to
aae161e
Compare
Just noticed that this PR would've broken interruptible proxy recvs, so I pushed a commit to fix that. The proxy code will soon be dumped anyway, so I'm ok with the hackish global there for now. Also squashed down the header fix while I was at it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does each thread need its own atomic_flag? We currently dont use it and I'm not sure I envision a scenario where you want to interrupt only some of the threads. Would simplify the code a decent chunk, I think.
I super dont like using atomic_flags and resetting them to dont-wake every time you check them...I think its fine in this PR, but it just seems like a bad idea generally. atomic_bool should be just fine, too :).
@TheBlueMatt I think you might be confused as to how the atomic flags are used in this code. They're used in "clear triggered mode", so there is no need to reset them after every check. The other usage ("test_and_set triggered mode") would introduce race conditions when the checking thread clears the flag as a concurrent interrupt could be cleared. |
@JeremyRubin I see how they're used, am only commenting on the likelihood of this pattern introducing future bugs. Eg someone uses break instead of return and then a loop gets moved around, making the thread no longer exit.
…On December 20, 2016 10:51:14 PM PST, Jeremy Rubin ***@***.***> wrote:
@TheBlueMatt I think you might be confused as to how the atomic flags
are used in this code. They're used in "clear triggered mode", so there
is no need to reset them after every check. The other usage
("test_and_set triggered mode") would introduce race conditions when
the checking thread clears the flag as a concurrent interrupt could be
cleared.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#9289 (comment)
|
@TheBlueMatt ok I was responding to "atomic_flags and resetting them to dont-wake every time you check them.", not sure what you meant by that if not my earlier interpretation. I agree with you on the semantics being off if your worry is future code breaking it. We should either make it throw an Interruption, or call std::terminate. Changing it to an atomic_bool will not have any effect on reducing likelihood future code changes don't break intended behavior. Implemented here for comparison theuni/bitcoin@connman-threads...JeremyRubin:connman-threads The only negative of this approach is that local cleanup can't be handled as neatly. |
@JeremyRubin well my note was that if you use an atomic_bool instead of a flag (which means you can check without resetting the state to the dont-exit-now state) then even if there is a bug the worst that happens is you will exit the thread at the next check. |
@JeremyRubin sadly I'm not sure about the execution propagation from within condition_variable...cppreference seems to indicate it was cleaned up in C++14, but its unclear if its guaranteed to propagation from the predicate or if it just "may". |
Thanks for clarifying the kind of bug you were imagining, I had a different type of bug in mind (there being no code flow which can cause a termination, rather than being catchable on the next iteration). You're absolutely right on the "may" for exceptions thrown from wait_for. Semantics should be fixed now (moved InterruptibleSleep into interruption_point and made the throwing external to wait_for). |
@JeremyRubin ehh, I mean yes regaring wrappers, no regarding exceptions - I think its generally accepted that using exceptions in C++ is bad, and I'm not sure we want to add any more uses than we already have. |
@JeremyRubin and @TheBlueMatt Thanks for having a look. After a debate (probably more than necessary), and running it by a third party, I think I'll concede and just use an atomic bool instead. The convincing argument was that a bool allows us to deviate from the current "got an interrupt, bail from thread!", and lets us do quick on-thread teardown post-interruption. It will need to stay in check though, our usages of Interrupt() assume that it won't block. |
@TheBlueMatt I pushed up a new attempt to https://github.com/theuni/bitcoin/commits/connman-threads-redo. I think you'll like that much better. I need to split a few non-obvious things into different commits, but mind having a quick look at the concept? I borrowed @JeremyRubin's wrapper idea but implemented it a bit differently: theuni@7b4d8f6#diff-09404f8fdd1f69609d329cb0b1f1252e The usage is (imo) pretty straightforward: theuni@7b4d8f6#diff-9a82240fe7dfe86564178691cc57f2f1L1660 |
@theuni Yup, really like that version. |
288a779
to
e0bdb1d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did the socket handler startup need to move up? Did I miss a bug that woule be cause by it not or did you just prefer it that way? (dont care, just asking to make sure I'm aware)
There is still a MilliSleep call in ThreadSocketHandler. Is that intentional?
#include <condition_variable> | ||
#include <atomic> | ||
|
||
class CThreadInterrupt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I think it might make sense to just rename this "InterruptibleSleeper" and have it own the mutex and condition_variable. Then you dont have to have its owner own memory it points to and should still be able to keep +/- the same API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes you'll want a real interruptible condvar, though. See theuni@da68e3c#diff-9a82240fe7dfe86564178691cc57f2f1R1887.
The alternative would be to add wait_for/wait_until/notify_one/notify_all to CThreadInterrupt, then just forward them to the condvar. But for that, you'd still need to expose the mutex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this could be rigged up with a separate condvar/mutex and some additional manual isinterruped checks.
* @returns false if the sleep was interrupted, true otherwise | ||
*/ | ||
template <typename Duration> | ||
bool InterruptibleSleep(const Duration& rel_time, CThreadInterrupt& threadInterrupt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just move this into a member function of the class?
Also, does it need to be a template? I think if you set Duration to something that the various durations can be converted into it should do so intelligently. Would mean less stuff to compile in headers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This started out as a member function, but interrupt.sleep_for() felt awkward conceptually.
We can probably just force this to milliseconds. If we need microsecond precision later, we can just add another function.
{ | ||
public: | ||
CThreadInterrupt(std::condition_variable& condIn, std::mutex& mutIn) : cond(condIn), mut(mutIn), val(false) {} | ||
void reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please no new implementations-in-headers. I know its mostly single-line stuff but we're fighting to reduce compile-time memory usage, so every little bit helps. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Though some of these need to be inlined. I figured someone would yell if I added a .cpp with 2 functions :p
Socket handler moved up just so that there's a consistent start/stop order. Thanks for pointing out the MilliSleep, definitely not intentional! |
Yes, I guess my conception is different - if we make it a generic wrapper for condition variables then moving sleep into it would make sense. I thought some about what that would mean for net_processing, but missed that you'd need to expose the lock (exposing interrupt seems otherwise reasonable to me).
No longer convinced which one is nicer, up to you.
…On December 22, 2016 3:48:21 PM PST, Cory Fields ***@***.***> wrote:
theuni commented on this pull request.
> + std::condition_variable& cond;
+ std::mutex& mut;
+ std::atomic<bool> val;
+
+ template <typename Duration>
+ friend bool InterruptibleSleep(const Duration&,
CThreadInterrupt&);
+};
+
+/*
+ * Sleep for the stated period of time, interruptible by clearing the
flag and notifying the condvar.
+ * @param rel_time maximum time to wait. Should be a
std::chrono::duration.
+ * @param threadInterrupt The interrupt that may wake the sleep
+ * @returns false if the sleep was interrupted, true otherwise
+ */
+template <typename Duration>
+bool InterruptibleSleep(const Duration& rel_time, CThreadInterrupt&
threadInterrupt)
This started out as a member function, but interrupt.sleep_for() felt
awkward conceptually.
We can probably just force this to milliseconds. If we need microsecond
precision later, we can just add another function.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#9289
|
Actually, I'm not sure you need to expose the lock, you just need to take the lock on the notifying thread prior to the notify but after setting the wakeup condition.
If you want to try it this way, I'd say you'd want to have two wakeup conditions, a wake-up and an interrupt condition, and just expose the various operations that you need. Not sure it's worth it, but I think that would be the cleanest API (modulo the too-much-wrapper potential issue).
…On December 22, 2016 3:53:28 PM PST, Cory Fields ***@***.***> wrote:
theuni commented on this pull request.
> @@ -0,0 +1,51 @@
+// Copyright (c) 2009-2010 Satoshi Nakamoto
+// Copyright (c) 2009-2016 The Bitcoin Core developers
+// Distributed under the MIT software license, see the accompanying
+// file COPYING or http://www.opensource.org/licenses/mit-license.php.
+
+#include <mutex>
+#include <condition_variable>
+#include <atomic>
+
+class CThreadInterrupt
I suppose this could be rigged up with a separate condvar/mutex and
some additional manual isinterruped checks.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#9289
|
That would require the wakeup condition to be atomic, no? |
Yes, I believe that implies the wakeup condition must be atomic, but it can be release/acquire since a mutex unlock implies a full memory barrier.
In the other case that I proposed (doing it with two atomics - one for wake-up and one for interruption all hidden inside the object they would both be atomic and you'd want ClearWakeupState() and Wake() functions).
…On December 22, 2016 5:02:46 PM PST, Cory Fields ***@***.***> wrote:
That would require the wakeup condition to be atomic, no?
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#9289 (comment)
|
@TheBlueMatt Pushed a round of changes after last night's IRC discussion. I hope we can compromise on this approach. |
Pushed a commit for @sipa's nits. I've been running my dev branches on top of this for several days, killing bitcoind in all kinds of rude ways while debugging, and it has continued to shutdown reliably. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK
messageHandlerCondition.timed_wait(lock, boost::posix_time::microsec_clock::universal_time() + boost::posix_time::milliseconds(100)); | ||
if (fSleep) { | ||
std::unique_lock<std::mutex> lock(mutexMsgProc); | ||
condMsgProc.wait_until(lock, std::chrono::steady_clock::now() + std::chrono::milliseconds(100)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see. Let's think about generalizing this later. No need to hold this PR up.
eab5b69
to
29aa22a
Compare
@sipa I figured you'd have more nits :) Went ahead and squashed. |
Needs rebase. |
- Drop the interruption point directly after the pnode allocation. This would be leaky if hit. - Rearrange thread creation so that the socket handler comes first
Also now that net threads are interruptible, switch them to use std threads/binds/mutexes/condvars.
This is now a std::thread, so there's no hope of catching a boost interruption point.
29aa22a
to
67ee4ec
Compare
Done. |
lightly-tested ACK 67ee4ec |
67ee4ec net: misc header cleanups (Cory Fields) 8b3159e net: make proxy receives interruptible (Cory Fields) 5cb0fce net: remove thread_interrupted catch (Cory Fields) d3d7056 net: make net processing interruptible (Cory Fields) 0985052 net: make net interruptible (Cory Fields) 799df91 net: add CThreadInterrupt and InterruptibleSleep (Cory Fields) 7325b15 net: a few small cleanups before replacing boost threads (Cory Fields)
e733939 [Cleanup] Remove CConnman::Copy(Release)NodeVector, now unused (random-zebra) b09da57 [Refactor] Proper CConnman encapsulation of mnsync (random-zebra) 219061e [Refactor] Decouple SyncWithNode from CMasternodeSync::Process() (random-zebra) e1f3620 [Refactor] Proper CConnman encapsulation of proposals / votes sync (random-zebra) f38c9f9 miner: don't try to access connman if it doesn't exist (Fuzzbawls) 92ab619 Address feedback (Fuzzbawls) 9d5346a Do not set an addr time penalty when a peer advertises itself. (Gregory Maxwell) fe12ff9 net: No longer send local address in addrMe (Wladimir J. van der Laan) 14d6917 net: misc header cleanups (Cory Fields) 34ba0de net: make proxy receives interruptible (Cory Fields) 1bd97ef net: make net processing interruptable (Fuzzbawls) e24b4cc net: make net interruptible (Cory Fields) 814d0de net: add CThreadInterrupt and InterruptibleSleep (Cory Fields) 1443f2a net: a few small cleanups before replacing boost threads (Cory Fields) 58eabe4 net: move MAX_FEELER_CONNECTIONS into connman (Cory Fields) 874baba Convert ForEachNode* functions to take a templated function argument rather than a std::function to eliminate std::function overhead (Jeremy Rubin) 9485ab0 Made the ForEachNode* functions in src/net.cpp more pragmatic and self documenting (Jeremy Rubin) c1e59ad minor net cleanups (Fuzzbawls) 07ae004 net: move vNodesDisconnected into CConnman (Cory Fields) 276c946 net: add nSendBufferMaxSize/nReceiveFloodSize to CConnection::Options (Cory Fields) 22a9aff net: Introduce CConnection::Options to avoid passing so many params (Cory Fields) e4891bf net: Drop StartNode/StopNode and use CConnman directly (Cory Fields) 431575c net: pass CClientUIInterface into CConnman (Cory Fields) 48de47e net: Pass best block known height into CConnman (Cory Fields) 15eed91 net: move max/max-outbound to CConnman (Cory Fields) 2bf0921 net: move semOutbound to CConnman (Cory Fields) 481929f net: move nLocalServices/nRelevantServices to CConnman (Cory Fields) bcee6ae net: move SendBufferSize/ReceiveFloodSize to CConnman (Cory Fields) 6865469 net: move send/recv statistics to CConnman (Cory Fields) 1cec418 net: SocketSendData returns written size (Cory Fields) 2bb9dfa net: move messageHandlerCondition to CConnman (Cory Fields) 9c5a0df net: move nLocalHostNonce to CConnman (Cory Fields) a1394ef net: move nLastNodeId to CConnman (Cory Fields) 3804c29 net: move whitelist functions into CConnman (Cory Fields) dbde9be net: create generic functor accessors and move vNodes to CConnman (Cory Fields) 2e02467 net: Add most functions needed for vNodes to CConnman (Cory Fields) 5667e61 net: move added node functions to CConnman (Cory Fields) 37487ed net: Add oneshot functions to CConnman (Cory Fields) facf878 net: move ban and addrman functions into CConnman (Cory Fields) 091aaf2 net: handle nodesignals in CConnman (Cory Fields) 1e9fa0f net: move OpenNetworkConnection into CConnman (Cory Fields) 573200f net: Move socket binding into CConnman (Cory Fields) 7762b97 net: Pass CConnection to wallet rather than using the global (Fuzzbawls) 00591b8 net: Pass CConnman around as needed (Cory Fields) 2cd3d39 net: Add rpc error for missing/disabled p2p functionality (Cory Fields) f08e316 net: Create CConnman to encapsulate p2p connections (Cory Fields) 66337dc net: move CBanDB and CAddrDB out of net.h/cpp (Fuzzbawls) 10969f6 gui: add NodeID to the peer table (Cory Fields) 58044ac Fix some locks (Pieter Wuille) f9f8926 Do not shadow variables in networking code (Pavel Janík) Pull request description: This is the finalization of the upstream PRs being tracked in #1374 to update much of our networking code to more closely follow upstream improvements. The following PRs are included here: - bitcoin#8466 Do not shadow variables in networking code - bitcoin#8606 Fix some locks - bitcoin#8085 Begin encapsulation - bitcoin#9289 drop boost::thread_group - bitcoin#8740 No longer send local address in addrMe - bitcoin#8661 Do not set an addr time penalty when a peer advertises itself Additionally, during conflict resolution and backporting of 8085, the following additional upstream PR was included: - bitcoin#8715 only delete CConnman if it's been created Still TODO in future PRs: - bitcoin#9037 - bitcoin#9441 - bitcoin#9609 - bitcoin#9626 - bitcoin#9708 - bitcoin#12381 - bitcoin#18584 ACKs for top commit: furszy: same here, code ACK e733939. Nice work ☕ . furszy: ACK e733939 . random-zebra: ACK e733939 and merging... Tree-SHA512: 0fc3cca76d9ddc13f75fc9d48c48d215e6c0e0381377c0318436176a0e0ead73b511e0061a4b63f7052874730b6da8b0ffc2c94cba034bcc39aad4212b69ee22
(Split out and rewritten chunk of #8631. I'll rebase that on top of this post-merge and reopen.)
This is a prerequisite for async network handling. As-is, we don't have enough control over the shutdown process to be able to deal with async connecting and send/recv.
In particular, we need to be sure that message processing has shut down before forcing all networking down, otherwise we run the risk of trying to process a node's messages during its destruction. This is not a problem now because the current sync model works fine with all threads being interrupted at the same time.
And if that's not a satisfactory reason for the change, it also gets rid of a nice chunk of boost threading (and simplifies the rebase of #8631 :)
To accomplish this, we need to:
The MilliSleeps are replaced with condvars that check for the thread's interrupt flag. Since the flags are atomic, there's no need for real locking, so these condvars use a dummy CNullLock.
With all of that done, we may as well switch away from boost threads, since we're no longer dependent on interruption.
@TheBlueMatt 3f3f0b4 is the awkward change I mentioned on IRC. The global will be cleaned up when we move processing into a class as a next step (something like theuni@3f598db)