Skip to content

Conversation

theuni
Copy link
Member

@theuni theuni commented Aug 31, 2016

This replaces #8023 with a solution that moves us directly to std::thread, rather than using an intermediate wrapper.

Background

std::thread is pretty much a drop-in replacement for boost::thread, except that boost::thread is interruptible. We've used those interruptions, but we'll have to drop them eventually as they didn't become part of the spec.

When an interruption point is hit, it throws an instance of boost::thread_interrupted. boost::thread catches these automatically by default.

The interruption points are defined as:


    boost::thread::join()
    boost::thread::timed_join()
    boost::thread::try_join_for(),
    boost::thread::try_join_until(),
    boost::condition_variable::wait()
    boost::condition_variable::timed_wait()
    boost::condition_variable::wait_for()
    boost::condition_variable::wait_until()
    boost::condition_variable_any::wait()
    boost::condition_variable_any::timed_wait()
    boost::condition_variable_any::wait_for()
    boost::condition_variable_any::wait_until()
    boost::thread::sleep()
    boost::this_thread::sleep_for()
    boost::this_thread::sleep_until()
    boost::this_thread::interruption_point()

The ones relevant to us are primarily boost::condition_variable_any::wait(), boost::this_thread::sleep_for(), and of course boost::this_thread::interruption_point()

Any boost::thread will immediately throw boost::thread_interrupted if those are hit at any point. So we can't simply s/boost::thread/std::thread/g, as we would never be able to exit.

Changes

  • All threads are now externally interruptible. Rather than having a single bool for shutting down, I've attempted to notify each relevant subsystem. But because they're not well-defined, the notifications are somewhat arbitrary as well.
  • For long-running threads, our own thread_interrupted and interruption_point have been added.
  • Threads should call TraceThread if they may hit an interruption_point.
  • MilliSleep is no longer interruptible.
  • boost::thread_group is gone. All threads are now managed individually

Future changes

There is lots of cleanup to be done post-merge. I was very tempted to include cleanups here, but I've elected to keep the changes minimal. Todo:

  • s/boost::mutex/std::mutex/
  • s/boost::condition_variable/std::condition_variable/
  • s/boost::chrono/std::chrono/
  • Drop a ton of boost includes
  • Drop other boost cruft (like catching boost::thread_interrupted)

Also, I started on this by creating a base class for launching threads. Each instance would be responsible for implementing Interrupt() and Stop(). But because this is already a complex set of changes, I think it's best to do that as a separate step.

@JeremyRubin
Copy link
Contributor

Concept ACK.

Will review & test soon.

The boost interruption_point paradigm is messy and can make code hard to reason about. In my code for #8464 I have also removed boost::thread and interruption points, preferring more explicit quitting. Dealing with the interruption code made working on #8464 more difficult than it had to be.

One Nit, which I think is worth addressing. I would refactor the names from Interrupt to Quit, Shutdown, or even ControlledExit. Connotation of interrupt is a non-consensual break, whereas Quit or Shutdown implies being done as a normal part of control flow.

@theuni
Copy link
Member Author

theuni commented Aug 31, 2016

@JeremyRubin Nearly every thread uses separate interrupt/shutdown calls for exactly that reason.

Interrupt is non-consensual, and means "stop what you're doing and don't start anything else". Shutdown means "please gracefully exit now and don't return until you do".

@JeremyRubin
Copy link
Contributor

@theuni Ah, gotcha. Was looking at small sample; my bad. I guess my comment should instead be, we should eventually migrate to shutdown only no interrupt threads; but this is a big step in that direction and precursor to that happening.

@sipa
Copy link
Member

sipa commented Aug 31, 2016

Concept ACK, testing.

@paveljanik
Copy link
Contributor

OS X here:

Making all in src
  CXX      test/test_test_bitcoin-scheduler_tests.o
In file included from test/scheduler_tests.cpp:8:
In file included from ./test/test_bitcoin.h:16:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/thread:337:5: error: attempt to use a deleted function
    __invoke(std::__1::move(std::__1::get<0>(__t)), std::__1::move(std::__1::get<_Indices>(__t))...);
    ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/thread:347:5: note: in instantiation of function template specialization 'std::__1::__thread_execute<void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> , 1>' requested here
    __thread_execute(*__p, _Index());
    ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/thread:359:42: note: in instantiation of function template specialization 'std::__1::__thread_proxy<std::__1::tuple<void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> > >' requested here
    int __ec = pthread_create(&__t_, 0, &__thread_proxy<_Gp>, __p.get());
                                         ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:1673:31: note: in instantiation of function template specialization 'std::__1::thread::thread<void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> , void>' requested here
            ::new((void*)__p) _Up(std::__1::forward<_Args>(__args)...);
                              ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:1600:18: note: in instantiation of function template specialization 'std::__1::allocator<std::__1::thread>::construct<std::__1::thread, void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> >' requested here
            {__a.construct(__p, std::__1::forward<_Args>(__args)...);}
                 ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/memory:1453:14: note: in instantiation of function template specialization 'std::__1::allocator_traits<std::__1::allocator<std::__1::thread> >::__construct<std::__1::thread, void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> >' requested here
            {__construct(__has_construct<allocator_type, _Tp*, _Args...>(),
             ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/vector:1643:25: note: in instantiation of function template specialization 'std::__1::allocator_traits<std::__1::allocator<std::__1::thread> >::construct<std::__1::thread, void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> >' requested here
        __alloc_traits::construct(this->__alloc(),
                        ^
test/scheduler_tests.cpp:89:22: note: in instantiation of function template specialization 'std::__1::vector<std::__1::thread, std::__1::allocator<std::__1::thread> >::emplace_back<void (CScheduler::*)(), std::__1::reference_wrapper<CScheduler> >' requested here
        microThreads.emplace_back(&CScheduler::serviceQueue, std::ref(microTasks));
                     ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/type_traits:1043:5: note: '~__nat' has been explicitly marked deleted here
    ~__nat() = delete;
    ^

@paveljanik
Copy link
Contributor

Compile error fixed here.

@sipa
Copy link
Member

sipa commented Aug 31, 2016

Needs rebase.

@theuni theuni force-pushed the no-interrupt-threads branch from e25e118 to 9fb3b1c Compare September 1, 2016 00:07
@theuni
Copy link
Member Author

theuni commented Sep 1, 2016

rebased and squashed in the compile fix.

@laanwj
Copy link
Member

laanwj commented Sep 1, 2016

Nearly every thread uses separate interrupt/shutdown calls for exactly that reason.

There's another reason, and that is that interrupt is supposed to be an instant operation at the caller site, and shutdown is allowed to wait. This allows parallelism in the shutdown process:

  • Send module A interrupt (A starts terminating its threads, which may take some time)
  • Send module B interrupt (B starts terminating its threads, which may take some time)

...

  • Module A shutdown (waits for A's threads to join)
  • Module B shutdown (waits for B's threads to join)

Sure, the naming may not be optimal, "interrupt" is more like a hint to stop work, but I think the approach makes sense.

(I'm sure this can be implemented more thoroughly with explicit shutdown-dependencies and std::futures, but the idea is the same, to make shutdown as fast as possible)

Edit: re-triggered travis, the failure was a timeout, probably has nothing to do with the changes here.

@laanwj
Copy link
Member

laanwj commented Sep 1, 2016

Travis does keep failing. There may be an issue after all.

@theuni
Copy link
Member Author

theuni commented Sep 1, 2016

Hmm, travis was happy before the rebase. I'll try to reproduce locally, maybe something changed.

@theuni theuni force-pushed the no-interrupt-threads branch 2 times, most recently from 0115d72 to c15e11b Compare September 2, 2016 05:01
@theuni
Copy link
Member Author

theuni commented Sep 2, 2016

The travis failures were caused by the rebase after all. ab48c5e introduced a new test that relied on the interruptible semantics. I've added a commit that should fix that up.

@theuni theuni force-pushed the no-interrupt-threads branch 2 times, most recently from 789eaff to 8a3bdd5 Compare September 2, 2016 06:09
@btcdrak
Copy link
Contributor

btcdrak commented Sep 2, 2016

needs rebase again :(

@@ -120,6 +120,12 @@ boost::condition_variable messageHandlerCondition;
static CNodeSignals g_signals;
CNodeSignals& GetNodeSignals() { return g_signals; }

static std::condition_variable net_interrupt_cond;
static std::mutex cs_net_interrupt;
static std::atomic<bool> net_interrupted(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK -- initially, I was confused as to why you use an atomic var for the condition variable, as usually access is guarded by the lock so atomic would be overkill. But in this case, it's fine.

May be worth writing a profile at some point for if it is better to use a cond+atomic or to lock on writing the condition externally, or just throwing a comment in here as to why you're using an atomic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's not at all clear from the individual commits. The atomic is used as the break condition for a few condvars, as well as the loop break condition in a few places. Rather than using separate bools for each, I just lumped them all into this one.

Will add a comment.

@JeremyRubin
Copy link
Contributor

@theuni I've gone through and I'll give you a "tentative" utack. Mostly looks fine.

I'd suggest mostly:

  • checking typedefs which secretly use boost::*
  • Getting rid of the exception throwing interruption point. It's a bad default.
  • Getting rid of std::bind in favor of lambdas.
  • depending on status of my wip scriptcheck queue stuff, leaving those two commits out.

@theuni
Copy link
Member Author

theuni commented Sep 6, 2016

@JeremyRubin Thanks for the review! To your points:

  • I agree somewhat on the typedefs, but I'd like to avoid this PR getting bigger and bigger. So let's do these case-by-case.
  • I don't think we can get rid of the exception throwing yet. I hate the interruption_point logic, but I don't think it's realistic to remove it all in one PR. I certainly agree with minimizing it though.
  • Agree on losing the binds where possible
  • Without the scriptcheck changes, this can't be tested as we can't shutdown. So I'm afraid we'll have to step on eachother's toes until one is merged. Obviously I'm happy to drop the scriptcheck changes and adapt to the new stuff if it goes in first.

@JeremyRubin
Copy link
Contributor

@theuni The only one I think you need to fix for this pr is CConditionVariable. Were there others you're worried about?

No worries about the scriptcheck stuff, we'll cross that bridge when it comes.

@theuni
Copy link
Member Author

theuni commented Sep 6, 2016

@JeremyRubin Fixing up CConditionVariable means doing lots of mutex/lock replacements all over the place. I'd really rather do that as a follow-up to keep this one minimal.

Now that we're not using any boost::thread's, I don't believe there's any major functional difference.

@JeremyRubin
Copy link
Contributor

@theuni sounds fine, this one is nuking boost::thread not boost::condition_variable after all :)

@btcdrak
Copy link
Contributor

btcdrak commented Sep 17, 2016

needs rebase

@@ -294,6 +294,8 @@ int CommandLineRPC(int argc, char *argv[])
catch (const boost::thread_interrupted&) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still need to catch boost::thread_interrupted? (another one below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, see "future changes" in the OP. There are a bunch of these to get rid of. I guess it doesn't make sense to do that later, I'll add it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yes you mention it there. It just looked strange to me to add something instead of replace.

@fanquake
Copy link
Member

fanquake commented Nov 6, 2016

Concept ACK. Needs a rebase. Tagging for 0.14.0

@fanquake fanquake added this to the 0.14.0 milestone Nov 6, 2016
@fanquake
Copy link
Member

There also some comments/examples about using boost thread in scheduler that could be nuked.

@theuni
Copy link
Member Author

theuni commented Dec 6, 2016

Closing for now, need to get a specific chunk of this in first.

@theuni theuni closed this Dec 6, 2016
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants