-
Notifications
You must be signed in to change notification settings - Fork 37.7k
http: Join worker threads before deleting work queue #12366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 2fefa3077e69dfdbccbd010cda5627787ec92bb8. Untested, but change makes sense. Detached threads are always a little suspicious.
src/httpserver.cpp
Outdated
@@ -449,6 +449,7 @@ bool UpdateHTTPServerLogging(bool enable) { | |||
|
|||
std::thread threadHTTP; | |||
std::future<bool> threadResult; | |||
static std::vector<std::thread> threadHTTPWorkers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could follow g_thread_http_workers naming convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please first make sure that this fixes the issue before jumping on naming issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested ACK 2fefa3077e69dfdbccbd010cda5627787ec92bb8 - my previous sleeps and valgrind are no longer able to find any issue here.
Would love to see @jnewbery confirm he cannot reproduce anymore as well.
src/httpserver.cpp
Outdated
@@ -460,8 +461,7 @@ bool StartHTTPServer() | |||
threadHTTP = std::thread(std::move(task), eventBase, eventHTTP); | |||
|
|||
for (int i = 0; i < rpcThreads; i++) { | |||
std::thread rpc_worker(HTTPWorkQueueRun, workQueue); | |||
rpc_worker.detach(); | |||
threadHTTPWorkers.emplace_back(std::thread(HTTPWorkQueueRun, workQueue)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to create the temporary just to move it into a new std::thread constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right
This prevents a potential race condition if control flow ends up in `ShutdownHTTPServer` before the thread gets to `queue->Run()`, deleting the work queue while workers are still going to use it. Meant to fix bitcoin#12362. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
src/httpserver.cpp
Outdated
@@ -487,6 +487,10 @@ void StopHTTPServer() | |||
if (workQueue) { | |||
LogPrint(BCLog::HTTP, "Waiting for HTTP worker threads to exit\n"); | |||
workQueue->WaitExit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need WaitExit() anymore with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would seem not, indeed. Testing shows no difference with it removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will remove it in a separate commit.
This function, which waits for all threads to exit, is no longer needed now that threads are joined instead. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
2fefa30
to
f946654
Compare
ok, renamed the variable and added a commit to remove WaitExit() |
I can confirm that I'm not able to reproduce the error when running with commit f946654. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK f946654. Only changes were variable rename and temp std::thread / WaitExit remove suggestions above.
void WaitExit() | ||
{ | ||
std::unique_lock<std::mutex> lock(cs); | ||
while (numThreads > 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "http: Remove WaitExit from WorkQueue"
It looks like the numThreads variable is unused now, so it might be worth removing as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I removed those in a new commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK.
@@ -486,7 +478,10 @@ void StopHTTPServer() | |||
LogPrint(BCLog::HTTP, "Stopping HTTP server\n"); | |||
if (workQueue) { | |||
LogPrint(BCLog::HTTP, "Waiting for HTTP worker threads to exit\n"); | |||
workQueue->WaitExit(); | |||
for (auto& thread: g_thread_http_workers) { | |||
thread.join(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the documentation, I don't think this is needed, clear()
below is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if you delete a thread without either calling detach() or join(), terminate() is called. join() is not automatically called for you.
utACK f946654. This is easier to read as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK
The HTTP worker thread counter, as well as the RAII object that was used to maintain it, is unused now, so can be removed. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
utACK 11e0151.
|
||
public: | ||
explicit WorkQueue(size_t _maxDepth) : running(true), | ||
maxDepth(_maxDepth), | ||
numThreads(0) | ||
maxDepth(_maxDepth) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, dunno about correct indentation here..
@@ -486,7 +457,10 @@ void StopHTTPServer() | |||
LogPrint(BCLog::HTTP, "Stopping HTTP server\n"); | |||
if (workQueue) { | |||
LogPrint(BCLog::HTTP, "Waiting for HTTP worker threads to exit\n"); | |||
workQueue->WaitExit(); | |||
for (auto& thread: g_thread_http_workers) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, space before :
.
@@ -449,6 +420,7 @@ bool UpdateHTTPServerLogging(bool enable) { | |||
|
|||
std::thread threadHTTP; | |||
std::future<bool> threadResult; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, this and the above could be static too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but out of scope of this PR to change that.
Nice cleanup! |
FWIW I wrote HTTPWorkQueue with the idea that it could likely be replaced with a 1-liner after c++11, I think there's scope for more code removal. Out of scope for a 0.16.0 fix ofc. |
11e0151 http: Remove numThreads and ThreadCounter (Wladimir J. van der Laan) f946654 http: Remove WaitExit from WorkQueue (Wladimir J. van der Laan) b1c2370 http: Join worker threads before deleting work queue (Wladimir J. van der Laan) Pull request description: This prevents a potential race condition if control flow ends up in `ShutdownHTTPServer` before the thread gets to `queue->Run()`, deleting the work queue while workers are still going to use it. Meant to fix #12362. Tree-SHA512: 8108514aeee5b2067a3736ed028014b580d1cbf8530ac7682b8a23070133dfa1ca21db4358c9158ea57e8811e0551395b6cb769887876b9cfce067ee968d0642
This prevents a potential race condition if control flow ends up in `ShutdownHTTPServer` before the thread gets to `queue->Run()`, deleting the work queue while workers are still going to use it. Meant to fix #12362. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Github-Pull: #12366 Rebased-From: b1c2370 Tree-SHA512: 7fbe8e562ba0e1138f95deb42e84cda734bef0d7058349728b3e1602b9f65777c5b1be0f6bd5a7e513ac38487dc82e62fd0a6ee4ed985cbf36c90a7eec18eced
This function, which waits for all threads to exit, is no longer needed now that threads are joined instead. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Github-Pull: #12366 Rebased-From: f946654 Tree-SHA512: 5cda90b4c081424d637031c0bbf168f177667733ff20b6f77eac84e503f7fad6fab3eb897f191edd819f18b270e3ecea78974978abd102d323f42d9d06216e53
The HTTP worker thread counter, as well as the RAII object that was used to maintain it, is unused now, so can be removed. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Github-Pull: #12366 Rebased-From: 11e0151 Tree-SHA512: 87055c4c14986973f4c1604db264fb5a9de21bb481e9d39b201774e2d17ed92a7d1617449471c13f56e0f1f09a8aebdf1254a71d6c7b856c880a5b71e0c3ba9d
This prevents a potential race condition if control flow ends up in `ShutdownHTTPServer` before the thread gets to `queue->Run()`, deleting the work queue while workers are still going to use it. Meant to fix bitcoin#12362. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Github-Pull: bitcoin#12366 Rebased-From: b1c2370 Tree-SHA512: 7fbe8e562ba0e1138f95deb42e84cda734bef0d7058349728b3e1602b9f65777c5b1be0f6bd5a7e513ac38487dc82e62fd0a6ee4ed985cbf36c90a7eec18eced
This function, which waits for all threads to exit, is no longer needed now that threads are joined instead. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Github-Pull: bitcoin#12366 Rebased-From: f946654 Tree-SHA512: 5cda90b4c081424d637031c0bbf168f177667733ff20b6f77eac84e503f7fad6fab3eb897f191edd819f18b270e3ecea78974978abd102d323f42d9d06216e53
The HTTP worker thread counter, as well as the RAII object that was used to maintain it, is unused now, so can be removed. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Github-Pull: bitcoin#12366 Rebased-From: 11e0151 Tree-SHA512: 87055c4c14986973f4c1604db264fb5a9de21bb481e9d39b201774e2d17ed92a7d1617449471c13f56e0f1f09a8aebdf1254a71d6c7b856c880a5b71e0c3ba9d Conflicts: src/httpserver.cpp
…ueue 11e0151 http: Remove numThreads and ThreadCounter (Wladimir J. van der Laan) f946654 http: Remove WaitExit from WorkQueue (Wladimir J. van der Laan) b1c2370 http: Join worker threads before deleting work queue (Wladimir J. van der Laan) Pull request description: This prevents a potential race condition if control flow ends up in `ShutdownHTTPServer` before the thread gets to `queue->Run()`, deleting the work queue while workers are still going to use it. Meant to fix bitcoin#12362. Tree-SHA512: 8108514aeee5b2067a3736ed028014b580d1cbf8530ac7682b8a23070133dfa1ca21db4358c9158ea57e8811e0551395b6cb769887876b9cfce067ee968d0642
…ueue 11e0151 http: Remove numThreads and ThreadCounter (Wladimir J. van der Laan) f946654 http: Remove WaitExit from WorkQueue (Wladimir J. van der Laan) b1c2370 http: Join worker threads before deleting work queue (Wladimir J. van der Laan) Pull request description: This prevents a potential race condition if control flow ends up in `ShutdownHTTPServer` before the thread gets to `queue->Run()`, deleting the work queue while workers are still going to use it. Meant to fix bitcoin#12362. Tree-SHA512: 8108514aeee5b2067a3736ed028014b580d1cbf8530ac7682b8a23070133dfa1ca21db4358c9158ea57e8811e0551395b6cb769887876b9cfce067ee968d0642
…ueue 11e0151 http: Remove numThreads and ThreadCounter (Wladimir J. van der Laan) f946654 http: Remove WaitExit from WorkQueue (Wladimir J. van der Laan) b1c2370 http: Join worker threads before deleting work queue (Wladimir J. van der Laan) Pull request description: This prevents a potential race condition if control flow ends up in `ShutdownHTTPServer` before the thread gets to `queue->Run()`, deleting the work queue while workers are still going to use it. Meant to fix bitcoin#12362. Tree-SHA512: 8108514aeee5b2067a3736ed028014b580d1cbf8530ac7682b8a23070133dfa1ca21db4358c9158ea57e8811e0551395b6cb769887876b9cfce067ee968d0642
aab15d7 ReplayBlocks: use find instead of brackets operator to access to the element. (furszy) e898353 [Refactoring] Use const CBlockIndex* where appropriate (random-zebra) c76fa04 qa: Extract rpc_timewait as test param (furszy) 0f832e3 shutdown: Stop threads before resetting ptrs (MarcoFalke) 67aebbf http: Remove numThreads and ThreadCounter (Wladimir J. van der Laan) e24c710 http: Remove WaitExit from WorkQueue (Wladimir J. van der Laan) b8f7364 http: Join worker threads before deleting work queue (Wladimir J. van der Laan) 7d68769 rpc: further constrain the libevent workaround (Cory Fields) 75af065 rpc: work-around an upstream libevent bug (Cory Fields) 50e5833 Always return true if AppInitMain got to the end (Matt Corallo) bd70dcc [qa] Test non-atomic chainstate writes (furszy) 8f04970 Dont create pcoinsTip until after ReplayBlocks. (Matt Corallo) 93f2b15 Random db flush crash simulator (Pieter Wuille) 72f3b17 Adapt memory usage estimation for flushing (Pieter Wuille) 8540113 Non-atomic flushing using the blockchain as replay journal (Pieter Wuille) 8d6625f [MOVEONLY] Move LastCommonAncestor to chain (Pieter Wuille) Pull request description: > This patch adds an extra "head blocks" to the chainstate, which gives the range of blocks for writes may be incomplete. At the start of a flush, we write this record, write the dirty dbcache entries in 16 MiB batches, and at the end we remove the heads record again. If it is present at startup it means we crashed during flush, and we rollback/roll forward blocks inside of it to get a consistent tip on disk before proceeding. > If a flush completes succesfully, the resulting database is compatible with previous versions. If the node crashes in the middle of a flush, a version of the code with this patch is needed to recovery. An adaptation of the following PRs with further modifications to the `feature_dbcrash.py` test to be up-to-date with upstream and solve RPC related bugs. * bitcoin#10148. * Increase RPC wait time. * bitcoin#11831 * bitcoin#11593 * bitcoin#12366 * bitcoin#13837 * bitcoin#13894 ACKs for top commit: random-zebra: ACK aab15d7 Fuzzbawls: ACK aab15d7 Tree-SHA512: 898806746f581a9eb8deb0155c558481abf4454c6f3b3c3ad505c557938d0700fe9796e98e36492286ae869378647072c3ad77ad65e9dd7075108ff96469ade1
This prevents a potential race condition if control flow ends up in `ShutdownHTTPServer` before the thread gets to `queue->Run()`, deleting the work queue while workers are still going to use it. Meant to fix #12362. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> zcash: cherry picked from commit b1c2370 zcash: bitcoin/bitcoin#12366
This function, which waits for all threads to exit, is no longer needed now that threads are joined instead. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> zcash: cherry picked from commit f946654 zcash: bitcoin/bitcoin#12366
The HTTP worker thread counter, as well as the RAII object that was used to maintain it, is unused now, so can be removed. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> zcash: cherry picked from commit 11e0151 zcash: bitcoin/bitcoin#12366
Bitcoin 0.16 locking PRs These are locking changes from upstream (bitcoin core) release 0.16 (Aug 14, 2017, to Feb 16, 2018), oldest to newest (when merged to the master branch). Each commit also includes a reference both to the PR and the upstream commit. - bitcoin/bitcoin#11126 - Excludes changes to wallet tests that we don't have. - bitcoin/bitcoin#10916 - first commit only; second commit already merged by d9fcc2b - bitcoin/bitcoin#11107 - Only the last commit. - bitcoin/bitcoin#11593 - bitcoin/bitcoin#11585 - bitcoin/bitcoin#11618 - bitcoin/bitcoin#10286 - Only the third and last commits. - bitcoin/bitcoin#11870 - bitcoin/bitcoin#12330 - bitcoin/bitcoin#12366 - bitcoin/bitcoin#12368 - bitcoin/bitcoin#12333 - Only the first commit.
This prevents a potential race condition if control flow ends up in
ShutdownHTTPServer
before the thread gets toqueue->Run()
,deleting the work queue while workers are still going to use it.
Meant to fix #12362.