-
Notifications
You must be signed in to change notification settings - Fork 37.7k
http: Track active requests and wait for last to finish - 2nd attempt #26742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. |
Running running before changes f3bc1a7
32.50 seconds to run test running test with these changes applied 40528e7
1.2 seconds to run test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK - thanks for reviving this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
Concept ACK |
40528e7
to
b9b8b5d
Compare
77a6ea8
to
f95063e
Compare
retACK f95063e with |
From #15363 (comment):
Suggesting to add a test which will prevent any regressions in the future. |
@hebasto like
|
I have updated the test and it seems to be reliably failing on master, so I think this is a good enough regression test @hebasto ? Also addressed other comments by @hebasto . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK 6d3a848.
src/httpserver.cpp
Outdated
WAIT_LOCK(g_requests_mutex, lock); | ||
g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) { | ||
if (!g_requests.empty()) { | ||
LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having side effects in the predicate seems a bad idea due to spurious wakeups. It should be a pure function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, then I think this isn't possible anymore so I am reverting back to how I did it before. I think that's fine though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK 6d3a848
Can anyone please confirm or correct my understanding of why this so much faster?
My current understanding is that even when all of the workers in g_thread_http_workers
have terminated (i.e. the entire queue is processed and running==false
), the http server thread g_thread_http
does not return (when joining) immediately because there are still some idle connections (with whom?) open, preventing the server from shutting down. By first (instead of afterwards) calling evhttp_free(eventHTTP)
, we can safely destroy the server even if connections are still open (but all requests are handled), causing g_thread_http.join()
to return immediately in StopHTTPServer()
as opposed to waiting for a timeout.
I think the commit and PR message would benefit from a more clear reason why this works.
// Track requests and notify when a request is completed. | ||
{ | ||
WITH_LOCK(g_requests_mutex, g_requests.insert(req)); | ||
g_requests_cv.notify_all(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this notification? I think we (currently) only need to monitor for completed requests, not new ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you see #26742 (comment) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't seen that, thanks for linking.
It's a minor concern, but I'm a bit worried that we're creating a very general contract ("g_requests_cv
will always notify for any changes to the g_requests
") on a global variable when it's enforced outside of g_requests
, when in the future we might easily introduce mutations on g_requests
without notifying the cv. A more scoped cv, such as g_requests_cv_handled
largely mitigates that.
But, again, no big concern, and it's not likely to cause any major problems should we miss something there. As you see fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I think one approach to improve on this contract (and as a nice benefit, also encapsulate code a bit better - I think this improves readability) in the below diff.
(WaitUntilEmpty()
needs improvement because it shouldn't directly be logging about the HTTP server like that. I guess we also don't need a GlobalMutex
here.)
git diff on 55bbf04
diff --git a/src/httpserver.cpp b/src/httpserver.cpp
index b481ef093..90ef37bd5 100644
--- a/src/httpserver.cpp
+++ b/src/httpserver.cpp
@@ -134,6 +134,40 @@ struct HTTPPathHandler
HTTPRequestHandler handler;
};
+class RequestTracker
+{
+public:
+ typedef std::unordered_set<evhttp_request*> requests_t;
+private:
+ GlobalMutex m_requests_mutex;
+ std::condition_variable m_requests_cv;
+ requests_t m_requests GUARDED_BY(m_requests_mutex);
+public:
+ void Register(struct evhttp_request* req)
+ {
+ WITH_LOCK(m_requests_mutex, m_requests.insert(req));
+ m_requests_cv.notify_all();
+ }
+
+ size_t Remove(struct evhttp_request* req)
+ {
+ auto n{WITH_LOCK(m_requests_mutex, return m_requests.erase(req))};
+ m_requests_cv.notify_all();
+ return n;
+ }
+
+ void WaitUntilEmpty()
+ {
+ WAIT_LOCK(m_requests_mutex, lock);
+ if (!m_requests.empty()) {
+ LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", m_requests.size());
+ }
+ m_requests_cv.wait(lock, [this]() EXCLUSIVE_LOCKS_REQUIRED(m_requests_mutex) {
+ return m_requests.empty();
+ });
+ }
+};
+
/** HTTP module state */
//! libevent event loop
@@ -149,10 +183,8 @@ static GlobalMutex g_httppathhandlers_mutex;
static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
//! Bound listening sockets
static std::vector<evhttp_bound_socket *> boundSockets;
-//! Track active requests
-static GlobalMutex g_requests_mutex;
-static std::condition_variable g_requests_cv;
-static std::unordered_set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);
+static RequestTracker g_request_tracker;
+
/** Check if a network address is allowed to access the HTTP server */
static bool ClientAllowed(const CNetAddr& netaddr)
@@ -216,12 +248,9 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
{
// Track requests and notify when a request is completed.
{
- WITH_LOCK(g_requests_mutex, g_requests.insert(req));
- g_requests_cv.notify_all();
+ g_request_tracker.Register(req);
evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
- auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));
- assert(n == 1);
- g_requests_cv.notify_all();
+ assert(g_request_tracker.Remove(req) == 1);
}, nullptr);
}
@@ -477,15 +506,7 @@ void StopHTTPServer()
evhttp_del_accept_socket(eventHTTP, socket);
}
boundSockets.clear();
- {
- WAIT_LOCK(g_requests_mutex, lock);
- if (!g_requests.empty()) {
- LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
- }
- g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
- return g_requests.empty();
- });
- }
+ g_request_tracker.WaitUntilEmpty();
if (eventHTTP) {
// Schedule a callback to call evhttp_free in the event base thread, so
// that evhttp_free does not need to be called again after the handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but it's a significant change and this PR already has a pretty long history and several ACKs for the current approach. I would suggest you open a follow-up PR with this as an additional improvement.
Sounds good to me. I guess libevent doesn't close the idle connection as an optimization (less overhead if the client wants to send more requests) and if the client doesn't close them then we are seeing this issue. I would add that the overhead that we see in the functional test is the timeout we have configured on the http server ourselves here. |
Not really an optimization of libevent, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK. With this change evhttp_free
is called after all requests are completed. Calling evhttp_free
at this point will cause all active connections to close and nothing remains on the event base, and all RPC threads are ready to be joined. As mentioned above, this is useful to avoid hitting the idle timeout.
Note that shutdown still hangs while a request is in-flight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 55bbf04
I have some open suggestions, but I'm happy with the PR as it is.
// Track requests and notify when a request is completed. | ||
{ | ||
WITH_LOCK(g_requests_mutex, g_requests.insert(req)); | ||
g_requests_cv.notify_all(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I think one approach to improve on this contract (and as a nice benefit, also encapsulate code a bit better - I think this improves readability) in the below diff.
(WaitUntilEmpty()
needs improvement because it shouldn't directly be logging about the HTTP server like that. I guess we also don't need a GlobalMutex
here.)
git diff on 55bbf04
diff --git a/src/httpserver.cpp b/src/httpserver.cpp
index b481ef093..90ef37bd5 100644
--- a/src/httpserver.cpp
+++ b/src/httpserver.cpp
@@ -134,6 +134,40 @@ struct HTTPPathHandler
HTTPRequestHandler handler;
};
+class RequestTracker
+{
+public:
+ typedef std::unordered_set<evhttp_request*> requests_t;
+private:
+ GlobalMutex m_requests_mutex;
+ std::condition_variable m_requests_cv;
+ requests_t m_requests GUARDED_BY(m_requests_mutex);
+public:
+ void Register(struct evhttp_request* req)
+ {
+ WITH_LOCK(m_requests_mutex, m_requests.insert(req));
+ m_requests_cv.notify_all();
+ }
+
+ size_t Remove(struct evhttp_request* req)
+ {
+ auto n{WITH_LOCK(m_requests_mutex, return m_requests.erase(req))};
+ m_requests_cv.notify_all();
+ return n;
+ }
+
+ void WaitUntilEmpty()
+ {
+ WAIT_LOCK(m_requests_mutex, lock);
+ if (!m_requests.empty()) {
+ LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", m_requests.size());
+ }
+ m_requests_cv.wait(lock, [this]() EXCLUSIVE_LOCKS_REQUIRED(m_requests_mutex) {
+ return m_requests.empty();
+ });
+ }
+};
+
/** HTTP module state */
//! libevent event loop
@@ -149,10 +183,8 @@ static GlobalMutex g_httppathhandlers_mutex;
static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
//! Bound listening sockets
static std::vector<evhttp_bound_socket *> boundSockets;
-//! Track active requests
-static GlobalMutex g_requests_mutex;
-static std::condition_variable g_requests_cv;
-static std::unordered_set<evhttp_request*> g_requests GUARDED_BY(g_requests_mutex);
+static RequestTracker g_request_tracker;
+
/** Check if a network address is allowed to access the HTTP server */
static bool ClientAllowed(const CNetAddr& netaddr)
@@ -216,12 +248,9 @@ static void http_request_cb(struct evhttp_request* req, void* arg)
{
// Track requests and notify when a request is completed.
{
- WITH_LOCK(g_requests_mutex, g_requests.insert(req));
- g_requests_cv.notify_all();
+ g_request_tracker.Register(req);
evhttp_request_set_on_complete_cb(req, [](struct evhttp_request* req, void*) {
- auto n = WITH_LOCK(g_requests_mutex, return g_requests.erase(req));
- assert(n == 1);
- g_requests_cv.notify_all();
+ assert(g_request_tracker.Remove(req) == 1);
}, nullptr);
}
@@ -477,15 +506,7 @@ void StopHTTPServer()
evhttp_del_accept_socket(eventHTTP, socket);
}
boundSockets.clear();
- {
- WAIT_LOCK(g_requests_mutex, lock);
- if (!g_requests.empty()) {
- LogPrint(BCLog::HTTP, "Waiting for %d requests to stop HTTP server\n", g_requests.size());
- }
- g_requests_cv.wait(lock, []() EXCLUSIVE_LOCKS_REQUIRED(g_requests_mutex) {
- return g_requests.empty();
- });
- }
+ g_request_tracker.WaitUntilEmpty();
if (eventHTTP) {
// Schedule a callback to call evhttp_free in the event base thread, so
// that evhttp_free does not need to be called again after the handling
This was made obsolete by tracking the active requests and explicitly waiting for them to finish before shutdown.
55bbf04
to
60978c8
Compare
Addressed comments by @promag and @stickies-v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-ACK 60978c8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 60978c8
|
||
#include <sys/types.h> | ||
#include <sys/stat.h> | ||
|
||
#include <event2/buffer.h> | ||
#include <event2/bufferevent.h> | ||
#include <event2/http.h> | ||
#include <event2/http_struct.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: IWYU suggests to s/#include <event2/http_struct.h>
/#include <event2/event.h>
/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I will fix this in the follow-up that will pick up #19434.
ACK 60978c8 |
… finish - 2nd attempt 60978c8 test: Reduce extended timeout on abortnode test (Fabian Jahr) 660bdbf http: Release server before waiting for event base loop exit (João Barbosa) 8c6d007 http: Track active requests and wait for last to finish (João Barbosa) Pull request description: This revives bitcoin#19420. Since promag is not so active at the moment, I can support this to finally get it merged. The PR is rebased and comments by jonatack have been addressed. Once this is merged, I will also reopen bitcoin#19434. ACKs for top commit: achow101: ACK 60978c8 stickies-v: re-ACK [60978c8](bitcoin@60978c8) hebasto: ACK 60978c8 Tree-SHA512: eef0fe1081e9331b95cfafc71d82f2398abd1d3439dac5b2fa5c6d9c0a3f63ef19adde1c38c88d3b4e7fb41ce7c097943f1815c10e33d165918ccbdec512fe1c
…lient disconnection 68f23f5 http: bugfix: track closed connection (stickies-v) 084d037 http: log connection instead of request count (stickies-v) 41f9027 http: refactor: use encapsulated HTTPRequestTracker (stickies-v) Pull request description: #26742 significantly increased the http server shutdown speed, but also introduced a bug (#27722 - see #27722 (comment) for steps to reproduce on master) that causes http server shutdown to halt in case of a remote client disconnection. This happens because `evhttp_request_set_on_complete_cb` is never called and thus the request is never removed from `g_requests`. This PR fixes that bug, and improves robustness of the code by encapsulating the request tracking logic. Earlier approaches (#27909, #27245, #19434) attempted to resolve this but [imo are fundamentally unsafe](#27909 (comment)) because of differences in lifetime between an `evhttp_request` and `evhttp_connection`. We don't need to keep track of open requests or connections, we just [need to ensure](#19420 (comment)) that there are no active requests on server shutdown. Because a connection can have multiple requests, and a request can be completed in various ways (the request actually being handled, or the client performing a remote disconnect), keeping a counter per connection seems like the approach with the least overhead to me. Fixes #27722 ACKs for top commit: vasild: ACK 68f23f5 theStack: ACK 68f23f5 Tree-SHA512: dfa711ff55ec75ba44d73e9e6fac16b0be25cf3c20868c2145a844a7878ad9fc6998d9ff62d72f3a210bfa411ef03d3757b73d68a7c22926e874c421e51444d6
…emote client disconnection 68f23f5 http: bugfix: track closed connection (stickies-v) 084d037 http: log connection instead of request count (stickies-v) 41f9027 http: refactor: use encapsulated HTTPRequestTracker (stickies-v) Pull request description: bitcoin#26742 significantly increased the http server shutdown speed, but also introduced a bug (bitcoin#27722 - see bitcoin#27722 (comment) for steps to reproduce on master) that causes http server shutdown to halt in case of a remote client disconnection. This happens because `evhttp_request_set_on_complete_cb` is never called and thus the request is never removed from `g_requests`. This PR fixes that bug, and improves robustness of the code by encapsulating the request tracking logic. Earlier approaches (bitcoin#27909, bitcoin#27245, bitcoin#19434) attempted to resolve this but [imo are fundamentally unsafe](bitcoin#27909 (comment)) because of differences in lifetime between an `evhttp_request` and `evhttp_connection`. We don't need to keep track of open requests or connections, we just [need to ensure](bitcoin#19420 (comment)) that there are no active requests on server shutdown. Because a connection can have multiple requests, and a request can be completed in various ways (the request actually being handled, or the client performing a remote disconnect), keeping a counter per connection seems like the approach with the least overhead to me. Fixes bitcoin#27722 ACKs for top commit: vasild: ACK 68f23f5 theStack: ACK 68f23f5 Tree-SHA512: dfa711ff55ec75ba44d73e9e6fac16b0be25cf3c20868c2145a844a7878ad9fc6998d9ff62d72f3a210bfa411ef03d3757b73d68a7c22926e874c421e51444d6
This revives #19420. Since promag is not so active at the moment, I can support this to finally get it merged.
The PR is rebased and comments by jonatack have been addressed.
Once this is merged, I will also reopen #19434.