Skip to content

Conversation

stickies-v
Copy link
Contributor

@stickies-v stickies-v commented Sep 29, 2023

#26742 significantly increased the http server shutdown speed, but also introduced a bug (#27722 - see #27722 (comment) for steps to reproduce on master) that causes http server shutdown to halt in case of a remote client disconnection. This happens because evhttp_request_set_on_complete_cb is never called and thus the request is never removed from g_requests.

This PR fixes that bug, and improves robustness of the code by encapsulating the request tracking logic. Earlier approaches (#27909, #27245, #19434) attempted to resolve this but imo are fundamentally unsafe because of differences in lifetime between an evhttp_request and evhttp_connection.

We don't need to keep track of open requests or connections, we just need to ensure that there are no active requests on server shutdown. Because a connection can have multiple requests, and a request can be completed in various ways (the request actually being handled, or the client performing a remote disconnect), keeping a counter per connection seems like the approach with the least overhead to me.

Fixes #27722

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 29, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK vasild, theStack
Concept ACK josibake, erikarvstedt
Approach ACK willcl-ark

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28051 (Get rid of shutdown.cpp/shutdown.h, use SignalInterrupt directly by ryanofsky)
  • #27909 (http: add evhttp_connection_set_closecb to avoid g_requests hang by Crypt-iQ)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@fanquake fanquake added this to the 26.0 milestone Sep 29, 2023
@stickies-v stickies-v force-pushed the 2023-09/http-use-conn-counter branch from 97508b6 to 262421e Compare September 29, 2023 16:38
@stickies-v stickies-v force-pushed the 2023-09/http-use-conn-counter branch from 262421e to e3cd46c Compare September 29, 2023 16:51
@stickies-v
Copy link
Contributor Author

cc @vasild - to follow up on our conversation last week

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Haven't looked at the code changes yet, but can confirm that with this branch the problem as described in #27722 (comment) can't be reproduced anymore on my machine.

@fanquake fanquake requested a review from willcl-ark October 2, 2023 09:36
@fanquake
Copy link
Member

fanquake commented Oct 2, 2023

cc @Crypt-iQ @erikarvstedt

@josibake
Copy link
Member

josibake commented Oct 2, 2023

Concept ACK

@erikarvstedt
Copy link
Contributor

erikarvstedt commented Oct 2, 2023

Concept ACK

The nix-bitcoin test suite is now succeeding with this PR.

It seems there's no simpler strategy for implementing this:

  • libevent doesn't allow attaching user data to connections
  • @vasild's suggestion that avoids adding a connection tracking map doesn't work for multiple requests with a single connection (because only the last registered callback gets called).

Here's a small fixup that avoids a double map lookup in RemoveRequest.

@stickies-v stickies-v force-pushed the 2023-09/http-use-conn-counter branch from e3cd46c to af91f04 Compare October 2, 2023 20:32
@stickies-v
Copy link
Contributor Author

Here's a small fixup that avoids a double map lookup in RemoveRequest.

Excellent suggestion, thank you. Force pushed to incorporate this - otherwise no changes.

@erikarvstedt
Copy link
Contributor

@stickies-v stickies-v force-pushed the 2023-09/http-use-conn-counter branch from af91f04 to a86f3f4 Compare October 2, 2023 22:12
@stickies-v
Copy link
Contributor Author

fixup! AddRequest: simplify, avoid double map lookup

Makes sense, thank you. Force pushed to incorporate this - otherwise no changes.

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Light ACK a86f3f4

Code looks good and the bugfix approach seems solid (I'm not too familiar with libevent internals though).

Is there a simple way to trigger the "multiple-requests-per-connection" case (maybe with our functional test framework's AuthServiceProxy)? I've only tested the typical "one request per connection" scenario via bitcoin-cli so far.

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach ACK e3cd46c

@willcl-ark
Copy link
Member

Approach ACK

I started with a light code review but decided to make my own minimal diff (without the connection tracking) and ended up reimplementing #27909. I've now actually read all the context, including why the approach from 27909 is less desirable, and agree with the approach here.

Incidentally, I still cannot reproduce the original issue on my system with libevent version 2.1.12-stable-8ubuntu3. I tried with a remote bitcoin-cli to try and rule out some local networking-fu, but master and 25.0 still seem happy to shutdown for me no matter if there are remote-closed connections outstanding... FWIW This does mean that, without a test included, I can't actually "test" that this fix does something new.

However I can confirm that the changes here also work for me:

2023-10-03T09:35:35Z [http] Received a POST request for / from 127.0.0.1:51856
2023-10-03T09:35:38Z [http] Received a POST request for / from 127.0.0.1:60486
2023-10-03T09:35:38Z [http] Interrupting HTTP server
2023-10-03T09:35:38Z [http] Unregistering HTTP handler for / (exactmatch 1)
2023-10-03T09:35:38Z [http] Unregistering HTTP handler for /wallet/ (exactmatch 0)
2023-10-03T09:35:38Z [http] Stopping HTTP server
2023-10-03T09:35:38Z [http] Waiting for HTTP worker threads to exit
2023-10-03T09:35:38Z [http] Waiting for 3 connections to stop HTTP server
2023-10-03T09:35:38Z [http] Waiting for HTTP event thread to exit
2023-10-03T09:35:38Z [http] Exited http event loop
2023-10-03T09:35:38Z [http] Stopped HTTP server

I'd be happy to ACK this if the suggestion from @theStack was addressed.

@stickies-v stickies-v force-pushed the 2023-09/http-use-conn-counter branch from a86f3f4 to aab70f4 Compare October 3, 2023 11:21
Copy link
Contributor Author

@stickies-v stickies-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Force pushed to incorporate all outstanding review comments, thank you @theStack @vasild @willcl-ark for your review.

Is there a simple way to trigger the "multiple-requests-per-connection" case (maybe with our functional test framework's AuthServiceProxy)? I've only tested the typical "one request per connection" scenario via bitcoin-cli so far.

I'll work on that, don't think it can be done with bitcoin-cli. I also would like to add some unit testing on HTTPRequestTracker, but from earlier work I've found that mocking evhttp_request objects proved to be a huge pain to get the memory sanitizers to pass, so I'm not sure I'll be able to get that working for this PR either (welcoming contributions there).

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aab70f4 LGTM, just missing thread safety annotations:

@vasild
Copy link
Contributor

vasild commented Oct 3, 2023

Is there a simple way to trigger the "multiple-requests-per-connection" case

I did this:

  1. Prepare /tmp/reqests:
POST / HTTP/1.1
Host: 127.0.0.1
Content-Type: application/json
Authorization: Basic X91hj28...
Content-Length: 37

{"method":"help","params":[],"id":1}
POST / HTTP/1.1
Host: 127.0.0.1
Content-Type: application/json
Authorization: Basic X91hj28...
Content-Length: 58

{"method":"waitforblockheight","params":[1000000],"id":1}

where X91hj28... is stolen beforehand using tcpdump from a normal bitcoin-cli help session.

  1. Then run nc 127.0.0.1 8332 < /tmp/requests.

This is doing two requests one after another and the second one hangs. I don't think it is possible to do two concurrent requests in one connection (at least not in HTTP 1.x).

Introduces and uses a HTTPRequestTracker class to keep track of
how many HTTP requests are currently active, so we don't stop the
server before they're all handled.

This has two purposes:
1. In a next commit, allows us to untrack all requests associated
with a connection without running into lifetime issues of the
connection living longer than the request
(see bitcoin#27909 (comment))

2. Improve encapsulation by making the mutex and cv internal members,
and exposing just the WaitUntilEmpty() method that can be safely
used.
There is no significant benefit in logging the request count instead
of the connection count. Reduces amount of code and computational
complexity.
It is possible that the client disconnects before the request is
handled. In those cases, evhttp_request_set_on_complete_cb is never
called, which means that on shutdown the server we'll keep waiting
endlessly.

By adding evhttp_connection_set_closecb, libevent automatically
cleans up those dead connections at latest when we shutdown, and
depending on the libevent version already at the moment of remote
client disconnect. In both cases, the bug is fixed.
@stickies-v stickies-v force-pushed the 2023-09/http-use-conn-counter branch from aab70f4 to 68f23f5 Compare October 3, 2023 12:37
@stickies-v
Copy link
Contributor Author

Force pushed to address thread safety annotations comment, thank you for the quick re-review!

Copy link
Contributor

@vasild vasild left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 68f23f5

@DrahtBot DrahtBot requested a review from theStack October 3, 2023 12:39
Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 68f23f5

@@ -149,10 +151,68 @@ static GlobalMutex g_httppathhandlers_mutex;
static std::vector<HTTPPathHandler> pathHandlers GUARDED_BY(g_httppathhandlers_mutex);
//! Bound listening sockets
static std::vector<evhttp_bound_socket *> boundSockets;

/**
* @brief Helps keep track of open `evhttp_connection`s with active `evhttp_requests`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit (if you have to retouch):

Suggested change
* @brief Helps keep track of open `evhttp_connection`s with active `evhttp_requests`
* @brief Helps keep track of open `evhttp_connection`s with active `evhttp_request`s

@stickies-v
Copy link
Contributor Author

I don't think it is possible to do two concurrent requests in one connection (at least not in HTTP 1.x).

HTTP/1.1 supports HTTP pipelining, but libevent doesn't seem to support that. Pipelining is superseded by multiplexing in HTTP/2.0, which the libevent http server hasn't implemented (but seems possible to add on our side, not that I think we should at this point).

So, in practice it seems like our http server will currently never have > 1 active requests per connection, but I think we still need to keep the current accounting system (i.e. a counter per connection) because:

  • I find it difficult to prove that > 1 request per connection can never happen
  • it makes the existing code robust to changes in future libevent versions that do allow e.g. pipelining or multiplexing

@fanquake fanquake merged commit db7b5df into bitcoin:master Oct 4, 2023
@fanquake
Copy link
Member

fanquake commented Oct 4, 2023

Added to #28487 for backporting to 25.x.

@fanquake fanquake mentioned this pull request Oct 4, 2023
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Oct 4, 2023
Introduces and uses a HTTPRequestTracker class to keep track of
how many HTTP requests are currently active, so we don't stop the
server before they're all handled.

This has two purposes:
1. In a next commit, allows us to untrack all requests associated
with a connection without running into lifetime issues of the
connection living longer than the request
(see bitcoin#27909 (comment))

2. Improve encapsulation by making the mutex and cv internal members,
and exposing just the WaitUntilEmpty() method that can be safely
used.

Github-Pull: bitcoin#28551
Rebased-From: 41f9027
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Oct 4, 2023
There is no significant benefit in logging the request count instead
of the connection count. Reduces amount of code and computational
complexity.

Github-Pull: bitcoin#28551
Rebased-From: 084d037
fanquake pushed a commit to fanquake/bitcoin that referenced this pull request Oct 4, 2023
It is possible that the client disconnects before the request is
handled. In those cases, evhttp_request_set_on_complete_cb is never
called, which means that on shutdown the server we'll keep waiting
endlessly.

By adding evhttp_connection_set_closecb, libevent automatically
cleans up those dead connections at latest when we shutdown, and
depending on the libevent version already at the moment of remote
client disconnect. In both cases, the bug is fixed.

Github-Pull: bitcoin#28551
Rebased-From: 68f23f5
@willcl-ark
Copy link
Member

post-merge ACK based on git range-diff a86f3f46b786d1dd60ff55b987df784107008d11...68f23f57d77bc172ed39ecdd4d2d5cd5e13cf483

@stickies-v stickies-v deleted the 2023-09/http-use-conn-counter branch October 4, 2023 10:23
fanquake added a commit that referenced this pull request Oct 4, 2023
45a5fcb http: bugfix: track closed connection (stickies-v)
752a456 http: log connection instead of request count (stickies-v)
ae86ada http: refactor: use encapsulated HTTPRequestTracker (stickies-v)
f31899d gui: macOS, make appMenuBar part of the main app window (furszy)
64ffa94 gui: macOS, do not process dock icon actions during shutdown (furszy)
e270f3f depends: fix unusable memory_resource in macos qt build (fanquake)
a668394 build, macos: Fix `qt` package build with new Xcode 15 linker (Hennadii Stepanov)
b3517cb test: Test loading wallets with conflicts without a chain (Andrew Chow)
d63478c wallet: Check last block and conflict height are valid in MarkConflicted (Andrew Chow)
5e51a9c ci: Nuke Android APK task, Use credits for tsan (MarcoFalke)
910c362 test: ensure old fee_estimate.dat not read on restart and flushed (ismaelsadeeq)
37764d3 tx fees, policy: read stale fee estimates with a regtest-only option (ismaelsadeeq)
16bb916 tx fees, policy: do not read estimates of old fee_estimates.dat (ismaelsadeeq)
c4dd598 tx fees, policy: periodically flush fee estimates to fee_estimates.dat (ismaelsadeeq)
c36770c test: wallet, verify migration doesn't crash for an invalid script (furszy)
0d2a33e wallet: disallow migration of invalid or not-watched scripts (furszy)
2c51a07 Do not use std::vector = {} to release memory (Pieter Wuille)

Pull request description:

  Further backports for the `25.x` branch. Currently:
  * #27622
  * #27834
  * #28125
  * #28452
  * #28542
  * #28543
  * #28551
  * #28571
  * bitcoin-core/gui#751

ACKs for top commit:
  hebasto:
    re-ACK 45a5fcb, only #28551 has been backported with since my recent [review](#28487 (review)).
  dergoegge:
    reACK 45a5fcb
  willcl-ark:
    reACK 45a5fcb

Tree-SHA512: 0f5807aa364b7c2a2039fef11d5cd5e168372c3bf5b0e941350fcd92e7db4a1662801b97bb4f68e29788c77d24bbf97385a483c4501ca72d93fa25327d5694fa
Frank-GER pushed a commit to syscoin/syscoin that referenced this pull request Oct 13, 2023
…emote client disconnection

68f23f5 http: bugfix: track closed connection (stickies-v)
084d037 http: log connection instead of request count (stickies-v)
41f9027 http: refactor: use encapsulated HTTPRequestTracker (stickies-v)

Pull request description:

  bitcoin#26742 significantly increased the http server shutdown speed, but also introduced a bug (bitcoin#27722 - see bitcoin#27722 (comment) for steps to reproduce on master) that causes http server shutdown to halt in case of a remote client disconnection. This happens because `evhttp_request_set_on_complete_cb` is never called and thus the request is never removed from `g_requests`.

  This PR fixes that bug, and improves robustness of the code by encapsulating the request tracking logic. Earlier approaches (bitcoin#27909, bitcoin#27245, bitcoin#19434) attempted to resolve this but [imo are fundamentally unsafe](bitcoin#27909 (comment)) because of differences in lifetime between an `evhttp_request` and `evhttp_connection`.

  We don't need to keep track of open requests or connections, we just [need to ensure](bitcoin#19420 (comment)) that there are no active requests on server shutdown. Because a connection can have multiple requests, and a request can be completed in various ways (the request actually being handled, or the client performing a remote disconnect), keeping a counter per connection seems like the approach with the least overhead to me.

  Fixes bitcoin#27722

ACKs for top commit:
  vasild:
    ACK 68f23f5
  theStack:
    ACK 68f23f5

Tree-SHA512: dfa711ff55ec75ba44d73e9e6fac16b0be25cf3c20868c2145a844a7878ad9fc6998d9ff62d72f3a210bfa411ef03d3757b73d68a7c22926e874c421e51444d6
@bitcoin bitcoin locked and limited conversation to collaborators Oct 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bitcoind hangs waiting for g_requests.empty()
8 participants