
Conversation

fjahr (Contributor) commented May 23, 2023

Fixes #11368

This requires libevent 2.2.1, which has so far only been released as an alpha.

For more context see #11368 (comment) specifically; this uses the feature mentioned there, libevent#592.

DrahtBot (Contributor) commented May 23, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/27731.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

No conflicts as of last run.

// Limit the maximum number of open connections to prevent exhausting
// the file descriptor limit. When the http server gets overwhelmed it
// will respond with 503 Service Unavailable.
evhttp_set_max_connections(http, workQueueDepth * 2);
Member:

I am not sure this actually keeps us within the file descriptor limit. For example if you run:

$ ulimit -n 175
$ src/bitcoind
Warning: Reducing -maxconnections from 125 to 15, because of system limitations.
Bitcoin Core starting

I think (but have not tested) that if I then receive 15 inbound connections I could still run out of file descriptors for the http server?

Perhaps if we also modify this line:

nMaxConnections = std::min(nFD - MIN_CORE_FILEDESCRIPTORS - MAX_ADDNODE_CONNECTIONS - NUM_FDS_MESSAGE_CAPTURE, nMaxConnections);

...to consider the workqueue depth too, then it could work?

Contributor Author:

Yeah, I think it will be hard to find a number where there isn't a specific limit and specific circumstances that would still cause problems. If we set the limit dynamically based on what's really available there would be a range where we would effectively set evhttp_set_max_connections(http, 0); which would probably mean that we couldn't even shut down the server via the RPC interface.

When maxconnections is reduced to 15 that doesn't mean there are only 15 file descriptors left; more are reserved than are actually needed at all times. But that part of the code could certainly use improvement, see #27539 and #27730. I think changing the maxconnections calculation is not necessarily linked to this change here, because it could reduce the risk of running out of FDs regardless. Also, note that the node will still start if you set ulimit -n 150, and it will then have maxconnections at 0. So whether that even makes sense is another question to discuss: if we alter the calculation of maxconnections but still let the node run regardless of the result, we haven't really prevented file descriptor exhaustion completely. On the other hand, it may be too limiting if we set the requirement for available file descriptors so high that we can be absolutely sure the node will never encounter exhaustion under any circumstances.

So, I am not sure what the optimal number to use would be. I think we can still tweak this later so it covers the most common use cases best. What I am currently concerned about is whether this actually improves the situation in #11368 for the default file descriptor limits on the most commonly used OSes. macOS was among the lowest as far as I remember, usually 256 I think, so I am currently aiming at showing that it's an improvement for that case. But I am struggling to reproduce the original issue on my new machine right now, so I am not 100% sure anymore that this actually needs fixing. Though I am on a different machine now than I was back then.

@@ -419,6 +419,15 @@ bool InitHTTPServer()
int workQueueDepth = std::max((long)gArgs.GetIntArg("-rpcworkqueue", DEFAULT_HTTP_WORKQUEUE), 1L);
LogPrintfCategory(BCLog::HTTP, "creating work queue of depth %d\n", workQueueDepth);

#if LIBEVENT_VERSION_NUMBER >= 0x02020001
if (event_get_version_number() >= 0x02020001) {
Member:

Won't runtime linking fail to find the evhttp_set_max_connections symbol?

Contributor Author:

Hm, how would that happen? I didn't have any issues when testing this code with 2.2.1-alpha and 2.1.12-stable.

Member:

I'm thinking of the case where you build with 2.2.1, then try to run it with 2.1.12. (If that's not supported, the runtime check is unnecessary)

same time if this is under your control. It is hard to give general advice
since this depends on your system but if you make several hundred requests at
once you are definitely at risk of encountering this issue.
When too many connections are opened quickly the interface will start to
Member:

This loses the correct docs for built-against-older-libevent

DrahtBot commented Feb 5, 2024

🤔 There hasn't been much activity lately and the CI seems to be failing.

If no one reviewed the current pull request by commit hash, a rebase can be considered. While the CI failure may be a false positive, the CI hasn't been running for some time, so there may be a real issue hiding as well. A rebase triggers the latest CI and makes sure that no silent merge conflicts have snuck in.

1 similar comment from DrahtBot on May 5, 2024.

fjahr force-pushed the 2023-05-fd-exhaust branch from 8b0d64d to 3a66391 on May 5, 2024 03:38
fjahr (Contributor, Author) commented May 5, 2024

Pushed a rebase, though currently I am working on removing libevent as a dependency, which should be a quicker solution than waiting for 2.2.1 :) I will address the feedback if 2.2.1 lands anytime soon though!

DrahtBot removed the CI failed label on May 5, 2024
maflcko (Member) commented May 6, 2024

> removing libevent

Nice. Is there a tracking issue?

fjahr (Contributor, Author) commented May 29, 2024

> removing libevent
>
> Nice. Is there a tracking issue?

Not yet, but should be open soon (TM).

@VzxPLnHqr commented:
I am not familiar with the intricacies of this PR, but is it possible for bitcoind to detect this impending crash before it happens and just return some sort of 500 error?

I ran into this issue (using v28.0) when running a straightforward script which was calling getblock for each block, starting from genesis. After a couple thousand blocks bitcoind crashed with the file descriptor error.

After some trial and error I have found that putting a 100 millisecond delay between http requests at least has kept bitcoind from crashing (on my system with very standard commodity hardware), but of course this is very slow and will take days to run. lsof -u bitcoin | wc -l seems to hover between 1100 and 1500 files open for me.

Reading through this issue, it seems we are either waiting for bitcoin core to upgrade to using libevent 2.2.1, or we are waiting for libevent to be removed entirely as a dependency.

However, in the interim it would be really very helpful if, instead of bitcoind crashing, it at least stops responding to the rpc calls (lets the client time out) or returns some 500 error. Then clients can at least know that they are overwhelming the server and back off.

fjahr (Contributor, Author) commented Jan 31, 2025

> Reading through this issue, it seems we are either waiting for bitcoin core to upgrade to using libevent 2.2.1, or we are waiting for libevent to be removed entirely as a dependency.

I would open this PR for review if libevent 2.2.1 were available, but it has been in alpha since May 2023: https://github.com/libevent/libevent/releases. So upgrading libevent isn't really an option yet, but the removal is in the works, see #31194.

> However, in the interim it would be really very helpful if, instead of bitcoind crashing, it at least stops responding to the rpc calls (lets the client time out) or returns some 500 error. Then clients can at least know that they are overwhelming the server and back off.

I don't think there is a simple fix. There have been a few attempts over the years to address this issue outside of libevent, but none have been merged, and I would guess there wouldn't be considerable interest from reviewers given the parallel effort to remove libevent altogether. So I don't believe an interim solution would have a high chance of getting merged given our limited resources for review, particularly in this area of the code base (which is also a motivating factor for the removal of the dependency, by the way). I don't want to discourage anyone from trying to find a simple fix and opening a PR, though.

Successfully merging this pull request may close these issues.

bitcoin core crashes when too many rpc calls are made
6 participants