log: Mitigate disk filling attacks by rate limiting LogPrintf #21603
Conversation
In response to @jnewbery's comment:
I disagree on the global shedding; I think the source-location-based shedding is overall the better approach. (This is obviously mostly an opinion and I am totally willing to be convinced of global log shedding if enough reviewers favor it.)
Right, and those cases are exactly where we need the best possible logs.
I guarantee you that not everyone will know this and most people will miss a single log line saying "rate-limiting log x". I can only speak from my own experience of supporting network equipment for several years - partial logs are often worse than no logs at all since they're so misleading.
Strongest possible Concept ACK: this mitigation will kill an entire bug class (the "disk fill via logging" bug class). Thanks for taking up this work @dergoegge. Will test and review.
Clarifying questions to fully understand what is suggested:
1. Let's assume that log location A is a misplaced … In the case of an attacker using log location A to fill our disk by making us log from there repeatedly, is the suggestion then that all log locations should be suppressed during the suppression period (instead of suppressing only the "attacker controlled" log location A)?
2. To make your suggestion clear: could you exemplify what the special log entry at the end of the log suppression period would look like? (A patch or even a separate PR would be even better, but an example log entry would probably clarify enough!)
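Purely as an illustration of what the second question is asking about (this format is invented for the example; it is not taken from the PR or from the referenced suggestion), such an end-of-suppression entry might read:

```
2021-04-06T12:00:00Z Restarting logging from net_processing.cpp:418: 2.3 MiB were dropped during the last hour of suppression.
```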
Thanks for your work on this PR! Would you be willing to also implement @jnewbery's suggestion as a separate PR? I tried to summarise his suggestion in #21603 (comment). I'm not certain I got it right though - @jnewbery, feel free to chime in :) I think it makes sense to do it as a separate PR since it deviates in important ways from the original suggestion. By having two separate PRs we would see which of the two approaches to addressing this type of attack has consensus support. Personally I think I could live with both approaches: as long as we kill the disk-fill-via-logging bug class sooner rather than later I'm happy :)
@practicalswift Sure, that's a good idea :)
Concept ACK (not that I'm sure if my opinion is relevant), and I think this makes more sense than the global alternative.
🕵️ @sipa has been requested to review this pull request as specified in the REVIEWERS file.
@dergoegge Thanks for working on this. Would you mind rebasing? I would like to review the updated version :)
To mitigate disk filling attacks caused by unsafe usages of LogPrintf, we rate limit LogPrintf by using the fixed window rate limiter (BCLog::LogRatelimiter) introduced in an earlier commit. The rate limiting logic is applied per source location instead of globally. A source location is allowed to log up to 1 MiB per hour. Source locations that violate the limit will have their logs suppressed for up to one hour.
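As a rough illustration of the mechanism this commit message describes, a fixed-window byte limiter could be sketched as follows; the class and member names below are assumptions for this example, not `BCLog::LogRatelimiter`'s actual interface.

```cpp
// Minimal sketch of a fixed-window byte limiter (illustrative only; the names
// are assumptions for this example, not BCLog::LogRatelimiter's real API).
#include <chrono>
#include <cstddef>

class FixedWindowByteLimiter
{
    const std::size_t m_max_bytes;        // quota per window, e.g. 1 MiB
    const std::chrono::seconds m_window;  // window length, e.g. one hour
    std::size_t m_used_bytes{0};
    std::size_t m_dropped_bytes{0};
    std::chrono::steady_clock::time_point m_window_start{std::chrono::steady_clock::now()};

public:
    FixedWindowByteLimiter(std::size_t max_bytes, std::chrono::seconds window)
        : m_max_bytes{max_bytes}, m_window{window} {}

    //! Returns true if `bytes` may be logged now, false if the message should
    //! be suppressed because the quota for the current window is exhausted.
    bool Consume(std::size_t bytes)
    {
        const auto now{std::chrono::steady_clock::now()};
        if (now - m_window_start >= m_window) {
            // New window: a real implementation would log a summary of
            // m_dropped_bytes here before resetting the counters.
            m_window_start = now;
            m_used_bytes = 0;
            m_dropped_bytes = 0;
        }
        if (m_used_bytes + bytes > m_max_bytes) {
            m_dropped_bytes += bytes;
            return false;
        }
        m_used_bytes += bytes;
        return true;
    }
};
```

Per the commit message, an instance of such a limiter is kept per source location, so one noisy call site exhausting its budget does not silence the rest of the log.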
We mark ~DebugLogHelper as noexcept(false) to be able to catch the exception it throws. This lets us use it in tests in combination with BOOST_CHECK_THROW and BOOST_CHECK_NO_THROW to check that certain log messages are (not) logged.
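A rough sketch of the test pattern this enables, assuming `DebugLogHelper` takes the expected message and throws a std::exception-derived error from its (now `noexcept(false)`) destructor when the message never appears; the lambda and the over-quota scenario are placeholders, not the PR's actual test.

```cpp
// Illustrative test pattern only; not the PR's actual unit test.
#include <boost/test/unit_test.hpp>

#include <logging.h>
#include <test/util/setup_common.h> // DebugLogHelper, BasicTestingSetup

#include <exception>
#include <string>

BOOST_FIXTURE_TEST_CASE(ratelimit_log_assertions_sketch, BasicTestingSetup)
{
    // Log a message while a DebugLogHelper watches for it. Because
    // ~DebugLogHelper is noexcept(false), a missing message surfaces as an
    // exception the test body can assert on.
    const auto log_and_expect = [](const std::string& msg) {
        DebugLogHelper watcher{msg}; // destructor throws if msg was never logged
        LogPrintf("%s\n", msg);
    };

    // A message within the quota reaches the log -> no exception.
    BOOST_CHECK_NO_THROW(log_and_expect("within quota"));

    // A message assumed to be suppressed by rate limiting never reaches the
    // log -> the watcher's destructor throws.
    BOOST_CHECK_THROW(log_and_expect("over quota"), std::exception);
}
```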
@jonatack IMO removing some of the …
What I am working on is ensuring that no logging that is on by default can be externally provoked (and potentially removing LogPrintf). That, along with reducing default logging in general, has appeared to be the desired direction for some time now AFAICS. Of course, this could still be merged in the interim and then removed if no longer needed, but I'm targeting the next release.
reACK c86d5ff
With bd971bf, category-based logging with a severity …
Yes, doing precisely this right now for all of them. Though, assuming people are in favor of that, reviewers might prefer extending this rate limiting to non-default logging; otherwise, it can always still be removed.
Is it possible to convert all of them? I think we currently don't have a way to express a log category together with an unconditional log severity. Unless you want to use …
@MarcoFalke added that in #25203.
Good point, I think that this would definitely be useful (here or as a follow-up) - having a rate limit only for …
I agree; though as an example I just updated the (copious) net processing logging in #25203 and there are only a few remaining unconditional messages (user-controlled or unusual): …
Also agree with this and I want to add that to this PR. The two new categories I introduce here (…
🐙 This pull request conflicts with the target branch and needs rebase.
So, it seems to me that with the recent changes, we currently have …
The "missing" combination of logging by level but without a category is currently not possible as far as I can see. If there were a use case for having "Always" logged messages also carry a severity like "Warning" or "Error", we'd need another dimension - but I think that at least for the current uses this is not the case, so your suggestion makes sense to me. However, I'm really not sure how stable the current state of the changes to the logging framework is; it seems very much in flux and I'm a little bit confused about which direction it is heading: …
If there is a log category with …
I don't see how - the "Always" level is meant for noisy unconditional logs that can hit the rate limit but are so important that they should be exempt from it - basically only …
I'm about to push an update to #25203 to incorporate #25287 and the changes discussed in #25306, and the current approach is that the …
The logging you might be worried about limiting would be, with the current direction, IIRC: …
Haven't had the time to maintain this. I still think some general mitigation for disk-filling would be nice, so I might pick this up again later unless someone else beats me to it. Closing for now.
This picks up #19995 in an attempt to solve #21559.
The goal of this PR is to mitigate disk filling attacks by temporarily rate limiting `LogPrintf(…)`.
A disk fill attack is an attack where an untrusted party (such as a peer) is able to cheaply make your node log to disk excessively. The excessive logging may fill your disk and thus make your node crash either cleanly (best case: if the disk fill rate is relatively slow) or uncleanly (worst case: if the disk fill rate is relatively fast).
Approach
The hourly logging quota is set per source location. Every single source location (say net_processing.cpp:418) gets a quota of 1 MiB of logging per hour. Logging is only dropped from source locations that exceed the quota. (A sketch of this per-location bookkeeping follows the list below.)

- `LogPrintf(…)` is rate limited.
- `LogPrint(category, …)` (`-debug`) is not rate limited.
- `UpdateTip: new best=[…]` is logged using `LogPrintfWithoutRateLimiting(…)` to avoid rate limiting. High log volume is expected for that source location during IBD.
- Rate limiting can be turned off via the `-ratelimitlogging` config option.
- All logs are prefixed with `[*]` if there is at least one source location that is currently being suppressed.
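To make the per-location accounting concrete, here is a minimal sketch under assumed names; it is not the PR's implementation (which uses `BCLog::LogRatelimiter` from the earlier commit), just an outline of the idea.

```cpp
// Illustrative sketch of per-source-location accounting (not the PR's actual
// implementation; names and the wrapper macro are assumptions).
// Each (file, line) call site gets its own hourly byte budget, so only the
// offending call sites are suppressed. Locking is omitted for brevity.
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

struct LocationQuota {
    std::size_t used_bytes{0};
    std::chrono::steady_clock::time_point window_start{std::chrono::steady_clock::now()};
};

constexpr std::size_t MAX_BYTES_PER_HOUR{1024 * 1024}; // 1 MiB per location
static std::map<std::pair<std::string, int>, LocationQuota> g_location_quotas;

// Returns true if `bytes` bytes from `file:line` may be written to the log.
bool ShouldLog(const std::string& file, int line, std::size_t bytes)
{
    LocationQuota& quota{g_location_quotas[{file, line}]};
    const auto now{std::chrono::steady_clock::now()};
    if (now - quota.window_start >= std::chrono::hours{1}) {
        quota.window_start = now; // start a fresh window for this call site
        quota.used_bytes = 0;
    }
    quota.used_bytes += bytes;
    return quota.used_bytes <= MAX_BYTES_PER_HOUR;
}

// A wrapper macro could capture the call site automatically, e.g.:
//   #define LOGPRINTF_RATELIMITED(msg) \
//       do { if (ShouldLog(__FILE__, __LINE__, std::string{msg}.size())) LogPrintf("%s", msg); } while (0)
// and, as described above, prepend "[*]" to every line while any location is
// being suppressed.
```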
Alternatives
In a recent PR Review Club meeting we discussed this PR and evaluated alternative approaches.
Global rate limiting
There was some discussion around an alternative approach, which would be to globally rate limit `LogPrintf(…)`. This approach was implemented in a competing PR #21706. The main point [1] in favor of global rate limiting is that it could be confusing to have incomplete logs when logs from one source location are dropped but not from others. The main point against global rate limiting is that it opens another attack vector where an attacker could trigger rate limiting and then execute a second attack which would then not be documented in the logs at all. With regard to the attack vector on the global approach and the overall reviews the two approaches have gotten, I have chosen to continue with the approach in this PR.
(To address the concern of [1] I have chosen to prefix all logs with `[*]` if there is at least one source location that is currently being suppressed.)
logrotate
LarryRuane brought up logrotate, which can be used on Linux to monitor log files and perform actions on them (delete, compress, send somewhere, etc.) when a certain size or point in time is reached. `logrotate` could be used to rate limit logs, but since we would like log rate limiting to be enabled by default it is not the best solution. Also, on Windows an alternative would have to be used.
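For reference, a typical logrotate rule for this use case could look roughly like the snippet below; the path and thresholds are illustrative, and this is the alternative being discussed, not something added by the PR.

```
# Example /etc/logrotate.d/bitcoind entry (path and limits are illustrative).
# Rotate debug.log once it exceeds 100 MB, keep 5 compressed copies, and
# truncate in place so bitcoind keeps writing to the same file handle.
/home/user/.bitcoin/debug.log {
    size 100M
    rotate 5
    compress
    missingok
    notifempty
    copytruncate
}
```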
Testing
I have written a unit test which is contained in this PR. (Not the greatest test; it currently does not work on Windows thanks to `\r\n`. If someone has an idea how to improve the test I would appreciate it.)
Furthermore, I made an RPC here that can log excessive amounts of "a"s from different locations, which I have found useful during my own testing. A simple example to use the RPC: `bitcoin-cli excessivelog <location (1-5)> <num_bytes>`
`bitcoin-cli excessivelog 1 536870912` would still log ~512MiB to disk. Logging to console is also never suppressed (unless `-printtoconsole=0`).