Conversation

dergoegge
Member

dergoegge commented Apr 5, 2021

This picks up #19995 in an attempt to solve #21559.

The goal of this PR is to mitigate disk filling attacks by temporarily rate limiting LogPrintf(…).

A disk fill attack is an attack in which an untrusted party (such as a peer) is able to cheaply make your node log to disk excessively. The excessive logging may fill your disk and thus make your node crash, either cleanly (best case: the disk fills relatively slowly) or uncleanly (worst case: the disk fills relatively fast).

Approach

The hourly logging quota is set per source location. Each source location (say net_processing.cpp:418) gets a quota of 1 MB of logging per hour. Logging is only dropped from source locations that exceed the quota. (A sketch of the bookkeeping this implies follows the list below.)

  • Only logging to disk is rate limited. Logging to console is not rate limited.
  • Only LogPrintf(…) is rate limited. LogPrint(category, …) (-debug) is not rate limited.
  • UpdateTip: new best=[…] is logged using LogPrintfWithoutRateLimiting(…) to avoid rate limiting. High log volume is expected for that source location during IBD.
  • When logging is restarted a tally of how many messages/bytes were dropped is printed.
  • Log rate limiting can be disabled with the -ratelimitlogging config option.
  • All logs will be prefixed with [*] if there is at least one source location that is currently being suppressed.
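
To illustrate the bookkeeping, here is a minimal sketch of a per-source-location, fixed-window limiter. All names here (LogRateLimiter, SourceQuota, Consume) are hypothetical and for illustration only; the actual implementation in this PR is BCLog::LogRatelimiter.

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-location tally for the current one-hour window.
struct SourceQuota {
    std::chrono::steady_clock::time_point window_start{};
    uint64_t bytes_logged{0};
    uint64_t messages_dropped{0};
};

class LogRateLimiter {
    static constexpr uint64_t MAX_BYTES_PER_HOUR{1024 * 1024}; // 1 MiB per source location
    static constexpr std::chrono::hours WINDOW{1};
    std::map<std::string, SourceQuota> m_quotas; // keyed by "file:line"

public:
    // Returns true if a message of the given size from this source location
    // may still be written to disk in the current window.
    bool Consume(const std::string& source_location, size_t message_bytes)
    {
        const auto now{std::chrono::steady_clock::now()};
        SourceQuota& quota{m_quotas[source_location]};
        if (now - quota.window_start >= WINDOW) {
            // New window: reset the tally. (The PR prints how many
            // messages/bytes were dropped when logging resumes.)
            quota = SourceQuota{};
            quota.window_start = now;
        }
        quota.bytes_logged += message_bytes;
        if (quota.bytes_logged > MAX_BYTES_PER_HOUR) {
            ++quota.messages_dropped;
            return false; // suppress: this location exceeded its quota
        }
        return true;
    }
};

A LogPrintf wrapper would call Consume() with its __FILE__/__LINE__ pair and skip the disk write whenever it returns false.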

Alternatives

In a recent PR Review Club meeting we discussed this PR and evaluated alternative approaches.

Global rate limiting

There was some discussion around an alternative approach, which would be to globally rate limit LogPrintf(…). This approach was implemented in a competing PR #21706. The main point [1] in favor of global rate limiting is that it could be confusing to have incomplete logs when logs from one source location are dropped but not from others. The main point against global rate limiting is that it opens another attack vector: an attacker could trigger rate limiting and then execute a second attack, which would then not be documented in the logs at all. Given that attack vector on the global approach, and the overall reviews the two approaches have received, I have chosen to continue with the approach in this PR.
(To address the concern of [1] I have chosen to prefix all logs with [*] if there is at least one source location that is currently being suppressed.)

logrotate

LarryRuane brought up logrotate, which can be used on Linux to monitor log files and perform actions on them (delete, compress, send somewhere, etc.) when a certain size or point in time is reached. logrotate could be used to rate limit logs, but since we would like log rate limiting to be enabled by default it is not the best solution. Also, on Windows an alternative would have to be used.
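
For reference, a minimal logrotate stanza along these lines might look as follows (the path is illustrative):

/home/user/.bitcoin/debug.log {
    size 100M
    rotate 5
    compress
    # truncate in place so bitcoind can keep writing to the same open handle
    copytruncate
}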

Testing

I have written a unit test which is contained in this PR. (Not the greatest test; it currently does not work on Windows thanks to \r\n. If someone has an idea of how to improve the test, I would appreciate it.)

Furthermore, I made an RPC here that can log excessive amounts of "a"s from different locations, which I have found useful during my own testing.
bitcoin-cli excessivelog <location (1-5)> <num_bytes>

⚠️ One thing to note with that RPC is that the rate limiting logic still prints the last message that triggered the limiting to disk, so something like bitcoin-cli excessivelog 1 536870912 would still log ~512 MiB to disk. Logging to console is also never suppressed (unless -printtoconsole=0). ⚠️

A simple example to use the rpc:

bitcoin-cli -regtest setmocktime 1
bitcoin-cli -regtest excessivelog 1 1048500 # log just under 1MiB
bitcoin-cli -regtest excessivelog 1 100 # this should get the total amount logged above 1MiB
                                        # and the rate limiting logic should kick in
bitcoin-cli -regtest setmocktime 3602 # let 1 hour pass (only works in regtest)
bitcoin-cli -regtest excessivelog 1 100 # this should trigger logging to resume

@dergoegge
Member Author

In response to @jnewbery's comment

I think I'd prefer log-shedding to be done globally, with a special log at the end of a log suppression period that tallies up which log locations have been suppressed.

I disagree on the global shedding; I think the source-location-based shedding is overall the better approach.
I agree with your point that it could be confusing to drop some logs in the context of user issues, but shedding should only kick in if there is an attack or something is horribly wrong, so I would think that almost all logs provided by users would be complete. We can also always tell if rate limiting is taking place, since we log when it starts/stops, and in those cases a partial log is better than no log.

(This is obviously mostly an opinion and I am totally willing to be convinced of global log-shedding if enough reviewers favor it.)

@DrahtBot
Contributor

DrahtBot commented Apr 5, 2021

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25306 (logging: add LogPrintfCategory to log unconditionally with category by jonatack)
  • #25287 (logging: threshold log level by klementtan)
  • #25203 (logging: update to severity-based logging by jonatack)
  • #24950 (Add config option to set max debug log size by tehelsper)
  • #23443 (p2p: Erlay support signaling by naumenkogs)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jnewbery
Contributor

jnewbery commented Apr 6, 2021

shedding should only kick in if there is an attack or something is horribly wrong, so I would think that almost all logs provided by users would be complete.

Right, and those cases are exactly where we need the best possible logs.

We can also always tell if rate limiting is taking place, since we log when it starts/stops, and in those cases a partial log is better than no log.

I guarantee you that not everyone will know this and most people will miss a single log line saying "rate-limiting log x".

I can only speak from my own experience of supporting network equipment for several years - partial logs are often worse than no logs at all since they're so misleading.

@practicalswift
Contributor

Strongest possible Concept ACK: this mitigation will kill an entire bug class (the "disk fill via logging" bug class).

Thanks for taking up this work @dergoegge.

Will test and review.

@practicalswift
Contributor

practicalswift commented Apr 6, 2021

@jnewbery

I think I'd prefer log-shedding to be done globally, with a special log at the end of a log suppression period that tallies up which log locations have been suppressed.

Clarifying questions to fully understand what is suggested:

Let's assume that log location A is a misplaced LogPrintf in a code path that the attacker can make us take.

In the case of an attacker using log location A to fill our disk by making us log from there repeatedly, is the suggestion then that all log locations should be suppressed during the suppression period (instead of suppressing only the "attacker controlled" log location A)?

To make your suggestion clear: could you exemplify what the special log entry at the end of the log suppression period would look like? (A patch or even a separate PR would be even better, but an example log entry would probably clarify enough!)

@practicalswift
Contributor

@dergoegge

Thanks for your work on this PR!

Would you be willing to also implement @jnewbery's suggestion as a separate PR?

I tried to summarise his suggestion in #21603 (comment). I'm not certain I got it right though - @jnewbery, feel free to chime in :)

I think it makes sense to do it as a separate PR since it deviates in important ways from the original suggestion. By having two separate PRs we would see which of the two approaches to addressing this type of attack has consensus support.

Personally I think I could live with both approaches: as long as we kill the disk-fill-via-logging bug class sooner rather than later I'm happy :)

@dergoegge
Member Author

Would you be willing to also implement jnewbery's suggestion as a separate PR?

@practicalswift Sure, that's a good idea :)

@rebroad
Contributor

rebroad commented Apr 17, 2021

Concept ACK (not that I'm sure my opinion is relevant), and I think this makes more sense than the global alternative.

@DrahtBot
Contributor

DrahtBot commented May 3, 2021

🕵️ @sipa has been requested to review this pull request as specified in the REVIEWERS file.

@practicalswift
Contributor

@dergoegge Thanks for working on this. Would you mind rebasing? I would like to review the updated version :)

dergoegge added 6 commits May 30, 2022 15:17
To mitigate disk filling attacks caused by unsafe usages of LogPrintf,
we rate limit LogPrintf by using the fixed window rate limiter
(BCLog::LogRatelimiter) introduced in an earlier commit.

The rate limiting logic is applied per source location instead of
globally. A source location is allowed to log up to 1 MiB per hour.
Source locations that violate the limit will have their logs suppressed
for up to one hour.

We mark ~DebugLogHelper as noexcept(false) to be able to catch the
exception it throws. This lets us use it in tests in combination with
BOOST_CHECK_THROW and BOOST_CHECK_NO_THROW to check that certain log
messages are (not) logged.
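
As an illustration of why noexcept(false) is needed, here is a simplified stand-in for DebugLogHelper that only mirrors the destructor-throwing pattern (the real helper lives in Bitcoin Core's test utilities and differs in detail):

#include <stdexcept>
#include <string>

class DebugLogHelper {
    std::string m_message;
    bool m_found{false};

public:
    explicit DebugLogHelper(std::string message) : m_message{std::move(message)} {}
    void MarkFound() { m_found = true; } // would be invoked by a log callback on a match

    // Destructors are implicitly noexcept; without noexcept(false), throwing
    // here would call std::terminate instead of reaching the test macro.
    ~DebugLogHelper() noexcept(false)
    {
        if (!m_found) throw std::runtime_error(m_message + " not found in debug log");
    }
};

// A test can then assert that a message was suppressed or logged, e.g.:
//   BOOST_CHECK_THROW({ DebugLogHelper h{"over-quota msg"}; /* log it */ }, std::runtime_error);
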
@dergoegge
Member Author

It looks like many of the LogPrintf messages will be going away with the update to severity-based logging and the added complexity here may be worth reevaluating (and potentially avoiding) after that.

@jonatack IMO removing some of the LogPrintf locations (or assigning them a severity level which is not logged by default) does not really address the issue at its root, as this PR is trying to do. Any remaining default logging could still cause issues, as could any logging that is introduced in the future.

@jonatack
Member

jonatack commented May 30, 2022

What I am working on is ensuring that no logging that is on by default would be externally provokable (and potentially removing LogPrintf). That, along with reducing default logging in general, has appeared to be the desired direction for some time now AFAICS. Of course, this could still be merged in the interim and then removed if no longer needed, but I'm targeting the next release.

@jnewbery
Contributor

reACK c86d5ff

@mzumsande
Contributor

What I am working on is ensuring that no logging that is on by default would be externally provokable (and potentially removing LogPrintf). That, along with reducing default logging in general, has appeared to be the desired direction for some time now AFAICS. Of course, this could still be merged in the interim and then removed if no longer needed, but I'm targeting the next release.

With bd971bf, category-based logging with a severity >= BCLog::Level::Warning is equivalent to LogPrintf from the viewpoint of disk filling attacks.
I think that it would be really good to have this general countermeasure in place, especially with probable future PRs that convert more existing log messages to the new format, because we'll now have to be careful about not adding category-based warnings or errors that could be triggered externally. The old rule, which is no longer valid but probably still in many contributors' heads, was that one only needs to be careful about LogPrintf, while category-based logging is not so critical.

@jonatack
Member

jonatack commented May 31, 2022

convert more existing log messages to the new format, because we'll now have to be careful about not adding category-based warnings or errors that could be triggered externally

Yes, doing precisely this right now for all of them. Though, assuming people are in favor of that, reviewers might prefer extending this rate limiting to non-default logging; otherwise it can always still be removed.

@maflcko
Member

maflcko commented May 31, 2022

Is it possible to convert all of them? I think we currently don't have a way to express a log category with an unconditional log severity, unless you want to use Warning/Error for "unconditional".

@jonatack
Member

jonatack commented May 31, 2022

@MarcoFalke added that in #25203.

@mzumsande
Contributor

Though, assuming people are in favor of that, reviewers might prefer extending this rate limiting to non-default logging.

Good point, I think that this would definitely be useful (here or as a follow-up) - having a rate limit only for LogPrintf but not for unconditionally logged category-based warnings and errors would be confusing.

@jonatack
Member

jonatack commented May 31, 2022

Good point, I think that this would definitely be useful (here or as a follow-up) - having a rate limit only for LogPrintf but not for unconditionally logged category-based warnings and errors would be confusing.

I agree; though as an example I just updated the (copious) net processing logging in #25203, and there are only a few remaining unconditional messages (user-controlled or unusual):

src/net_processing.cpp:2305:                LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Large reorg, won't direct fetch to %s (%d)\n",
src/net_processing.cpp:2702:            LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Connected to self at %s, disconnecting\n", pfrom.addr.ToString());
src/net_processing.cpp:3425:                    LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Not relaying non-mempool transaction %s from forcerelay peer=%d\n", tx.GetHash().ToString(), pfrom.GetId());
src/net_processing.cpp:4190:        LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Not punishing misbehaving peer %d because it has noban permission set!\n", peer.m_id);
src/net_processing.cpp:4196:        LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Not punishing misbehaving peer %d because it was manually connected!\n", peer.m_id);
src/net_processing.cpp:4873:                        LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Announcing block %s not on main chain (tip=%s)\n",

@dergoegge
Member Author

having a rate limit only for LogPrintf but not for unconditionally logged category-based warnings and errors would be confusing.

I also agree with this and want to add that to this PR.

The two new categories I introduce here (UNCONDITIONAL_ALWAYS and UNCONDITIONAL_RATE_LIMITED) don't really make sense anymore, given that we now also have category-based unconditional logging. Whether a log location should be rate limited could be based purely on the severity level.
I was thinking of the following: adding an extra severity Always which is logged unconditionally without rate limiting (e.g. for this location) and rate limiting logging to disk if severity == None || (severity >= Warning && severity < Always). Additionally, categories specified by -debug=cat should still be logged unconditionally without rate limiting. @mzumsande @jonatack @MarcoFalke what do y'all think of that?
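
For concreteness, a rough sketch of that predicate (the enum layout and ShouldRateLimit are hypothetical, not code from this PR):

// Illustrative severity ordering; None models calls that carry no level.
enum class Level { Trace, Debug, Info, Warning, Error, Always, None };

// Proposed rule: rate limit unconditional disk logging unless the location
// is explicitly exempt (Always) or its category was enabled via -debug=<cat>.
bool ShouldRateLimit(Level severity, bool category_enabled_via_debug)
{
    if (category_enabled_via_debug) return false; // -debug=cat: never rate limited
    return severity == Level::None ||
           (severity >= Level::Warning && severity < Level::Always);
}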

@DrahtBot
Contributor

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@mzumsande
Contributor

mzumsande commented Jun 22, 2022

I was thinking of the following: adding an extra severity Always which is logged unconditionally without rate limiting (e.g. for this location) and rate limiting logging to disk if severity == None || (severity >= Warning && severity < Always). Additionally, categories specified by -debug=cat should still be logged unconditionally without rate limiting. @mzumsande @jonatack @MarcoFalke what do y'all think of that?

So, it seems to me that with the recent changes, we currently have

  1. Unconditional logging with no level and no category (LogPrintf)
  2. Conditional logging with category but no level (LogPrint) - not relevant to this PR
  3. Unconditional logging with category but no level (LogPrintfCategory)
  4. Logging with category and level, which may be unconditional depending on the level (LogPrintLevel)

The "missing" combination of logging by level but without a category is currently not possible as far as I can see.

If there were a use case for "Always"-logged messages also having a severity like "Warning" or "Error", we'd need another dimension; but I think that at least for the current uses this is not the case, so your suggestion makes sense to me.

However, I'm really not sure how stable the current state of the changes to the logging framework is; it seems very much in flux, and I'm a little bit confused about what direction it is heading in.

@maflcko
Member

maflcko commented Jun 22, 2022

If there is a log category with Info, which is also the default level, you wouldn't need an Always level?

@mzumsande
Contributor

If there is a log category with Info, which is also the default level, you wouldn't need an Always level?

I don't see how - the "Always" level is meant for noisy unconditional logs that can hit the rate limit, but are so important that they should be exempt from it - basically only UpdateTipLog during IBD.

@jonatack
Member

jonatack commented Jun 23, 2022

I'm about to push an update to #25203 to incorporate #25287 and the changes discussed in #25306, and the current approach is that the None severity level is only used internally, i.e. in src/logging.{h,cpp}, and moved to the end of the enum to permit iterating through it, similar to GetNetworkNames() / NET_MAX.

@jonatack
Member

jonatack commented Jun 23, 2022

The logging you might be worried about limiting would be, with the current direction, IIRC:

  • LogPrintf
  • LogPrintfCategory
  • LogPrintLevel with level Info/Warning/Error (these levels are planned to be logged unconditionally, while Debug and Trace would not)

@dergoegge
Member Author

Haven't had the time to maintain this. I still think some general mitigation for disk-filling would be nice, so I might pick this up again later unless someone else beats me to it. Closing for now.

dergoegge closed this Oct 7, 2022
bitcoin locked and limited conversation to collaborators Oct 7, 2023