log: Mitigate disk filling attacks by temporarily rate limiting LogPrintf(…) #19995
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts. Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
Concept ACK. I was wondering about this threat several times over the last year, but never managed to look into it for real.
Have you looked at the implementation? It certainly doesn't drop everything :) The implementation was carefully written to minimise the risk of suppressing something important from being logged while at the same time killing the disk-fill-via-peer-triggered-logging bug class for good. Do you have any ideas on how to improve the implementation in a way that reduces the "prevent important things from being logged" risk? Perhaps this implementation isn't the way to go, but I really think we need some kind of general mitigation for disk fill attacks. This is a non-theoretical DoS vector which has been problematic for Bitcoin Core historically.
I don't know how hard a task that would be, but it would be great to see how easy it is to fill disk space depending on the protocol piece (the message, but also its context). From there we could make sure our rate limiting is effective?
When testing this patch I reverted the fix for a historic disk fill bug to see if this mitigation would have been successful. In that case an untrusted peer could achieve a disk fill rate of 15 GB/hour without this mitigation, and the expected 1 MB/hour with this mitigation. (The 15 GB/hour was achieved on a really old machine with mechanical disks, so YMMV.) In other words: the mitigation slowed that specific attack by a factor of 15 000 :)

Note that the quota is set per source location. So every single source location (say `net_processing.cpp:418`) gets its own hourly quota.

Perhaps I misunderstood what you meant by the […]. The goal of this PR is to kill this bug class for good.
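To make the per-source-location quota idea concrete, here is a minimal, hypothetical sketch of the bookkeeping described above: each `(file, line)` pair gets its own hourly byte budget, and the whole table is cleared when the window resets. The class and method names are illustrative, not the actual Bitcoin Core implementation; only the 1 MiB/hour figure mirrors the PR discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Illustrative constant matching the 1 MiB/hour quota discussed in the PR.
constexpr uint64_t HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION = 1024 * 1024;

class LogRateLimiter
{
    // Bytes logged per (file, line) source location in the current window.
    std::map<std::pair<std::string, int>, uint64_t> m_bytes_logged;

public:
    // Returns true if the message may be written to disk; false if this
    // source location has already exhausted its hourly quota.
    bool Consume(const std::string& file, int line, uint64_t msg_bytes)
    {
        uint64_t& used = m_bytes_logged[{file, line}];
        if (used >= HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION) return false;
        used += msg_bytes;
        return true;
    }

    // Called once the hourly window elapses: all locations start fresh.
    void ResetWindow() { m_bytes_logged.clear(); }
};
```

Keying on the source location rather than a global counter is what lets one noisy call site be suppressed without silencing the rest of the log, which is exactly the trade-off debated in this thread.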
```cpp
if (print_quota_state_change_message) {
    str_prefixed = LogTimestampStr(strprintf(
        "Excessive logging detected from %s (%s): >%d MiB logged during the last hour. "
        "Suppressing logging to disk from this source location for up to one hour. "
        "Console logging unaffected. Last log entry: %s",
        source_location, logging_function,
        HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION / (1024 * 1024), str_prefixed));
} else {
    skip_writing_to_disk_due_to_rate_limiting = true;
```
Oh, sorry, I thought this was a global flag.
@laanwj Thanks for the Concept ACK! :) Feedback addressed.
```cpp
// MUST NOT be rate limited no matter how often they are logged. That requirement
// should be extremely rare, so please use with care. Prefer LogPrintf(...) if
// possible.
#define LogPrintfWithoutRateLimiting(...) LogPrintf_(__func__, __FILE__, __LINE__, /* skip_disk_usage_rate_limiting */ true, __VA_ARGS__)
```
This name is really awkward. I would prefer a more compact name, but don't really have any good suggestions.
Heh, I agree :) Naming is hard! I'm happy to change the name of course.
I expect `LogPrintfWithoutRateLimiting` to be used once: for `UpdateTip`, which is the only non-category log message which we ever expect to legitimately log at a rate of >1 MB/hour (or whatever hourly source-location threshold we choose).

One alternative would be to call `LogPrintf_` directly from `UpdateTip` as it is a one-off. That would solve the naming problem too :)
FWIW, this results in a 120 kB increase in stripped binary size for […] (still concept ACK though, and thanks for addressing my comments, code changes LGTM now)
Concept ACK.
Log shedding needs to be done very carefully, since logs are usually the best data we have when diagnosing user issues. I'm not convinced that shedding logs per location is a good idea. Partial logs are often worse than no logs at all and can be very misleading ("it logged this and then it didn't log this, so it can't have taken this code branch"). Even with a special "Suppressing logging to disk from this source location" log, it's very easy to be misled unless you know exactly what you're looking for.
I think I'd prefer log-shedding to be done globally, with a special log at the end of a log suppression period that tallies up which log locations have been suppressed.
```cpp
const std::chrono::seconds now = GetTime<std::chrono::seconds>();
if ((now - m_last_reset_of_bytes_logged_per_source_location) > std::chrono::hours{1}) {
    m_bytes_logged_per_source_location.clear();
    m_last_reset_of_bytes_logged_per_source_location = now;
```
It'd be good to check here whether logging was suppressed for this log and, if so, log that logging is restarting for this log (e.g. "Restarting logging from %s ...").
That's a good idea. I'll implement!
```cpp
if (print_quota_state_change_message) {
    str_prefixed = LogTimestampStr(strprintf(
        "Excessive logging detected from %s:%d (%s): >%d MiB logged during the last hour. "
        "Suppressing logging to disk from this source location for up to one hour. "
        "Console logging unaffected. Last log entry: %s",
        source_location.first, source_location.second, logging_function,
        HOURLY_LOG_QUOTA_IN_BYTES_PER_SOURCE_LOCATION / (1024 * 1024), str_prefixed));
} else {
    skip_writing_to_disk_due_to_rate_limiting = true;
```
Consider keeping a tally of how many log messages were dropped, and then logging that number when you restart logging (e.g. "Restarting logging from <location>. <number> log messages dropped since <time>").
Makes sense. I'll implement.
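As a rough illustration of the reviewer's suggestion above, the sketch below counts dropped messages per source location while suppression is active and emits one "Restarting logging" line per location when the window resets. All names here are hypothetical; this is not the code that ended up in the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical tally of messages dropped per suppressed source location.
class SuppressionTally
{
    std::map<std::string, uint64_t> m_dropped; // messages dropped per location

public:
    // Called each time a message is rate limited away.
    void RecordDrop(const std::string& location) { ++m_dropped[location]; }

    // On window reset, produce one restart line per suppressed location
    // and clear the tally so the next window starts fresh.
    std::vector<std::string> FlushRestartMessages()
    {
        std::vector<std::string> out;
        for (const auto& [location, count] : m_dropped) {
            out.push_back("Restarting logging from " + location + ". " +
                          std::to_string(count) +
                          " log messages dropped since last reset.");
        }
        m_dropped.clear();
        return out;
    }
};
```

The point of the tally is diagnosability: a reader of the log learns not just that suppression happened, but how much information was lost and where.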
Conceptually, I am not sure if it is good to fight the symptoms. I'd prefer if unconditional logs were only used for the init/shutdown sequence and local system errors, such as data corruption. Anything else the average user probably doesn't care about, and if they did, they could enable the corresponding debug category and provide enough disk space for the debug log file.
I fail to see how introducing a mitigation would be to "fight the symptoms". The reason we use ASLR, for example, isn't that we've given up on the ambition to write non-vulnerable code :) The reason we use mitigations is to make exploitation harder in case we fail to write vulnerability-free code, and thus hopefully turn a vulnerability into a normal bug. Have you looked at the implementation in this PR? :) The mitigation logic would only kick in under extreme circumstances, meaning that the only observable difference from a user perspective is that his/her node would be exploitable to disk fill attacks without the mitigation in place, whereas he/she would be safe with it. Note that this is not a theoretical issue: we've had exploitable disk filling vulnerabilities in the past which this mitigation would have successfully turned into just another non-exploitable, non-security bug.

I agree that we should log more conservatively and ideally leave no room for disk filling attacks, but that doesn't preclude also protecting our users in case we fail to do so :)
Can we get a way to disable this via the config file?
Could a maintainer mark this PR as "Up for grabs"? I'm afraid I won't have time to incorporate the suggested changes any time soon. FWIW I still think this mitigation is very much worth doing: the "up for grabs" state is purely due to lack of time on my part. I'd be glad to review any work being done to reduce the risk of disk fill attacks.
Picked up in #21603.
Mitigate disk filling attacks by temporarily rate limiting `LogPrintf(…)`.

A disk fill attack is an attack where an untrusted party (such as a peer) is able to cheaply make your node log to disk excessively. The excessive logging may fill your disk and thus make your node crash either cleanly (best case: if the disk fill rate is relatively slow) or uncleanly (worst case: if the disk fill rate is relatively fast).

The hourly logging quota is set per source location. Every single source location (say `net_processing.cpp:418`) gets a quota of 1 MB of logging per hour.

Notes:

- `LogPrintf(…)` is rate limited.
- `LogPrint(category, …)` (`-debug`) is not rate limited.
- `UpdateTip: new best=[…]` is logged using `LogPrintfWithoutRateLimiting(…)` to avoid rate limiting. High log volume is expected for that source location during IBD.

Live demo: