-
Notifications
You must be signed in to change notification settings - Fork 37.7k
util: Filter control characters out of log messages #17095
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e6674da
to
f2f1365
Compare
To test, apply this and run with logging to stdout, without this patch it will make the entire log turn yellow: diff --git a/src/init.cpp b/src/init.cpp
index 035725b0908ebddae882583d489bdd8ad82c4857..26f4c7b906be959aa3ebaa1a2a657028c95b0f49 100644
--- a/src/init.cpp
+++ b/src/init.cpp
@@ -1231,6 +1231,7 @@ bool AppInitMain(InitInterfaces& interfaces)
LogPrintf("Startup time: %s\n", FormatISO8601DateTime(GetTime()));
LogPrintf("Default data directory %s\n", GetDefaultDataDir().string());
LogPrintf("Using data directory %s\n", GetDataDir().string());
+ LogPrintf("\x1b[1;33mIn yellowwwww\n");
// Only log conf file usage message if conf file actually exists.
fs::path config_file_path = GetConfigFile(gArgs.GetArg("-conf", BITCOIN_CONF_FILENAME)); with patch, it will just print:
|
if (ch >= 32 || ch == '\n') { | ||
ret += ch_in; | ||
} else { | ||
ret += strprintf("\\x%02x", ch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems impossible to distinguish between a literal "\x00" and "\x00" after this transformation.
Suggest using SanitizeString
perhaps with a new rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that matters much. \x00
(as in, the code-point 0) is not supposed to end up in a log string at all, if it does that's an error (not serious enough to stop the program, though).
Could add some more apparent warning I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very much against using SanitizeString's methodology here. I don't want to make any characters entirely disappear. If anything this should make them very visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cherry-pick 7cbc72b?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to keep this simple code specific and self-contained, TBH. Also, escaping with '%%' seems to have exactly the same "issue"? (if you regard it as an issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The escaping code there always escapes '%'
src/logging.cpp
Outdated
*/ | ||
std::string LogEscapeMessage(const std::string& str) { | ||
std::string ret; | ||
for (char ch_in: str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: add a space before :
ACK f2f1365 -- diff looks correct and I've tested it lightly with expected results We can tighten the allowed range ( (The only (theoretical) issue I can think of with this fix is that the size of the log payload is increased by a factor of four when escaping. An attacker trying to fill the disk of a node by repeated log injection can then fill the disk four times as fast by using escape characters in the the injected log payload. I still think this change is worth doing as suggested though: rate limiting logging and/or limiting the length of log messages can be handled in a follow-up PR.) |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
I don't think we should do that without parsing UTF-8. It definitely should be possible to log Russian and Chinese wallet names, or even emoji. Edit : OK I guess \x7f is, strictly, a control code too, and we could easily exclude that.
Yes, true, to be honest I'm not worried about disk filling attacks here. |
} else { | ||
ret += strprintf("\\x%02x", ch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add a DCHECK(false)
(#16136) to the else branch, maybe only run once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think crashing the executable on an assertion error is a bad idea here. After all, if the character was injected somehow, that changes a log injection to a live DoS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed it was only enabled by --enable-debug
, which also turns potential lock issues into crashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only running the else branch once is fine with me though, not sure. It's a compromise between precise troubleshooting and the fear that this would generate very long strings.
* issues where they accidentally end up in strings. | ||
*/ | ||
std::string LogEscapeMessage(const std::string& str) { | ||
std::string ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: I wasn't sure what the fashion of the day is for calling .reserve
or not, here. The expected (99.9%) outcome here is that the output is exactly the same length as the input.
utACK 8c45e33 |
std::string ret; | ||
for (char ch_in : str) { | ||
uint8_t ch = (uint8_t)ch_in; | ||
if ((ch >= 32 || ch == '\n') && ch != '\x7f') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!iscntrl(ch) || ch == '\n') {
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iscntrl
is locale dependent and we try to avoid using locale dependent functions.
See the following section in the developer notes:
Avoid using locale dependent functions if possible. You can use the provided
lint-locale-dependence.sh
to check for accidental use of locale dependent functions.
Rationale: Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix.
These functions are known to be locale dependent:
alphasort
,asctime
,asprintf
,atof
,atoi
,atol
,atoll
,atoq
,
btowc
,ctime
,dprintf
,fgetwc
,fgetws
,fprintf
,fputwc
,
fputws
,fscanf
,fwprintf
,getdate
,getwc
,getwchar
,isalnum
,
isalpha
,isblank
,iscntrl
,isdigit
,isgraph
,islower
,isprint
,
ispunct
,isspace
,isupper
,iswalnum
,iswalpha
,iswblank
,
iswcntrl
,iswctype
,iswdigit
,iswgraph
,iswlower
,iswprint
,
iswpunct
,iswspace
,iswupper
,iswxdigit
,isxdigit
,mblen
,
mbrlen
,mbrtowc
,mbsinit
,mbsnrtowcs
,mbsrtowcs
,mbstowcs
,
mbtowc
,mktime
,putwc
,putwchar
,scanf
,snprintf
,sprintf
,
sscanf
,stoi
,stol
,stoll
,strcasecmp
,strcasestr
,strcoll
,
strfmon
,strftime
,strncasecmp
,strptime
,strtod
,strtof
,
strtoimax
,strtol
,strtold
,strtoll
,strtoq
,strtoul
,
strtoull
,strtoumax
,strtouq
,strxfrm
,swprintf
,tolower
,
toupper
,towctrans
,towlower
,towupper
,ungetwc
,vasprintf
,
vdprintf
,versionsort
,vfprintf
,vfscanf
,vfwprintf
,vprintf
,
vscanf
,vsnprintf
,vsprintf
,vsscanf
,vswprintf
,vwprintf
,
wcrtomb
,wcscasecmp
,wcscoll
,wcsftime
,wcsncasecmp
,wcsnrtombs
,
wcsrtombs
,wcstod
,wcstof
,wcstoimax
,wcstol
,wcstold
,
wcstoll
,wcstombs
,wcstoul
,wcstoull
,wcstoumax
,wcswidth
,
wcsxfrm
,wctob
,wctomb
,wctrans
,wctype
,wcwidth
,wprintf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I don't want to rely on any C functions here which may or may not have the same view of the character set. The log is file always regarded as UTF-8 anyhow.
Maybe it's possible to add some functional test? (not sure we can generate such log messages without changing source code, so just an idea) |
Well—the idea behind preventing log injection is that it's not possible to inject arbitrary log messages through the network. I don't think we'd want to add any code to encourage it. Could add a unit test for the filter function, though. |
Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes. This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings.
Added unit test. Had to rebase to do this (conflict with the spans test), so also squashed the fixups. Should be ready for merge. |
00157e3
to
d7820a1
Compare
ACK d7820a1 - tested and works as expected :) |
d7820a1 util: Filter control characters out of log messages (Wladimir J. van der Laan) Pull request description: Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes. This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings (it is a debug log, after all). (more checks could be added such as UTF-8 validity and unicode code-point range checking—this is substantially more involved and would need to keep track of state between characters and even `LogPrint` calls as they could end up split up—but escape codes seem to be the most common attack vector for terminals.) ACKs for top commit: practicalswift: ACK d7820a1 - tested and works as expected :) Tree-SHA512: 0806265addebdcec1062a6def3e903555e62ba5e93967ce9ee6943d16462a222b3f41135a5bff0a76966ae9e7ed75f211d7785bceda788ae0b0654bf3fd891bf
d7820a1 util: Filter control characters out of log messages (Wladimir J. van der Laan) Pull request description: Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes. This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings (it is a debug log, after all). (more checks could be added such as UTF-8 validity and unicode code-point range checking—this is substantially more involved and would need to keep track of state between characters and even `LogPrint` calls as they could end up split up—but escape codes seem to be the most common attack vector for terminals.) ACKs for top commit: practicalswift: ACK d7820a1 - tested and works as expected :) Tree-SHA512: 0806265addebdcec1062a6def3e903555e62ba5e93967ce9ee6943d16462a222b3f41135a5bff0a76966ae9e7ed75f211d7785bceda788ae0b0654bf3fd891bf
Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes. This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings. Github-Pull: bitcoin#17095 Rebased-From: d7820a1
Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes. This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings. Github-Pull: bitcoin#17095 Rebased-From: d7820a1
334e27e util: Filter out macOS process serial number (Hennadii Stepanov) e1bacb5 rpc: fix -rpcclienttimeout 0 option (Fabian Jahr) 6a45766 doc: update bips.md with buried BIP9 deployments (MarcoFalke) dc0fe7a util: Filter control characters out of log messages (Wladimir J. van der Laan) ba46f39 init: Change fallback locale to C.UTF-8 (Wladimir J. van der Laan) Pull request description: Backports the following PRs to the `0.19.0` [branch](https://github.com/bitcoin/bitcoin/tree/0.19): * #17184 - util: Filter out macOS process serial number * #17131 - rpc: fix -rpcclienttimeout 0 option * #17111 - doc: update bips.md with buried BIP9 deployments * #17095 - util: Filter control characters out of log messages * #17085 - init: Change fallback locale to C.UTF-8 ACKs for top commit: laanwj: ACK 334e27e Tree-SHA512: 436064c00f98bae8475d0e46ab104df6fc9bdae4927dcdd5cffa4242704256c749352e9cabb23cf806911b1c303ddcb0208a42d540412e98da2513176e5e1023
…atting c72906d refactor: Remove redundant c_str() calls in formatting (Wladimir J. van der Laan) Pull request description: Our formatter, tinyformat, *never* needs `c_str()` for strings. Still, many places call it redundantly, resulting in longer code and a slight overhead. Remove redundant `c_str()` calls for: - `strprintf` - `LogPrintf` - `tfm::format` (also, combined with bitcoin#17095, I think this improves logging in case of unexpected embedded NULL characters) ACKs for top commit: ryanofsky: Code review ACK c72906d. Easy to review with `git log -p -n1 --word-diff-regex=. -U0 c72906d` Tree-SHA512: 9e21e7bed8aaff59b8b8aa11571396ddc265fb29608c2545b1fcdbbb36d65b37eb361db6688dd36035eab0c110f8de255375cfda50df3d9d7708bc092f67fefc
Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes. This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings. Github-Pull: bitcoin#17095 Rebased-From: d7820a1
Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes. This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings. Github-Pull: bitcoin#17095 Rebased-From: d7820a1
Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes. This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings. Github-Pull: bitcoin#17095 Rebased-From: d7820a1
0b18ea6 util: Filter control characters out of log messages (Wladimir J. van der Laan) ac30fc4 build: Factor out qt translations from build system (Wladimir J. van der Laan) 3b8af5f build: update boost macros to latest upstream (fanquake) b12defc Test that joinpsbts randomly shuffles the inputs (Andrew Chow) eb07d22 Shuffle inputs and outputs after joining psbts (Andrew Chow) 1175410 addrdb: Remove temporary files created in SerializeFileDB. Fixes non-determinism in unit tests. (practicalswift) c52dd12 Handle the result of posix_fallocate system call (Luca Venturini) f792b25 torcontrol: Use the default/standard network port for Tor hidden services, even if the internal port is set differently (Luke Dashjr) 9fe8d28 Bugfix: QA: Run tests with UPnP disabled (Luke Dashjr) 1d12e52 Add vertical spacer (Josu Goñi) d764141 depends: add patch to common dependencies (fanquake) 56815e9 Give QApplication dummy arguments (Andrew Chow) 9d389d0 util: No translation of `Bitcoin Core` in the copyright (MarcoFalke) 87908e9 scripted-diff: Avoid passing PACKAGE_NAME for translation (MarcoFalke) a44e18f build: Stop translating PACKAGE_NAME (MarcoFalke) 7bd8f4e rpc: Fix getblocktemplate CLI example (#16594) (Emil Engler) 1cc06a1 doc: Fix typos in COPYRIGHT (Chuf) Pull request description: Backports some commits to the `0.18` branch: * #16596 - rpc: Fix getblocktemplate CLI example * #16615 - doc: Fix typos in COPYRIGHT * #16291 - gui: Stop translating PACKAGE_NAME (without the `make translate` commit) * #16578 - Do not pass in command line arguments to QApplication * #16051 - depends: add patch to common dependencies * #16090 - Add vertical spacer * #15651 - torcontrol: Use the default/standard network port for Tor hidden services, even if the internal port is set differently * #15650 - Handle the result of posix_fallocate system call * #16646 - Bugfix: QA: Run tests with UPnP disabled * #16212 - addrdb: Remove temporary files created in SerializeFileDB. Fixes non-determinism in unit tests. * #16512 - rpc: Shuffle inputs and outputs after joining psbts * #16870 - build: update boost macros to latest upstream for improved error reporting * #16982 - build: Factor out qt translations from build system * #17095 - util: Filter control characters out of log messages ACKs for top commit: laanwj: ACK 0b18ea6 Tree-SHA512: 37f0e5afc20975f4d1506e8662eda2ae0125f2f424a852818b5af2c3b8db78fc1c365b83571aa80ca63c885ca314302190b891a50ff3851fda9b9238455a5627
Summary: Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes. This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings. Backport of Bitcoin Core [[bitcoin/bitcoin#17095 | PR17095]] Test Plan: `ninja && ninja check` Reviewers: O1 Bitcoin ABC, #bitcoin_abc, deadalnix Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, deadalnix Differential Revision: https://reviews.bitcoinabc.org/D7517
334e27e util: Filter out macOS process serial number (Hennadii Stepanov) e1bacb5 rpc: fix -rpcclienttimeout 0 option (Fabian Jahr) 6a45766 doc: update bips.md with buried BIP9 deployments (MarcoFalke) dc0fe7a util: Filter control characters out of log messages (Wladimir J. van der Laan) ba46f39 init: Change fallback locale to C.UTF-8 (Wladimir J. van der Laan) Pull request description: Backports the following PRs to the `0.19.0` [branch](https://github.com/bitcoin/bitcoin/tree/0.19): * bitcoin#17184 - util: Filter out macOS process serial number * bitcoin#17131 - rpc: fix -rpcclienttimeout 0 option * bitcoin#17111 - doc: update bips.md with buried BIP9 deployments * bitcoin#17095 - util: Filter control characters out of log messages * bitcoin#17085 - init: Change fallback locale to C.UTF-8 ACKs for top commit: laanwj: ACK 334e27e Tree-SHA512: 436064c00f98bae8475d0e46ab104df6fc9bdae4927dcdd5cffa4242704256c749352e9cabb23cf806911b1c303ddcb0208a42d540412e98da2513176e5e1023
Belts and suspenders: make sure outgoing log messages don't contain potentially suspicious characters, such as terminal control codes.
This escapes control characters except newline ('\n') in C syntax. It escapes instead of removes them to still allow for troubleshooting issues where they accidentally end up in strings (it is a debug log, after all).
(more checks could be added such as UTF-8 validity and unicode code-point range checking—this is substantially more involved and would need to keep track of state between characters and even
LogPrint
calls as they could end up split up—but escape codes seem to be the most common attack vector for terminals.)