Skip to content

Conversation

ryanofsky
Copy link
Contributor

Replace LogError and LogWarning with LogAlert to make it easier to choose the correct logging levels to use and avoid log spam.

This PR implements an idea ajtowns came up with in #30347 (comment) and it reduces the decision tree for chosing log levels to:

flowchart TD
    start-->|No|dbg
    start{{Is this critical information?}}-->|Yes|crit

    dbg-->|No|logtrace(Trace + Category)
    dbg{{Is this moderate volume?}}-->|Yes|logdebug(Debug + Category)

    crit{{Is there a problem?}}-->|No|loginfo(Info)
    crit-->|Yes|logfatal(Alert)
Loading

@DrahtBot
Copy link
Contributor

DrahtBot commented Jun 30, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30361 (doc: Drop description of LogError messages as fatal by ryanofsky)
  • #30214 (refactor: Improve assumeutxo state representation by ryanofsky)
  • #30155 (validation: Make ReplayBlocks interruptible by mzumsande)
  • #29656 (chainparams: Change nChainTx type to uint64_t by fjahr)
  • #29307 (util: explicitly close all AutoFiles that have been written by vasild)
  • #28676 ([WIP] Cluster mempool implementation by sdaftuar)
  • #28521 (net, net_processing: additional and consistent disconnect logging by Sjors)
  • #26022 (Add util::ResultPtr class by ryanofsky)
  • #25722 (refactor: Use util::Result class for wallet loading by ryanofsky)
  • #25665 (refactor: Add util::Result failure values, multiple error and warning messages by ryanofsky)
  • #24230 (indexes: Stop using node internal types and locking cs_main, improve sync logic by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jonatack
Copy link
Member

I think this makes the logging less useful, as it's less clear if the alert is a warning, an error, or fatal (or possibly if action by the user is required), and this would affect non-developer users who don't change the logging from default settings.

I would suggest simplifying the logging code, rules/restrictions and logging API, and making the latter consistent along with the printed logs. Reducing end-user utility doesn't seem the way to go in reducing the existing complexity.

@ryanofsky
Copy link
Contributor Author

I think this makes the logging less useful, as it's less clear if the alert is a warning, an error, or fatal

I don't have a strong opinion about this change, but one thing I like about it is that it reduces the potential for log spam. I think developers will be less likely to use a level called "alert" for minor errors and warnings, and will more appropriately use the debug log level instead.

It is true that this PR gets rid of the ability to distinguish warnings and errors from informational messages and fatal errors at an API level. But this might not be a big problem if messages are clearly written and have "Error:" "Warning:" "Internal bug detected:" "Fatal error" and similar prefixes. Also practically speaking I think we are already logging most errors and warnings at the "info" and "debug" levels anyway so maybe this makes the API match current code better.

Again, though I don't have strong feeling about this change. It just seemed like it would be simple to implement and the results look pretty clean, IMO, so I think it's worth looking at and considering.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 6, 2024

🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the
documentation.

Possibly this is due to a silent merge conflict (the changes in this pull request being
incompatible with the current code in the target branch). If so, make sure to rebase on the latest
commit of the target branch.

Leave a comment here, if you need help tracking down a confusing failure.

Debug: https://github.com/bitcoin/bitcoin/runs/26859756520

@ryanofsky
Copy link
Contributor Author

Rebased 179a071 -> d34b529 (pr/alert.1 -> pr/alert.2, compare) due to silent conflict with #29625

@DrahtBot DrahtBot removed the CI failed label Jul 9, 2024
@maflcko
Copy link
Member

maflcko commented Jul 10, 2024

Concept ACK, because errors and warnings should be extremely rare to occur. If anyone is interested in them, and looking for them in the debug log, they will be interested in warnings, if they are looking for errors (and vice-versa). So making a distinction here (and spending effort on it) seems possibly confusing without a clear upside.

It should be clear that an alert is fatal, if the program is shutting down (or aborting) as a result of it. Similarly, it should be clear that an alert is a warning, if the program continues afterwards. Most log messages already state that something is a warning, or that the program will continue, or will abort, and doing so in the future will be useful to users and devs.

Add Warning: prefixes so message intent is not lost when warning level is
removed next commit.
Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased d34b529 -> 4e9ff31 (pr/alert.2 -> pr/alert.3, compare) due to conflict with #30428

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context: The majority of the changed log messages here stem from the error() removal in 31be1a4 which changed ERROR: to [error]. Many of them were not fatal error messages, but "warnings", or stuff that would not abort the program, so changing to [alert], seems clearer either way.

ACK 4e9ff31 🔗

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: ACK 4e9ff3100ae79f3c3046a3358ff16daa7cd89627 🔗
f70Ey8roKfaWqoDaKb+Ropa/pkhackB7NyvBkH2A2MhoJj72WTdgsblLayNSYOfU5ltNUpyN5rpogffZdGyCAQ==

Suggested by mflcko in bitcoin#30364 (comment)

This should change log output in next commit from current:

  [error] Message
  [warning] Message

to:

  [alert] Error: Message
  [alert] Warning: Message

instead of:

  [alert] Message
  [alert] Message
-BEGIN VERIFY SCRIPT-
r() { git grep -l "$1" -- 'src' ':(exclude)src/logging.h' ':(exclude)src/logging.cpp' | xargs sed -i "s/$1/$2/"; }
r LogWarning LogAlert
r LogError LogAlert
r Level::Warning Level::Alert
r Level::Error Level::Alert
sed -i -e 's/error\]/alert]/; s/warning\]/alert]/;' src/test/logging_tests.cpp
-END VERIFY SCRIPT-
@maflcko
Copy link
Member

maflcko commented Jul 16, 2024

ACK 66d35a5 🐛

Show signature

Signature:

untrusted comment: signature from minisign secret key on empty file; verify via: minisign -Vm "${path_to_any_empty_file}" -P RWTRmVTMeKV5noAMqVlsMugDDCyyTSbA3Re5AkUrhvLVln0tSaFWglOw -x "${path_to_this_whole_four_line_signature_blob}"
RUTRmVTMeKV5npGrKx1nqXCw5zeVHdtdYURB/KlyA/LMFgpNCs+SkW9a8N95d+U4AP1RJMi+krxU1A3Yux4bpwZNLvVBKy0wLgM=
trusted comment: ACK 66d35a5c1384bf579e2da8e7054be7536e7a7101 🐛
XCqA+EVm38sxDqIn+MpPqk+acajN0Gpd/LzJPhruNH5MzOAw9jH1tADYxjAQaMB4yxwQf5Xffo/lwC68Bh+bAg==

Copy link
Contributor Author

@ryanofsky ryanofsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 4e9ff31 -> 66d35a5 (pr/alert.3 -> pr/alert.4, compare) with suggested change adding noui.cpp warning and error prefixes

@maflcko do you think you could review #30361 as well? #30361 is a documentation bugfix that I think would make this PR easier to understand, because right now the documentation that this PR is changing is not accurate.

Copy link
Contributor

@ajtowns ajtowns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this approach makes sense, though the commit that explicitly adds "Warning" and "Error" prefixes does give me some doubts.

Either way, given we're touching a bunch of log lines, I think we should be thinking about whether they actually make sense to be alerts, so most of the following comments are in that vein.

@@ -378,7 +378,7 @@ static CAddress GetBindAddress(const Sock& sock)
if (!sock.GetSockName((struct sockaddr*)&sockaddr_bind, &sockaddr_bind_len)) {
addr_bind.SetSockAddr((const struct sockaddr*)&sockaddr_bind);
} else {
LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "getsockname failed\n");
LogPrintLevel(BCLog::NET, BCLog::Level::Alert, "Warning: getsockname failed\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should probably be LogDebug rather than Alert messages? They're somewhat remote triggerable (that is, if your system is misconfigured so that an error occurs, the error can occur repeatedly on incoming connections via AcceptConnection), and it also gives very little information on how to fix the problem if it occurs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should probably be LogDebug rather than Alert messages?

It may be good to check that at least one warning is emitted to the user. Otherwise, it may be harder for them to spot the config error at all? (However, if non-logging related changes are made, I wonder if such changes can be done in separate (preparatory) pull requests, similar to #30064, as they require in-depth low-level net knowledge?)

@@ -1713,7 +1713,7 @@ void CConnman::AcceptConnection(const ListenSocket& hListenSocket) {
}

if (!addr.SetSockAddr((const struct sockaddr*)&sockaddr)) {
LogPrintLevel(BCLog::NET, BCLog::Level::Warning, "Unknown socket family\n");
LogPrintLevel(BCLog::NET, BCLog::Level::Alert, "Warning: Unknown socket family\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also seems like should be LogDebug ?

@@ -583,7 +583,7 @@ void CNode::SetAddrLocal(const CService& addrLocalIn) {
AssertLockNotHeld(m_addr_local_mutex);
LOCK(m_addr_local_mutex);
if (addrLocal.IsValid()) {
LogError("Addr local already set for node: %i. Refusing to change from %s to %s\n", id, addrLocal.ToStringAddrPort(), addrLocalIn.ToStringAddrPort());
LogAlert("Addr local already set for node: %i. Refusing to change from %s to %s\n", id, addrLocal.ToStringAddrPort(), addrLocalIn.ToStringAddrPort());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal error, isn't it? Outside of tests, we only invoke SetAddrLocal() when we receive the first VERSION message, so this is the first time addrLocal is touched, and hence it should have valid=false and never hit this condition. Couldn't this be replaced by if (Assume(!addrLocal.IsValid())) { addrLocal = addrLocalIn; } ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #30617, due to lack of progress here.

@@ -449,7 +449,7 @@ bool Socks5(const std::string& strDest, uint16_t port, const ProxyCredentials* a
return false;
}
if (pchRet2[2] != 0x00) { // Reserved field must be 0
LogError("Error: malformed proxy response\n");
LogAlert("Error: malformed proxy response\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the Log statements in this function could be downgraded to LogDebug(BCLog::PROXY, ...) -- they just indicate a p2p connection failure? The LogPrintf("... connect to %s:%d failed ...") on line 448 generates a bit of noise for nodes with tor enabled, I believe:

2024-07-17T05:23:37.590158Z Socks5() connect to nz7rn2ukf3kwqx24iv6q6tyhiclxwzuk65thkihoo4btbdat4y5ee2qd.onion:8333 failed: host unreachable
2024-07-17T05:23:51.908547Z Socks5() connect to nz7rn2ukf3kwqx24iv6q6tyhiclxwzuk65thkihoo4btbdat4y5ee2qd.onion:8333 failed: host unreachable
2024-07-17T06:18:15.494392Z Socks5() connect to bncti2h36dmqk4l76lgxs3c6d7vczvbkhhvbche4yxgztv766yhsyxad.onion:8333 failed: host unreachable
2024-07-17T07:27:00.134226Z Socks5() connect to kqdw33pqr3ndqmd6lffd2qlvjvkthnximrg4zu35kz2ods7t7zte66yd.onion:8333 failed: host unreachable

Maybe it's worth having an alert that problems are happening because the socks proxy is misbehaving since that's something you could fix; but we tend to be pretty silent otherwise when connections aren't succeeding, so dropping these to debug level seems reasonable to me. I think the only PROXY debug messages we have is prior to the call to ConnectThroughProxy() (which invokes Socks5() which then does a Debug/NET log of slightly less info) and a debug message in Socks5() that reveals the auth/password being used, which I think are only ever set as 0/0, 1/1, 2/2, etc.

return false;
}

pcursor->Next();
} else {
LogError("%s: failed to read value\n", __func__);
LogAlert("%s: failed to read value\n", __func__);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"block index" might be clearer than "value", and could perhaps attempt to decode the block hash from the key and report it? Might be worth pointing out why this is a problem at a higher level, ie "block index database corrupted" or similar?

@@ -131,13 +131,13 @@ bool BlockTreeDB::LoadBlockIndexGuts(const Consensus::Params& consensusParams, s
pindexNew->nTx = diskindex.nTx;

if (!CheckProofOfWork(pindexNew->GetBlockHash(), pindexNew->nBits, consensusParams)) {
LogError("%s: CheckProofOfWork failed: %s\n", __func__, pindexNew->ToString());
LogAlert("%s: CheckProofOfWork failed: %s\n", __func__, pindexNew->ToString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would probably be better to replace __func__ in all these messages with something more relevant to potential users (filename, or some more general indication of what's going on). "block index database corrupted: proof of work check failed for %s\n", pindexNew->ToString() eg?

@@ -158,7 +158,7 @@ bool FillableSigningProvider::GetKey(const CKeyID &address, CKey &keyOut) const
bool FillableSigningProvider::AddCScript(const CScript& redeemScript)
{
if (redeemScript.size() > MAX_SCRIPT_ELEMENT_SIZE) {
LogError("FillableSigningProvider::AddCScript(): redeemScripts > %i bytes are invalid\n", MAX_SCRIPT_ELEMENT_SIZE);
LogAlert("FillableSigningProvider::AddCScript(): redeemScripts > %i bytes are invalid\n", MAX_SCRIPT_ELEMENT_SIZE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be an if (!Assume(redeemScript.size() <= MAX_SCRIPT_ELEMENT_SIZE)) return false; ?

LogWarning("foo9: %s\n", "bar9");
LogError("foo10: %s\n", "bar10");
LogAlert("foo9: %s\n", "bar9");
LogAlert("foo10: %s\n", "bar10");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much value testing the same thing twice.

@@ -148,7 +148,7 @@ bool Session::Listen(Connection& conn)
conn.sock = StreamAccept();
return true;
} catch (const std::runtime_error& e) {
LogPrintLevel(BCLog::I2P, BCLog::Level::Error, "Couldn't listen: %s\n", e.what());
LogPrintLevel(BCLog::I2P, BCLog::Level::Alert, "Couldn't listen: %s\n", e.what());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Touching all the LogPrintLevel warning/error calls without switching them to LogAlert seems a shame. A separate scripted-diff prior to "Replace log error and warning..." with:

sed -i 's/LogPrintLevel(\(BCLog::[^,]*\), BCLog::Level::Error, */LogAlert(/' $files
sed -i 's/LogPrintLevel(\(BCLog::[^,]*\), BCLog::Level::Warning, */LogAlert(/' $files

should work, I think.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 5, 2024

🐙 This pull request conflicts with the target branch and needs rebase.

glozow added a commit that referenced this pull request Aug 13, 2024
fa6fe43 net: Clarify that m_addr_local is only set once (MarcoFalke)

Pull request description:

  The function is supposed to be only called once when the version msg arrives (a single time). Calling it twice would be an internal logic bug. However, the `LogError` in this function has many issues:

  * If the error happens in tests, as is the case for the buggy fuzz test, it will go unnoticed
  * It is dead code, unless a bug is introduced to execute it

  Fix all issues by using `Assume(!m_addr_local.IsValid())` instead. Idea taken from #30364 (comment)

ACKs for top commit:
  achow101:
    ACK fa6fe43
  mzumsande:
    utACK fa6fe43
  glozow:
    ACK fa6fe43

Tree-SHA512: 8c1e8c524768f4f36cc50110ae54ee423e057a963ff78f736f3bf92df1ce5af28e3e0149153780897944e1d5c22ddbca9dac9865d9f4d44afffa152bc8559405
@ryanofsky
Copy link
Contributor Author

I lost enthusiasm for this PR, so will close it, but tagging as up for grabs, and will happily review if someone else wants to pick it up and respond to the earlier comments, particularly the individual logging improvements suggested #30364 (review).


The thing I like about this PR is that it disentangles log message priority levels from log message conditions.

As I see it, priority levels can distinguish between log messages that:

  • Provide information unlikely to help users, or are high volume (would call Trace)
  • Provide information helpful for debugging user issues and are low volume (would call Debug)
  • Provide information probably helpful in general (would call Notify)
  • Provide information probably requiring some action (would call Critical)

At each of these levels, different conditions are possible:

  • Normal conditions (would call Info)
  • Abnormal conditions that might indicate a problem (would call Warning)
  • Failure conditions that indicate something has gone wrong (would call Error)
  • Impossible conditions indicating an incorrect assumption, or a bug, or hardware failure (would call Bug)

Examples of different priority level / condition pairs:

  • Errors
    • An out of disk space error would probably be a Critical/Error message.
    • A case where saved data is missing or corrupted, but can easily be reconstructed would probably be a Notify/Error message.
    • A transient network error, or an incorrectly called RPC would probably be a Debug/Error message.
  • Warnings
    • A network fork or incorrectly set system time would probably be a Critical/Warning message.
    • A valid, but obsolete or suspicious config setting would probably be a Notify/Warning message.
    • A partially successful RPC operation, or incoming data with unrecognized fields or version numbers would probably be Debug/Warning messages.
  • Information
    • Shutdown or reindexing requests that take the node offline, or wallet operations that require generating new backups would probably be Critical/Info messages.
    • Useful, non-spammy status updates would probably be Notify/Info messages.
    • More detailed or spammy status updates would be Debug/Info or Trace/Info message.
  • Bugs
    • Impossible conditions where we might use Assert or std::logic_error, and indicate something is severely wrong, like CheckBlockIndex failing, would probably be Critical/Bug messages.
    • Impossible conditions where we might use Assume or std::logic_error, but indicate a recoverable problem, like a floating point error calculating position of a progress bar, or a local bug inside of an RPC call that can be easily aborted, would probably be Notify/Bug or Debug/Bug messages.

In my opinion, it is important for the logging framework to be able to distinguish log message priority levels so important information will be seen and logs will not be filled with spam. I think having the 4 priority levels described above would be ideal (Trace Debug Notify Critical), but having the 4 levels defined in this PR would also be ok (Trace Debug Info Alert).

I don't think it's important for the logging framework know about log message conditions, because I think messages can just describe their own conditions clearly, and use Error: Warning: and Bug: prefixes to be explicit.

If people disagree and think 4 log levels is not enough, or that explicitly tracking conditions is important, probably the easiest way to support this would be expand the number of levels like syslog does, or like #30347 tried to do, so basically each priority+condition pair maps straightforwardly to some logging level.

But I don't actually care very much about what logging levels are defined, as long as their usage is documented accurately.

@ryanofsky ryanofsky closed this Aug 16, 2024
@bitcoin bitcoin locked and limited conversation to collaborators Aug 16, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants