p2p: Put disconnecting logs into BCLog::NET category #19832

hebasto · 2020-08-29T14:24:38Z

It's too noisy:

$ cat debug.log | wc -l
28529
$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
10177

maflcko · 2020-08-29T14:28:00Z

noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category, so cr-ACK 1816327

jonatack · 2020-08-29T16:30:09Z

Tend to NACK, the NET log category is much too noisy for this message and I appreciate being able to see it unconditionally logged.

jonatack · 2020-08-29T16:33:13Z

$ cat debug.log | wc -l
28529
$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
10177

These figures are very far from representative? I have 154230 and 116, e.g. less than one line in one thousand, and with no banning, just one seednode and a few addnodes in the config file.

hebasto · 2020-08-29T16:35:38Z

$ cat debug.log | wc -l
28529
$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
10177
These figures are very far from representative? I have 154230 and 116.

Interesting, what factors affect this?

FWIW, that is the node the log was grepped from:
https://bitnodes.io/nodes/leaderboard/?q=95.164.65.194

hebasto · 2020-08-29T16:38:31Z

@jonatack

Tend to NACK, the NET log category is much too noisy for this message and I appreciate being able to see it unconditionally logged.

This PR does not change the logging output amount with the debug=net option.

jonatack · 2020-08-29T16:41:51Z

I think it should stay unconditionally logged. If one-third of a node's debug log is this message, ISTM the node has issues and this should be visibly logged.

hebasto · 2020-08-29T16:44:34Z

I think it should stay unconditionally logged. If one-third of a node's debug log is this message, ISTM the node has issues and this should be visibly logged.

What kind of the node's issues could force it to discourage and disconnect peers?

jonatack · 2020-08-29T17:10:44Z

Misbehaving peers; look at the callers of net_processing.cpp::Misbehaving.

hebasto · 2020-08-29T17:20:57Z

I think it should stay unconditionally logged. If one-third of a node's debug log is this message, ISTM the node has issues and this should be visibly logged.

Misbehaving peers; look at the callers of net_processing.cpp::Misbehaving.

I mean "peers do misbehave", not "the node has issues".

jonatack · 2020-08-29T17:28:54Z

The node having issues because it's seeing a pathological amount of misbehavior. Best to fix the issue, not hide the logs.

practicalswift · 2020-08-30T17:57:34Z

@jonatack What if the peer is doing this deliberately to try to fill our logs with crap? Either to cause disk usage, or to bury warnings about potential real issues (such as "Potential stale tip detected …") by increasing signal to noise.

Generally I don't think misbehaving peers should be able to cause us to write to disk unless the user is explicitly opting in to debug logging.

jonatack · 2020-08-30T18:23:43Z

I'd argue that this is a real issue. One-third of all logging for misbehavior should not be happening, and if the case is truly as extreme as described at the top, then alarms should be going off; better to know about it ASAP and take useful measures (setban, addnode, seednode, etc). The net logging is far too noisy to be useful unless you are specifically searching it for this or have some kind of custom, up-to-date-on-master monitoring system in place. I don't think it's reasonable to change the logging for everyone based on an unusually pathological case.

jonatack · 2020-08-30T18:29:49Z

(If people don't want to see peer misbehavior unconditionally logged, maybe we should add a distinct category of NET logging dedicated to signals of healthy node operation, but ISTM that is what the unconditional log is for.)

practicalswift · 2020-08-30T19:29:08Z

@jonatack

if the case is truly as extreme as described at the top, then alarms should be going off […]

If so then I think it is better is we log when the "alarm should go off" condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are "more normal than normally" and that he/she draws the conclusion that something could be wrong.

I don't think that is good usability: evidently also long term Core contributors don't follow the described pattern as proven by this PR :)

Suggestion: we move forward with @hebasto's suggestion and a follow-up PR can make sure the alarm goes off in a clear and unambiguous way when there is indication that something really is broken in a way that may require end-user attention/action :)

jonatack · 2020-08-30T19:45:19Z

This PR is made based on only only one data point, and afaict an extremely unusual one. Is anyone else really seeing this much misbehavior logging?

jonatack · 2020-08-30T20:04:47Z

src/net_processing.cpp

@@ -3828,7 +3828,7 @@ bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)
    }

    // Normal case: Disconnect the peer and discourage all nodes sharing the address
-    LogPrintf("Disconnecting and discouraging peer %d!\n", peer_id);


I don't think that is good usability: evidently also long term Core contributors don't follow the described pattern as proven by this PR :)

This line was written by @sipa in b691f2d and last touched by @jnewbery in 655b195.

If so then I think it is better is we log when the "alarm should go off" condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are "more normal than normally" and that he/she draws the conclusion that something could be wrong.

I don't think that is good usability: evidently also long term Core contributors don't follow the described pattern as proven by this PR :)

This line was written by @sipa in b691f2d and last touched by @jnewbery in 655b195.

My point was simply that Core contributor hebasto who by definition is a power-user didn't interpret the "alarm should go off" logging as such in this case (and TBH I'm not certain he should have). I think it is unreasonable to expect that ordinary users should do that interpretation even if we assume this would be a "alarm should go off" situation.

If this is something we want to warn about we should do it in a more clear and unambiguous way :)

Ok. It's hard to know--I think I'd be more worried about network topology here than a peer causing lots of logging, but I'm not sure if that POV is correct or shared.

I'm thinking of looking at the NET logging to see if we should consider creating a "high level" net logging category (or a "user level" one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I'm not aware of, though.

jnewbery · 2020-08-31T09:20:30Z

Logging here was added way back in b2864d2, shortly after logging categories were added (#3009). That commit also added unconditional logging to Misbehaving(), which has now moved to the NET category. I don't have a very strong opinion, but I think that these logs should probably also go in the NET category.

noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category

@MarcoFalke: I don't understand. Can you explain why they're different?

I'm thinking of looking at the NET logging to see if we should consider creating a "high level" net logging category (or a "user level" one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I'm not aware of, though.

@jonatack: No objections. I'd take it ever further though, and add an (optional) logging severity (DEBUG/INFO/WARNING/ERROR or similar) that can be added to all log messages. The user can then either choose what severity logs they want for each category (eg = -debug=net:warning,tor:debug etc), or have a logging post-processor that can filter by severity/category.

practicalswift · 2020-08-31T09:34:04Z

ACK 1816327 -- I think these logs belong in the NET category :)

jonatack · 2020-08-31T09:44:40Z

I think there is something dramatically wrong with a node that has this message as one-third of its logging. It shouldn't be a motivation for this change.

jonatack · 2020-08-31T09:46:55Z

On my nodes these logs are less than one line in a thousand, and I'm not banning anything.

maflcko · 2020-08-31T13:51:38Z

@MarcoFalke: I don't understand. Can you explain why they're different?

I assume that noban peers are "trusted" peers hand-picked by the node operator, so they should be assumed to not misbehave. If they do and fill your disk, then it is user error and not Bitcoin Core error.

maflcko · 2020-08-31T15:25:53Z

Generally, it should be possible to run Bitcoin Core with well-defined resource usage in an unsupervised setting. Ideally, the resource usage depends on command line settings and user behaviour, not on remote peer (mis)behaviour. And with "unsupervised" I mean that the users doesn't check the debug log unless there is a local software malfunction or other issue that needs manual debugging.

Power users that understand internal Bitcoin Core logic and wish to parse the log for peers to ban, can enable the debug category and use grep to parse the log.

I understand that the net category is heavily dominated by tx-relay logs, but maybe a different pull can split up finer-grained categories from net?

jonatack · 2020-09-02T08:06:42Z

There are 17 other LogPrintf lines in the same file (net_processing.cpp) alone, many of which are also related to peer connection and disconnection. And just as many more in net.cpp.

I'd propose to have a global policy on connection/disconnection logging rather than PRing changes to each of them one-by-one.

If peer connection messages are displayed unconditionally, disconnections arguably should be as well--and many are.

It seems there would be consensus that connections/disconnections be either logged unconditionally, or that it should be possible to separate the logging of these higher-level, lower-frequency actions from the lower-level, high-frequency messages (inv, getdata, tx, etc.)

practicalswift · 2020-09-02T08:19:55Z

@jonatack

There are 17 other LogPrintf lines in the same file (net_processing.cpp) alone, many of which are also related to peer connection and disconnection. And just as many more in net.cpp.

If you know of any scenario where a peer connecting to my node can cause my node to write to the default log (cheaply) then I'd be very interested to know about them so that I can fix them.

Assume a node running bitcoind with default settings (-debug not set).

Generally I don't like the idea of untrusted parties being able to cause unwanted resource usage on my machine :)

jonatack · 2020-09-02T09:13:48Z

@practicalswift I understand your concern. Reconnecting from different IP addresses and re-misbehaving may be trivial, but misbehavior leading to discouragement does not automatically lead to immediate disconnection; see src/banman.h::L36-55. If an attacker is wasting our resources in this way, which #19219 aimed to reduce, a line in the log per disconnection seems the least of our worries. This message ends with an exclamation mark for good reason--it's signal, not noise.

practicalswift · 2020-09-02T13:34:39Z

@jonatack

I think @MarcoFalke summarized the current policy quite nicely:

Generally, it should be possible to run Bitcoin Core with well-defined resource usage in an unsupervised setting. Ideally, the resource usage depends on command line settings and user behaviour, not on remote peer (mis)behaviour. And with "unsupervised" I mean that the users doesn't check the debug log unless there is a local software malfunction or other issue that needs manual debugging.

If the suggestion is to change the policy to something else we should probably open a separate issue for that to get more participation :)

The change in this PR is perfectly in line with the current policy AFAICT.

jonatack · 2020-09-02T13:54:14Z

@practicalswift ISTM better solutions have been proposed in this discussion already, this unconditional logging is no oversight, and this can be closed.

hebasto · 2020-11-26T12:09:38Z

Closing for now, as ill behavior is no longer observed, at least for me :)

ajtowns · 2020-12-04T02:00:27Z

On my nodes these logs are less than one line in a thousand, and I'm not banning anything.

Are you accepting inbound connections or hidden behind tor or something? That seems really low. FWIW, I'm seeing ~50% of log lines being "Disconnecting and discouraging" without NET enabled (11.8k out of 22.4k); though that node is a chunk behind current master.

jonatack · 2020-12-04T22:23:33Z

$ ~/.bitcoin$ cat debug.log | wc -l
9156971
$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
2
$ cat debug.log | grep "discouraging" | wc -l
2
$ grep -i "discourag" ~/.bitcoin/debug.log 
2020-10-08T19:45:15Z Disconnecting and discouraging peer 604!
2020-10-09T17:23:27Z Disconnecting and discouraging peer 288!

Yeah, down to less than one in a million now, but it's a pointless stat, since a couple months been running a tor v3 hidden service with almost no inbounds. 50% seems like a lot though.

I'm thinking of looking at the NET logging to see if we should consider creating a "high level" net logging category (or a "user level" one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I'm not aware of, though.

@jonatack: No objections. I'd take it ever further though, and add an (optional) logging severity (DEBUG/INFO/WARNING/ERROR or similar) that can be added to all log messages. The user can then either choose what severity logs they want for each category (eg = -debug=net:warning,tor:debug etc), or have a logging post-processor that can filter by severity/category.

I'd still like this, may propose something soon ™️

ajtowns · 2020-12-05T01:35:50Z

Yeah; it is a lot -- but it's triggerable by any badly written fake node software that does something stupid and also reconnects to your node sometime after being disconnected. I restarted that node after updating to current master and am back up to 378 lines of debug.log with 161 of them being Disconnecting and discouraging. (no debug logging enabled, which also means I can't tell what IPs those peer ids are related to)

practicalswift · 2020-12-05T10:03:28Z

Judging from the discussions in this PR is seems to me that the consensus opinion is that "Disconnecting and discouraging peer" is too noisy. Consider re-opening this PR :)

jonatack · 2020-12-05T10:18:22Z

Judging from the discussions in this PR is seems to me that the consensus opinion is that "Disconnecting and discouraging peer" is too noisy. Consider re-opening this PR :)

I suspect that re-proposing to hide the symptoms for all users every time one person has a situation with their node might not be the optimal way to address this. "Let's sweep it under the rug."

That said, since this discussion will go in circles until there is a real solution, I'll unsubscribe here and concentrate at some point on proposing a solution for node operators who would like to know about these situations without running and grepping the current NET category logging which is far, far more noisy.

hebasto · 2020-12-05T10:18:24Z

Judging from the discussions in this PR is seems to me that the consensus opinion is that "Disconnecting and discouraging peer" is too noisy. Consider re-opening this PR :)

Re-opened.

practicalswift · 2020-12-05T15:52:06Z

ACK 1816327 for the reasons MarcoFalke gave above.

It shouldn't be easy for an attacker to add noise to our logs. -debug can be used in troubleshooting scenarios.

ajtowns · 2020-12-07T07:39:34Z

Having left the restarted node running for a couple of days now, I've broken my logs up by how frequently each message appears.

Standard "everything's going fine" logs:

91 x Bitcoin Core version v0.20.99.0-e21b82438 (release build)  [introductory logs]
387 x UpdateTip: new best=....
49 x New outbound peer connected: ...

Blockchain warnings about people sending me BCH headers and blocks sometimes being slow:

93 x ERROR: AcceptBlockHeader: Consensus::ContextualCheckBlockHeader: ..., bad-diffbits, incorrect proof of work
13 x Potential stale tip detected, will try using extra outbound peer (last tip update: ... seconds ago)

Info about disk allocations/pruning:

100 x Pre-allocating up to position ... in ...
4 x Prune: UnlinkPrunedFiles deleted blk/rev (...)
8 x FlushStateToDisk: write coins cache to disk ... started / completed (...)
4 x Leaving block file ...: CBlockFileInfo(...)

Connection info, pretty much none of which is actionable with the information given:

1114 x  Disconnecting and discouraging peer ...! 
19 x ping timeout: 1200.[...]
16 x socket receive timeout: 1201s
11 x socket send error Broken pipe (32)
5 x Socks5() connect to ....onion:8333 failed: connection refused
5 x socket send error Connection reset by peer (104)
3 x Socks5() connect to ....onion:8333 failed: host unreachable
3 x socket send error Connection timed out (110)

ACK 1816327

maflcko · 2020-12-07T08:07:47Z

I've edited one review (#19832 (comment)) to remove an @ mention, which would be included in the merge commit

…gory 1816327 p2p: Put disconnecting logs into BCLog::NET category (Hennadii Stepanov) Pull request description: It's too noisy: ``` $ cat debug.log | wc -l 28529 $ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l 10177 ``` ACKs for top commit: MarcoFalke: noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category, so cr-ACK 1816327 practicalswift: ACK 1816327 for the reasons MarcoFalke gave above. ajtowns: ACK 1816327 Tree-SHA512: c312c1009090840659b2cb1364d8ad9b6ab8e742fc462aef169996d93c76c248507639a00257ed9d73a6916c01176b1793491b2305e92fdded5f9de0935b6ba6

p2p: Put disconnecting logs into BCLog::NET category

1816327

DrahtBot added the P2P label Aug 29, 2020

jonatack reviewed Aug 30, 2020

View reviewed changes

practicalswift mentioned this pull request Oct 19, 2020

Make non-critical unconditional logging conditional to reduce risk of disk fill attacks and to improve user experience #20174

Closed

hebasto closed this Nov 26, 2020

hebasto reopened this Dec 5, 2020

maflcko merged commit 1a04f45 into bitcoin:master Dec 7, 2020

hebasto deleted the 200829-log branch December 7, 2020 08:33

ajtowns mentioned this pull request Dec 19, 2020

Cleanup of -debug=net log messages #20724

Merged

maflcko mentioned this pull request Jan 4, 2021

net: Log to net debug in MaybeDiscourageAndDisconnect except for noban and manual peers #20845

Merged

bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022

p2p: Put disconnecting logs into BCLog::NET category #19832

p2p: Put disconnecting logs into BCLog::NET category #19832

Uh oh!

Conversation

hebasto commented Aug 29, 2020

Uh oh!

maflcko commented Aug 29, 2020

Uh oh!

jonatack commented Aug 29, 2020

Uh oh!

jonatack commented Aug 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hebasto commented Aug 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hebasto commented Aug 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonatack commented Aug 29, 2020

Uh oh!

hebasto commented Aug 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonatack commented Aug 29, 2020

Uh oh!

hebasto commented Aug 29, 2020

Uh oh!

jonatack commented Aug 29, 2020

Uh oh!

practicalswift commented Aug 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonatack commented Aug 30, 2020

Uh oh!

jonatack commented Aug 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

practicalswift commented Aug 30, 2020

Uh oh!

jonatack commented Aug 30, 2020

Uh oh!

jonatack Aug 30, 2020

Choose a reason for hiding this comment

Uh oh!

practicalswift Aug 31, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jonatack Aug 31, 2020

Choose a reason for hiding this comment

Uh oh!

jnewbery commented Aug 31, 2020

Uh oh!

practicalswift commented Aug 31, 2020

Uh oh!

jonatack commented Aug 31, 2020

Uh oh!

jonatack commented Aug 31, 2020

Uh oh!

maflcko commented Aug 31, 2020

Uh oh!

maflcko commented Aug 31, 2020

Uh oh!

jonatack commented Sep 2, 2020

Uh oh!

practicalswift commented Sep 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonatack commented Sep 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

practicalswift commented Sep 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonatack commented Sep 2, 2020

Uh oh!

hebasto commented Nov 26, 2020

Uh oh!

ajtowns commented Dec 4, 2020

jonatack commented Aug 29, 2020 •

edited

Loading

hebasto commented Aug 29, 2020 •

edited

Loading

hebasto commented Aug 29, 2020 •

edited

Loading

hebasto commented Aug 29, 2020 •

edited

Loading

practicalswift commented Aug 30, 2020 •

edited

Loading

jonatack commented Aug 30, 2020 •

edited

Loading

practicalswift Aug 31, 2020 •

edited

Loading

practicalswift commented Sep 2, 2020 •

edited

Loading

jonatack commented Sep 2, 2020 •

edited

Loading

practicalswift commented Sep 2, 2020 •

edited

Loading

jonatack commented Dec 4, 2020 •

edited

Loading

practicalswift commented Dec 5, 2020 •

edited by maflcko

Loading