Skip to content

Conversation

hebasto
Copy link
Member

@hebasto hebasto commented Aug 29, 2020

It's too noisy:

$ cat debug.log | wc -l
28529
$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
10177

@maflcko
Copy link
Member

maflcko commented Aug 29, 2020

noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category, so cr-ACK 1816327

@DrahtBot DrahtBot added the P2P label Aug 29, 2020
@jonatack
Copy link
Member

Tend to NACK, the NET log category is much too noisy for this message and I appreciate being able to see it unconditionally logged.

@jonatack
Copy link
Member

jonatack commented Aug 29, 2020

$ cat debug.log | wc -l
28529
$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
10177

These figures are very far from representative? I have 154230 and 116, e.g. less than one line in one thousand, and with no banning, just one seednode and a few addnodes in the config file.

@hebasto
Copy link
Member Author

hebasto commented Aug 29, 2020

$ cat debug.log | wc -l
28529
$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
10177

These figures are very far from representative? I have 154230 and 116.

Interesting, what factors affect this?

FWIW, that is the node the log was grepped from:
https://bitnodes.io/nodes/leaderboard/?q=95.164.65.194

@hebasto
Copy link
Member Author

hebasto commented Aug 29, 2020

@jonatack

Tend to NACK, the NET log category is much too noisy for this message and I appreciate being able to see it unconditionally logged.

This PR does not change the logging output amount with the debug=net option.

@jonatack
Copy link
Member

I think it should stay unconditionally logged. If one-third of a node's debug log is this message, ISTM the node has issues and this should be visibly logged.

@hebasto
Copy link
Member Author

hebasto commented Aug 29, 2020

I think it should stay unconditionally logged. If one-third of a node's debug log is this message, ISTM the node has issues and this should be visibly logged.

What kind of the node's issues could force it to discourage and disconnect peers?

@jonatack
Copy link
Member

Misbehaving peers; look at the callers of net_processing.cpp::Misbehaving.

@hebasto
Copy link
Member Author

hebasto commented Aug 29, 2020

I think it should stay unconditionally logged. If one-third of a node's debug log is this message, ISTM the node has issues and this should be visibly logged.

Misbehaving peers; look at the callers of net_processing.cpp::Misbehaving.

I mean "peers do misbehave", not "the node has issues".

@jonatack
Copy link
Member

The node having issues because it's seeing a pathological amount of misbehavior. Best to fix the issue, not hide the logs.

@practicalswift
Copy link
Contributor

practicalswift commented Aug 30, 2020

@jonatack What if the peer is doing this deliberately to try to fill our logs with crap? Either to cause disk usage, or to bury warnings about potential real issues (such as "Potential stale tip detected …") by increasing signal to noise.

Generally I don't think misbehaving peers should be able to cause us to write to disk unless the user is explicitly opting in to debug logging.

@jonatack
Copy link
Member

I'd argue that this is a real issue. One-third of all logging for misbehavior should not be happening, and if the case is truly as extreme as described at the top, then alarms should be going off; better to know about it ASAP and take useful measures (setban, addnode, seednode, etc). The net logging is far too noisy to be useful unless you are specifically searching it for this or have some kind of custom, up-to-date-on-master monitoring system in place. I don't think it's reasonable to change the logging for everyone based on an unusually pathological case.

@jonatack
Copy link
Member

jonatack commented Aug 30, 2020

(If people don't want to see peer misbehavior unconditionally logged, maybe we should add a distinct category of NET logging dedicated to signals of healthy node operation, but ISTM that is what the unconditional log is for.)

@practicalswift
Copy link
Contributor

@jonatack

if the case is truly as extreme as described at the top, then alarms should be going off […]

If so then I think it is better is we log when the "alarm should go off" condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are "more normal than normally" and that he/she draws the conclusion that something could be wrong.

I don't think that is good usability: evidently also long term Core contributors don't follow the described pattern as proven by this PR :)

Suggestion: we move forward with @hebasto's suggestion and a follow-up PR can make sure the alarm goes off in a clear and unambiguous way when there is indication that something really is broken in a way that may require end-user attention/action :)

@jonatack
Copy link
Member

This PR is made based on only only one data point, and afaict an extremely unusual one. Is anyone else really seeing this much misbehavior logging?

@@ -3828,7 +3828,7 @@ bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)
}

// Normal case: Disconnect the peer and discourage all nodes sharing the address
LogPrintf("Disconnecting and discouraging peer %d!\n", peer_id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is good usability: evidently also long term Core contributors don't follow the described pattern as proven by this PR :)

This line was written by @sipa in b691f2d and last touched by @jnewbery in 655b195.

Copy link
Contributor

@practicalswift practicalswift Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so then I think it is better is we log when the "alarm should go off" condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are "more normal than normally" and that he/she draws the conclusion that something could be wrong.

I don't think that is good usability: evidently also long term Core contributors don't follow the described pattern as proven by this PR :)

This line was written by @sipa in b691f2d and last touched by @jnewbery in 655b195.

My point was simply that Core contributor hebasto who by definition is a power-user didn't interpret the "alarm should go off" logging as such in this case (and TBH I'm not certain he should have). I think it is unreasonable to expect that ordinary users should do that interpretation even if we assume this would be a "alarm should go off" situation.

If this is something we want to warn about we should do it in a more clear and unambiguous way :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. It's hard to know--I think I'd be more worried about network topology here than a peer causing lots of logging, but I'm not sure if that POV is correct or shared.

I'm thinking of looking at the NET logging to see if we should consider creating a "high level" net logging category (or a "user level" one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I'm not aware of, though.

@jnewbery
Copy link
Contributor

Logging here was added way back in b2864d2, shortly after logging categories were added (#3009). That commit also added unconditional logging to Misbehaving(), which has now moved to the NET category. I don't have a very strong opinion, but I think that these logs should probably also go in the NET category.

noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category

@MarcoFalke: I don't understand. Can you explain why they're different?

I'm thinking of looking at the NET logging to see if we should consider creating a "high level" net logging category (or a "user level" one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I'm not aware of, though.

@jonatack: No objections. I'd take it ever further though, and add an (optional) logging severity (DEBUG/INFO/WARNING/ERROR or similar) that can be added to all log messages. The user can then either choose what severity logs they want for each category (eg = -debug=net:warning,tor:debug etc), or have a logging post-processor that can filter by severity/category.

@practicalswift
Copy link
Contributor

ACK 1816327 -- I think these logs belong in the NET category :)

@jonatack
Copy link
Member

I think there is something dramatically wrong with a node that has this message as one-third of its logging. It shouldn't be a motivation for this change.

@jonatack
Copy link
Member

On my nodes these logs are less than one line in a thousand, and I'm not banning anything.

@maflcko
Copy link
Member

maflcko commented Aug 31, 2020

@MarcoFalke: I don't understand. Can you explain why they're different?

I assume that noban peers are "trusted" peers hand-picked by the node operator, so they should be assumed to not misbehave. If they do and fill your disk, then it is user error and not Bitcoin Core error.

@maflcko
Copy link
Member

maflcko commented Aug 31, 2020

Generally, it should be possible to run Bitcoin Core with well-defined resource usage in an unsupervised setting. Ideally, the resource usage depends on command line settings and user behaviour, not on remote peer (mis)behaviour. And with "unsupervised" I mean that the users doesn't check the debug log unless there is a local software malfunction or other issue that needs manual debugging.

Power users that understand internal Bitcoin Core logic and wish to parse the log for peers to ban, can enable the debug category and use grep to parse the log.

I understand that the net category is heavily dominated by tx-relay logs, but maybe a different pull can split up finer-grained categories from net?

@jonatack
Copy link
Member

jonatack commented Sep 2, 2020

There are 17 other LogPrintf lines in the same file (net_processing.cpp) alone, many of which are also related to peer connection and disconnection. And just as many more in net.cpp.

I'd propose to have a global policy on connection/disconnection logging rather than PRing changes to each of them one-by-one.

If peer connection messages are displayed unconditionally, disconnections arguably should be as well--and many are.

It seems there would be consensus that connections/disconnections be either logged unconditionally, or that it should be possible to separate the logging of these higher-level, lower-frequency actions from the lower-level, high-frequency messages (inv, getdata, tx, etc.)

@practicalswift
Copy link
Contributor

practicalswift commented Sep 2, 2020

@jonatack

There are 17 other LogPrintf lines in the same file (net_processing.cpp) alone, many of which are also related to peer connection and disconnection. And just as many more in net.cpp.

If you know of any scenario where a peer connecting to my node can cause my node to write to the default log (cheaply) then I'd be very interested to know about them so that I can fix them.

Assume a node running bitcoind with default settings (-debug not set).

Generally I don't like the idea of untrusted parties being able to cause unwanted resource usage on my machine :)

@jonatack
Copy link
Member

jonatack commented Sep 2, 2020

@practicalswift I understand your concern. Reconnecting from different IP addresses and re-misbehaving may be trivial, but misbehavior leading to discouragement does not automatically lead to immediate disconnection; see src/banman.h::L36-55. If an attacker is wasting our resources in this way, which #19219 aimed to reduce, a line in the log per disconnection seems the least of our worries. This message ends with an exclamation mark for good reason--it's signal, not noise.

@practicalswift
Copy link
Contributor

practicalswift commented Sep 2, 2020

@jonatack

I think @MarcoFalke summarized the current policy quite nicely:

Generally, it should be possible to run Bitcoin Core with well-defined resource usage in an unsupervised setting. Ideally, the resource usage depends on command line settings and user behaviour, not on remote peer (mis)behaviour. And with "unsupervised" I mean that the users doesn't check the debug log unless there is a local software malfunction or other issue that needs manual debugging.

If the suggestion is to change the policy to something else we should probably open a separate issue for that to get more participation :)

The change in this PR is perfectly in line with the current policy AFAICT.

@jonatack
Copy link
Member

jonatack commented Sep 2, 2020

@practicalswift ISTM better solutions have been proposed in this discussion already, this unconditional logging is no oversight, and this can be closed.

@hebasto
Copy link
Member Author

hebasto commented Nov 26, 2020

Closing for now, as ill behavior is no longer observed, at least for me :)

@hebasto hebasto closed this Nov 26, 2020
@ajtowns
Copy link
Contributor

ajtowns commented Dec 4, 2020

On my nodes these logs are less than one line in a thousand, and I'm not banning anything.

Are you accepting inbound connections or hidden behind tor or something? That seems really low. FWIW, I'm seeing ~50% of log lines being "Disconnecting and discouraging" without NET enabled (11.8k out of 22.4k); though that node is a chunk behind current master.

@jonatack
Copy link
Member

jonatack commented Dec 4, 2020

$ ~/.bitcoin$ cat debug.log | wc -l
9156971
$ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
2
$ cat debug.log | grep "discouraging" | wc -l
2
$ grep -i "discourag" ~/.bitcoin/debug.log 
2020-10-08T19:45:15Z Disconnecting and discouraging peer 604!
2020-10-09T17:23:27Z Disconnecting and discouraging peer 288!

Yeah, down to less than one in a million now, but it's a pointless stat, since a couple months been running a tor v3 hidden service with almost no inbounds. 50% seems like a lot though.

I'm thinking of looking at the NET logging to see if we should consider creating a "high level" net logging category (or a "user level" one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I'm not aware of, though.

@jonatack: No objections. I'd take it ever further though, and add an (optional) logging severity (DEBUG/INFO/WARNING/ERROR or similar) that can be added to all log messages. The user can then either choose what severity logs they want for each category (eg = -debug=net:warning,tor:debug etc), or have a logging post-processor that can filter by severity/category.

I'd still like this, may propose something soon ™️

@ajtowns
Copy link
Contributor

ajtowns commented Dec 5, 2020

Yeah; it is a lot -- but it's triggerable by any badly written fake node software that does something stupid and also reconnects to your node sometime after being disconnected. I restarted that node after updating to current master and am back up to 378 lines of debug.log with 161 of them being Disconnecting and discouraging. (no debug logging enabled, which also means I can't tell what IPs those peer ids are related to)

@practicalswift
Copy link
Contributor

Judging from the discussions in this PR is seems to me that the consensus opinion is that "Disconnecting and discouraging peer" is too noisy. Consider re-opening this PR :)

@hebasto hebasto reopened this Dec 5, 2020
@jonatack
Copy link
Member

jonatack commented Dec 5, 2020

Judging from the discussions in this PR is seems to me that the consensus opinion is that "Disconnecting and discouraging peer" is too noisy. Consider re-opening this PR :)

I suspect that re-proposing to hide the symptoms for all users every time one person has a situation with their node might not be the optimal way to address this. "Let's sweep it under the rug."

That said, since this discussion will go in circles until there is a real solution, I'll unsubscribe here and concentrate at some point on proposing a solution for node operators who would like to know about these situations without running and grepping the current NET category logging which is far, far more noisy.

@hebasto
Copy link
Member Author

hebasto commented Dec 5, 2020

Judging from the discussions in this PR is seems to me that the consensus opinion is that "Disconnecting and discouraging peer" is too noisy. Consider re-opening this PR :)

Re-opened.

@practicalswift
Copy link
Contributor

practicalswift commented Dec 5, 2020

ACK 1816327 for the reasons MarcoFalke gave above.

It shouldn't be easy for an attacker to add noise to our logs. -debug can be used in troubleshooting scenarios.

@ajtowns
Copy link
Contributor

ajtowns commented Dec 7, 2020

Having left the restarted node running for a couple of days now, I've broken my logs up by how frequently each message appears.

Standard "everything's going fine" logs:

91 x Bitcoin Core version v0.20.99.0-e21b82438 (release build)  [introductory logs]
387 x UpdateTip: new best=....
49 x New outbound peer connected: ...

Blockchain warnings about people sending me BCH headers and blocks sometimes being slow:

93 x ERROR: AcceptBlockHeader: Consensus::ContextualCheckBlockHeader: ..., bad-diffbits, incorrect proof of work
13 x Potential stale tip detected, will try using extra outbound peer (last tip update: ... seconds ago)

Info about disk allocations/pruning:

100 x Pre-allocating up to position ... in ...
4 x Prune: UnlinkPrunedFiles deleted blk/rev (...)
8 x FlushStateToDisk: write coins cache to disk ... started / completed (...)
4 x Leaving block file ...: CBlockFileInfo(...)

Connection info, pretty much none of which is actionable with the information given:

1114 x  Disconnecting and discouraging peer ...! 
19 x ping timeout: 1200.[...]
16 x socket receive timeout: 1201s
11 x socket send error Broken pipe (32)
5 x Socks5() connect to ....onion:8333 failed: connection refused
5 x socket send error Connection reset by peer (104)
3 x Socks5() connect to ....onion:8333 failed: host unreachable
3 x socket send error Connection timed out (110)

ACK 1816327

@maflcko
Copy link
Member

maflcko commented Dec 7, 2020

I've edited one review (#19832 (comment)) to remove an @ mention, which would be included in the merge commit

@maflcko maflcko merged commit 1a04f45 into bitcoin:master Dec 7, 2020
@hebasto hebasto deleted the 200829-log branch December 7, 2020 08:33
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Dec 7, 2020
…gory

1816327 p2p: Put disconnecting logs into BCLog::NET category (Hennadii Stepanov)

Pull request description:

  It's too noisy:
  ```
  $ cat debug.log | wc -l
  28529
  $ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l
  10177
  ```

ACKs for top commit:
  MarcoFalke:
     noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category, so cr-ACK 1816327
  practicalswift:
    ACK 1816327 for the reasons MarcoFalke gave above.
  ajtowns:
    ACK 1816327

Tree-SHA512: c312c1009090840659b2cb1364d8ad9b6ab8e742fc462aef169996d93c76c248507639a00257ed9d73a6916c01176b1793491b2305e92fdded5f9de0935b6ba6
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants