-
Notifications
You must be signed in to change notification settings - Fork 37.7k
p2p: Put disconnecting logs into BCLog::NET category #19832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category, so cr-ACK 1816327 |
Tend to NACK, the |
These figures are very far from representative? I have 154230 and 116, e.g. less than one line in one thousand, and with no banning, just one seednode and a few addnodes in the config file. |
Interesting, what factors affect this? FWIW, that is the node the log was grepped from: |
This PR does not change the logging output amount with the |
I think it should stay unconditionally logged. If one-third of a node's debug log is this message, ISTM the node has issues and this should be visibly logged. |
What kind of the node's issues could force it to discourage and disconnect peers? |
Misbehaving peers; look at the callers of |
I mean "peers do misbehave", not "the node has issues". |
The node having issues because it's seeing a pathological amount of misbehavior. Best to fix the issue, not hide the logs. |
@jonatack What if the peer is doing this deliberately to try to fill our logs with crap? Either to cause disk usage, or to bury warnings about potential real issues (such as "Potential stale tip detected …") by increasing signal to noise. Generally I don't think misbehaving peers should be able to cause us to write to disk unless the user is explicitly opting in to debug logging. |
I'd argue that this is a real issue. One-third of all logging for misbehavior should not be happening, and if the case is truly as extreme as described at the top, then alarms should be going off; better to know about it ASAP and take useful measures (setban, addnode, seednode, etc). The net logging is far too noisy to be useful unless you are specifically searching it for this or have some kind of custom, up-to-date-on-master monitoring system in place. I don't think it's reasonable to change the logging for everyone based on an unusually pathological case. |
(If people don't want to see peer misbehavior unconditionally logged, maybe we should add a distinct category of |
If so then I think it is better is we log when the "alarm should go off" condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are "more normal than normally" and that he/she draws the conclusion that something could be wrong. I don't think that is good usability: evidently also long term Core contributors don't follow the described pattern as proven by this PR :) Suggestion: we move forward with @hebasto's suggestion and a follow-up PR can make sure the alarm goes off in a clear and unambiguous way when there is indication that something really is broken in a way that may require end-user attention/action :) |
This PR is made based on only only one data point, and afaict an extremely unusual one. Is anyone else really seeing this much misbehavior logging? |
@@ -3828,7 +3828,7 @@ bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode) | |||
} | |||
|
|||
// Normal case: Disconnect the peer and discourage all nodes sharing the address | |||
LogPrintf("Disconnecting and discouraging peer %d!\n", peer_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so then I think it is better is we log when the "alarm should go off" condition is fulfilled: not unconditionally log totally normal events, hope that the user notices that these normal events are "more normal than normally" and that he/she draws the conclusion that something could be wrong.
I don't think that is good usability: evidently also long term Core contributors don't follow the described pattern as proven by this PR :)
This line was written by @sipa in b691f2d and last touched by @jnewbery in 655b195.
My point was simply that Core contributor hebasto who by definition is a power-user didn't interpret the "alarm should go off" logging as such in this case (and TBH I'm not certain he should have). I think it is unreasonable to expect that ordinary users should do that interpretation even if we assume this would be a "alarm should go off" situation.
If this is something we want to warn about we should do it in a more clear and unambiguous way :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. It's hard to know--I think I'd be more worried about network topology here than a peer causing lots of logging, but I'm not sure if that POV is correct or shared.
I'm thinking of looking at the NET
logging to see if we should consider creating a "high level" net logging category (or a "user level" one) as opposed to the low-level one with all of the invs and other hundreds/thousands of messages per minute. There may be objections to that which I'm not aware of, though.
Logging here was added way back in b2864d2, shortly after logging categories were added (#3009). That commit also added unconditional logging to
@MarcoFalke: I don't understand. Can you explain why they're different?
@jonatack: No objections. I'd take it ever further though, and add an (optional) logging severity (DEBUG/INFO/WARNING/ERROR or similar) that can be added to all log messages. The user can then either choose what severity logs they want for each category (eg = |
ACK 1816327 -- I think these logs belong in the |
I think there is something dramatically wrong with a node that has this message as one-third of its logging. It shouldn't be a motivation for this change. |
On my nodes these logs are less than one line in a thousand, and I'm not banning anything. |
I assume that noban peers are "trusted" peers hand-picked by the node operator, so they should be assumed to not misbehave. If they do and fill your disk, then it is user error and not Bitcoin Core error. |
Generally, it should be possible to run Bitcoin Core with well-defined resource usage in an unsupervised setting. Ideally, the resource usage depends on command line settings and user behaviour, not on remote peer (mis)behaviour. And with "unsupervised" I mean that the users doesn't check the debug log unless there is a local software malfunction or other issue that needs manual debugging. Power users that understand internal Bitcoin Core logic and wish to parse the log for peers to ban, can enable the debug category and use I understand that the |
There are 17 other I'd propose to have a global policy on connection/disconnection logging rather than PRing changes to each of them one-by-one. If peer connection messages are displayed unconditionally, disconnections arguably should be as well--and many are. It seems there would be consensus that connections/disconnections be either logged unconditionally, or that it should be possible to separate the logging of these higher-level, lower-frequency actions from the lower-level, high-frequency messages (inv, getdata, tx, etc.) |
If you know of any scenario where a peer connecting to my node can cause my node to write to the default log (cheaply) then I'd be very interested to know about them so that I can fix them. Assume a node running Generally I don't like the idea of untrusted parties being able to cause unwanted resource usage on my machine :) |
@practicalswift I understand your concern. Reconnecting from different IP addresses and re-misbehaving may be trivial, but misbehavior leading to discouragement does not automatically lead to immediate disconnection; see |
I think @MarcoFalke summarized the current policy quite nicely:
If the suggestion is to change the policy to something else we should probably open a separate issue for that to get more participation :) The change in this PR is perfectly in line with the current policy AFAICT. |
@practicalswift ISTM better solutions have been proposed in this discussion already, this unconditional logging is no oversight, and this can be closed. |
Closing for now, as ill behavior is no longer observed, at least for me :) |
Are you accepting inbound connections or hidden behind tor or something? That seems really low. FWIW, I'm seeing ~50% of log lines being "Disconnecting and discouraging" without NET enabled (11.8k out of 22.4k); though that node is a chunk behind current master. |
Yeah, down to less than one in a million now, but it's a pointless stat, since a couple months been running a tor v3 hidden service with almost no inbounds. 50% seems like a lot though.
I'd still like this, may propose something soon ™️ |
Yeah; it is a lot -- but it's triggerable by any badly written fake node software that does something stupid and also reconnects to your node sometime after being disconnected. I restarted that node after updating to current master and am back up to 378 lines of debug.log with 161 of them being Disconnecting and discouraging. (no debug logging enabled, which also means I can't tell what IPs those peer ids are related to) |
Judging from the discussions in this PR is seems to me that the consensus opinion is that "Disconnecting and discouraging peer" is too noisy. Consider re-opening this PR :) |
I suspect that re-proposing to hide the symptoms for all users every time one person has a situation with their node might not be the optimal way to address this. "Let's sweep it under the rug." That said, since this discussion will go in circles until there is a real solution, I'll unsubscribe here and concentrate at some point on proposing a solution for node operators who would like to know about these situations without running and grepping the current NET category logging which is far, far more noisy. |
Re-opened. |
ACK 1816327 for the reasons MarcoFalke gave above. It shouldn't be easy for an attacker to add noise to our logs. |
Having left the restarted node running for a couple of days now, I've broken my logs up by how frequently each message appears. Standard "everything's going fine" logs:
Blockchain warnings about people sending me BCH headers and blocks sometimes being slow:
Info about disk allocations/pruning:
Connection info, pretty much none of which is actionable with the information given:
ACK 1816327 |
I've edited one review (#19832 (comment)) to remove an |
…gory 1816327 p2p: Put disconnecting logs into BCLog::NET category (Hennadii Stepanov) Pull request description: It's too noisy: ``` $ cat debug.log | wc -l 28529 $ cat debug.log | grep "Disconnecting and discouraging peer" | wc -l 10177 ``` ACKs for top commit: MarcoFalke: noban, addnode and local peers are still unconditionally logged (as they should), but this one can go into a category, so cr-ACK 1816327 practicalswift: ACK 1816327 for the reasons MarcoFalke gave above. ajtowns: ACK 1816327 Tree-SHA512: c312c1009090840659b2cb1364d8ad9b6ab8e742fc462aef169996d93c76c248507639a00257ed9d73a6916c01176b1793491b2305e92fdded5f9de0935b6ba6
It's too noisy: