-
Notifications
You must be signed in to change notification settings - Fork 37.7k
[net processing] Reduce cs_main scope in MaybeDiscourageAndDisconnect() #19472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[net processing] Reduce cs_main scope in MaybeDiscourageAndDisconnect() #19472
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
f00125a
to
5722b33
Compare
5722b33
to
37e7265
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM 37e7265, some comments though.
Thanks for the review @promag . I've addressed your comments. |
37e7265
to
c3dde2b
Compare
src/net_processing.cpp
Outdated
LogPrintf("Warning: not punishing manually-connected peer %s!\n", pnode.addr.ToString()); | ||
return false; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whitespace linter needs appeasing, looks like extra whitespace in lines 3589 and 3595.
diff --git a/src/net_processing.cpp b/src/net_processing.cpp
@@ -3573,16 +3582,18 @@ bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)
+
+
^---- failure generated from test/lint/lint-whitespace.sh
(which won't prevent me from reviewing tomorrow am, so no rush)
c3dde2b
to
f54af5e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK f54af5e! Happy to see 2020-06-cs-main-split
starting to be PR'd
For a minute, I was wondering if it'd be appropriate to move the MaybeDiscourageAndDisconnect
up to ThreadMessageHandler
, then realized that would involve changing NetEventsInterface
so nope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK f54af5e with a few comments below.
-
Commit message in f54af5e: If we don't disconnect a peer in MaybeDiscourageAndDisconnect because it has NOBAN permissions or it's a manual connection, continue SendMessages processing rather than exiting early -> Was this a bit of a bug? EDIT: the former behavior does not seem to have been an anti-DoS optimisation.
-
Commit message 38c96ef Misbehaving()... is only called in ProcessMessages() -> as it's called from MaybePunishNodeForBlock, MaybePunishNodeForTx, SendBlockTransactions, ProcessHeadersMessage, perhaps write "Misbehaving() calls originate from callers in ProcessMessages()..."
Feel free to ignore the suggestions below.
|
||
// If we get here, the outgoing message serialization version is set and can't change. | ||
const CNetMsgMaker msgMaker(pto->GetSendVersion()); | ||
if (MaybeDiscourageAndDisconnect(*pto)) return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
38c96ef in this commit that removes MaybeDiscourageAndDisconnect()
from ProcessMessages()
, future reviewers may appreciate a comment here in the remaining call to make the order dependency explicit.
e.g. that MaybeDiscourageAndDisconnect()
here depends on all misbehavior calls (currently Misbehaving
) occurring in ProcessMessages which needs to remain called before SendMessages in CConnman::ThreadMessageHandler
... if I am not confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nMisbehavior
is a tally in CNodeState
that can be incremented from anywhere. That almost always happens inside a ProcessMessages()
call (because we increment the misbehavior score when receiving a bad messages from a peer), but not always. See, for example, the call to MaybePunishNodeForBlock()
inside BlockChecked()
, which is an asynchronous callback from the validation interface, executed on the scheduler thread.
As long as MaybeDiscourageAndDisconnect()
is called regularly for the node, then the misbehavior score exceeding the 100 threshold will eventually result in the peer being punished. It doesn't really matter where that MaybeDiscourageAndDisconnect()
happens, but it makes most sense in SendMessages()
which is where we do general peer housekeeping/maintenance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jnewbery, makes sense.
The previous behavior before this PR had me wondering if there were particular optimisation reasons for it, but I don't see any.
f54af5e
to
738c337
Compare
Thanks for the review @jonatack. I've addressed your inline comments. For your commit log comments:
Not a bug, but inefficient and confusing. I've updated the commit log to include that.
I've now realized that this commit log was incorrect (see #19472 (comment)). I've updated the log to be more accurate. Let me know what you think |
738c337
to
e874c6d
Compare
rebased |
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
Hint for reviewers: review ignoring whitespace changes.
e874c6d
to
a37331f
Compare
…ages `nMisbehavior` is a tally in `CNodeState` that can be incremented from anywhere. That almost always happens inside a `ProcessMessages()` call (because we increment the misbehavior score when receiving a bad messages from a peer), but not always. See, for example, the call to `MaybePunishNodeForBlock()` inside `BlockChecked()`, which is an asynchronous callback from the validation interface, executed on the scheduler thread. As long as `MaybeDiscourageAndDisconnect()` is called regularly for the node, then the misbehavior score exceeding the 100 threshold will eventually result in the peer being punished. It doesn't really matter where that `MaybeDiscourageAndDisconnect()` happens, but it makes most sense in `SendMessages()` which is where we do general peer housekeeping/maintenance. Therefore, remove the `MaybeDiscourageAndDisconnect()` call in `ProcessMessages()` and move the `MaybeDiscourageAndDisconnect()` call in `SendMessages()` to the top of the function. This moves it out of the cs_main lock scope, so take that lock directly inside `MaybeDiscourageAndDisconnect()`. Historic note: `MaybeDiscourageAndDisconnect()` was previously `SendRejectsAndCheckIfBanned()`, and before that was just sending rejects. All of those things required cs_main, which is why `MaybeDiscourageAndDisconnect()` was called after the ping logic.
a37331f
to
fbb7bab
Compare
re-ACK a37331f per |
…g peer If we don't disconnect a peer in MaybeDiscourageAndDisconnect because it has NOBAN permissions or it's a manual connection, continue SendMessages processing rather than exiting early. The previous behaviour was that we'd miss the SendMessages processing on this iteration of the MessageHandler loop. That's not a problem since SendMessages() would just be called again on the next iteration, but it was slightly inefficient and confusing.
fbb7bab
to
655b195
Compare
re-ACK 655b195 per Maybe (only if you have to retouch again): bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)
{
- NodeId peer_id{pnode.GetId()};
+ const NodeId peer_id{pnode.GetId()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK 655b195 only some style-nits 🚁
Show signature and timestamp
Signature:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
ACK 655b195747 only some style-nits 🚁
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUhTIwv/U4oykZ05nHVcYquBiztxhgGyKa9f8uaOypMsTYR6eEWe6L5jcOi7wDYc
0YHHVelm/057+8BOihx6BtYoiStnOzewawMsh2CNXYZPxs9fEJyP5lbTRVLpRN2S
GSwEnXTlrg6N8cZtncmfZRG6XSR9YwwLvS3FXwReTBr9DCE5SPeFz/kUzXin0z2Z
mRG58cYJW4FiYSfRqvd34c3gI8bbJJC9Fz55tc8ZdgXVM9evVeUGQCr1F/cOTbYi
cd0gAkMNbaUuEk3KzO0rK+1iK5oFNo+FCNcU8XG1i1au6tnd+fEBqWOxGJ/TA8VB
m4eI3GHoC4chH4jaOskIYPgSkC5v4ubMV6lImvK3rvud67+5gi/2ZILLxLwjpdCN
KAAKqJTQDGduGgcdt6Tezk1XGmwoJ+CV/sl2dH9INWrEWTsqy9ZQ2sr9uPRPSFva
znujKqFX9MhNkm+tS7XCsRNuJRr3zJN5x4jq9K/tjcCUnfOcCYOJUCmQ2wMpWyBo
2pMOadPJ
=ZRAY
-----END PGP SIGNATURE-----
Timestamp of file with hash cedbe81897bd1c68d1b2c18d961a7b55b239097a710335927096fe5482ada3fd -
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 655b195.
Thanks for the review @MarcoFalke . Rather than invalidate the three ACKs on this branch, I've added a commit to the next branch that addresses your comments: https://github.com/jnewbery/bitcoin/tree/2020-07-tidy-misbehavior. I think this PR may be ready for merge soon. |
Done in follow-up: https://github.com/jnewbery/bitcoin/commits/2020-07-tidy-misbehavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review ACK 655b195
@@ -3838,7 +3838,7 @@ class CompareInvMempoolOrder | |||
bool PeerLogicValidation::SendMessages(CNode* pto) | |||
{ | |||
const Consensus::Params& consensusParams = Params().GetConsensus(); | |||
{ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future work, do we still need cs_sendProcessing
? It only guards m_next_addr_send/m_next_local_addr_send
, both of them only RW in SendMessages
, inside ThreadMessageHandler
.
Is this a relic from the past or do you see another reason ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed it from m_next_addr_send
and m_next_local_addr_send
in https://github.com/jnewbery/bitcoin/commits/2020-06-cs-main-split. It could theoretically also be removed from guarding SendMessages()
since we only ever call that from the message handler thread.
Thanks for review everyone. The next PR is #19583 (including the suggested fixups by @jonatack and @MarcoFalke) |
…scourageAndDisconnect() 655b195 [net processing] Continue SendMessages processing if not disconnecting peer (John Newbery) a49781e [net processing] Only call MaybeDiscourageAndDisconnect from SendMessages (John Newbery) a1d5a42 [net processing] Fix bad indentation in SendMessages() (John Newbery) 1a1c23f [net processing] Change cs_main TRY_LOCK to LOCK in SendMessages() (John Newbery) Pull request description: The motivation for this PR is to reduce the scope of cs_main locking in misbehavior logic. It is the first set of commits from a larger branch to move the misbehavior data out of CNodeState and into a new struct that doesn't take cs_main. There are some very minor behavior changes in this branch, such as: - Not checking for discouragement/disconnect in `ProcessMessages()` (and instead relying on the following check in `SendMessages()`) - Checking for discouragement/disconnect as the first action in `SendMessages()` (and not doing ping message sending first) - Continuing through `SendMessages()` if `MaybeDiscourageAndDisconnect()` doesn't disconnect the peer (rather than dropping out of `SendMessages()` ACKs for top commit: jonatack: re-ACK 655b195 per `git range-diff 505b4ed f54af5e 655b195`, code/commit messages review, a bit of code history, and debug build. MarcoFalke: ACK 655b195 only some style-nits 🚁 promag: Code review ACK 655b195. ariard: Code Review ACK 655b195 Tree-SHA512: fd6d7bc6bb789f5fb7771fb6a45f61a8faba32af93b766554f562144f9631d15c9cc849a383e71743ef73e610b4ee14853666f6fbf08a3ae35176d48c76c65d3
I'm worried that we may end up spending a more significant amount of time in SendMessages now, whereas before it was opportunistic. Do you have any idea what the TRY_LOCK hit/miss ratio looked like for a busy node over a few days? I'd also be curious to know what the cpu usage/profile of a busy node looks like before/after this change. |
Thanks for looking at this @theuni!
I don't know exactly but I'd expect it to be very high (ie we almost always take the lock). What other threads are you expecting to be holding cs_main on a regular basis? |
I guess this depends on the use case and what other modules are used that might aggressively poll cs_main. (GUI, wallet, indices, RPC, ...) |
Summary: ``` The motivation for this PR is to reduce the scope of cs_main locking in misbehavior logic. It is the first set of commits from a larger branch to move the misbehavior data out of CNodeState and into a new struct that doesn't take cs_main. There are some very minor behavior changes in this branch, such as: Not checking for discouragement/disconnect in ProcessMessages() (and instead relying on the following check in SendMessages()) Checking for discouragement/disconnect as the first action in SendMessages() (and not doing ping message sending first) Continuing through SendMessages() if MaybeDiscourageAndDisconnect() doesn't disconnect the peer (rather than dropping out of SendMessages() ``` Backport of core [[bitcoin/bitcoin#19472 | PR19472]]. Depends on D8782 and D8791. Test Plan: With Clang and debug: ninja all check-all ./contrib/teamcity/build-configurations.py build-tsan Reviewers: #bitcoin_abc, majcosta Reviewed By: #bitcoin_abc, majcosta Subscribers: majcosta Differential Revision: https://reviews.bitcoinabc.org/D8792
The motivation for this PR is to reduce the scope of cs_main locking in misbehavior logic. It is the first set of commits from a larger branch to move the misbehavior data out of CNodeState and into a new struct that doesn't take cs_main.
There are some very minor behavior changes in this branch, such as:
ProcessMessages()
(and instead relying on the following check inSendMessages()
)SendMessages()
(and not doing ping message sending first)SendMessages()
ifMaybeDiscourageAndDisconnect()
doesn't disconnect the peer (rather than dropping out ofSendMessages()