Skip to content

Conversation

jnewbery
Copy link
Contributor

@jnewbery jnewbery commented Jul 8, 2020

The motivation for this PR is to reduce the scope of cs_main locking in misbehavior logic. It is the first set of commits from a larger branch to move the misbehavior data out of CNodeState and into a new struct that doesn't take cs_main.

There are some very minor behavior changes in this branch, such as:

  • Not checking for discouragement/disconnect in ProcessMessages() (and instead relying on the following check in SendMessages())
  • Checking for discouragement/disconnect as the first action in SendMessages() (and not doing ping message sending first)
  • Continuing through SendMessages() if MaybeDiscourageAndDisconnect() doesn't disconnect the peer (rather than dropping out of SendMessages()

@jnewbery jnewbery changed the title [net processing] Reduce cs_main scope MaybeDiscourageAndDisconnect() [net processing] Reduce cs_main scope in MaybeDiscourageAndDisconnect() Jul 8, 2020
@DrahtBot DrahtBot added the P2P label Jul 8, 2020
@DrahtBot
Copy link
Contributor

DrahtBot commented Jul 8, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@jnewbery jnewbery force-pushed the 2020-07-tidyup-maybediscourage branch from 5722b33 to 37e7265 Compare July 9, 2020 10:38
Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM 37e7265, some comments though.

@jnewbery
Copy link
Contributor Author

jnewbery commented Jul 9, 2020

Thanks for the review @promag . I've addressed your comments.

@jnewbery jnewbery force-pushed the 2020-07-tidyup-maybediscourage branch from 37e7265 to c3dde2b Compare July 9, 2020 14:28
LogPrintf("Warning: not punishing manually-connected peer %s!\n", pnode.addr.ToString());
return false;
}

Copy link
Member

@jonatack jonatack Jul 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whitespace linter needs appeasing, looks like extra whitespace in lines 3589 and 3595.

diff --git a/src/net_processing.cpp b/src/net_processing.cpp

@@ -3573,16 +3582,18 @@ bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)

+

+

^---- failure generated from test/lint/lint-whitespace.sh

(which won't prevent me from reviewing tomorrow am, so no rush)

@jnewbery jnewbery force-pushed the 2020-07-tidyup-maybediscourage branch from c3dde2b to f54af5e Compare July 9, 2020 16:48
Copy link
Contributor

@dongcarl dongcarl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK f54af5e! Happy to see 2020-06-cs-main-split starting to be PR'd ☺️


For a minute, I was wondering if it'd be appropriate to move the MaybeDiscourageAndDisconnect up to ThreadMessageHandler, then realized that would involve changing NetEventsInterface so nope.

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK f54af5e with a few comments below.

  • Commit message in f54af5e: If we don't disconnect a peer in MaybeDiscourageAndDisconnect because it has NOBAN permissions or it's a manual connection, continue SendMessages processing rather than exiting early -> Was this a bit of a bug? EDIT: the former behavior does not seem to have been an anti-DoS optimisation.

  • Commit message 38c96ef Misbehaving()... is only called in ProcessMessages() -> as it's called from MaybePunishNodeForBlock, MaybePunishNodeForTx, SendBlockTransactions, ProcessHeadersMessage, perhaps write "Misbehaving() calls originate from callers in ProcessMessages()..."

Feel free to ignore the suggestions below.


// If we get here, the outgoing message serialization version is set and can't change.
const CNetMsgMaker msgMaker(pto->GetSendVersion());
if (MaybeDiscourageAndDisconnect(*pto)) return true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

38c96ef in this commit that removes MaybeDiscourageAndDisconnect() from ProcessMessages(), future reviewers may appreciate a comment here in the remaining call to make the order dependency explicit.

e.g. that MaybeDiscourageAndDisconnect() here depends on all misbehavior calls (currently Misbehaving) occurring in ProcessMessages which needs to remain called before SendMessages in CConnman::ThreadMessageHandler... if I am not confused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nMisbehavior is a tally in CNodeState that can be incremented from anywhere. That almost always happens inside a ProcessMessages() call (because we increment the misbehavior score when receiving a bad messages from a peer), but not always. See, for example, the call to MaybePunishNodeForBlock() inside BlockChecked(), which is an asynchronous callback from the validation interface, executed on the scheduler thread.

As long as MaybeDiscourageAndDisconnect() is called regularly for the node, then the misbehavior score exceeding the 100 threshold will eventually result in the peer being punished. It doesn't really matter where that MaybeDiscourageAndDisconnect() happens, but it makes most sense in SendMessages() which is where we do general peer housekeeping/maintenance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jnewbery, makes sense.

The previous behavior before this PR had me wondering if there were particular optimisation reasons for it, but I don't see any.

@jnewbery jnewbery force-pushed the 2020-07-tidyup-maybediscourage branch from f54af5e to 738c337 Compare July 10, 2020 12:48
@jnewbery
Copy link
Contributor Author

Thanks for the review @jonatack. I've addressed your inline comments. For your commit log comments:

Commit message in f54af5e: If we don't disconnect a peer in MaybeDiscourageAndDisconnect because it has NOBAN permissions or it's a manual connection, continue SendMessages processing rather than exiting early -> Was this a bit of a bug?

Not a bug, but inefficient and confusing. I've updated the commit log to include that.

Commit message 38c96ef Misbehaving()... is only called in ProcessMessages() -> as it's called from MaybePunishNodeForBlock, MaybePunishNodeForTx, SendBlockTransactions, ProcessHeadersMessage, perhaps write "Misbehaving() calls originate from callers in ProcessMessages()..."

I've now realized that this commit log was incorrect (see #19472 (comment)). I've updated the log to be more accurate. Let me know what you think

@jnewbery jnewbery force-pushed the 2020-07-tidyup-maybediscourage branch from 738c337 to e874c6d Compare July 10, 2020 16:29
@jnewbery
Copy link
Contributor Author

rebased

jnewbery added 2 commits July 10, 2020 18:20
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
Hint for reviewers: review ignoring whitespace changes.
@jnewbery jnewbery force-pushed the 2020-07-tidyup-maybediscourage branch from e874c6d to a37331f Compare July 11, 2020 05:46
…ages

`nMisbehavior` is a tally in `CNodeState` that can be incremented from
anywhere. That almost always happens inside a `ProcessMessages()` call
(because we increment the misbehavior score when receiving a bad
messages from a peer), but not always. See, for example, the call to
`MaybePunishNodeForBlock()` inside `BlockChecked()`, which is an
asynchronous callback from the validation interface, executed on the
scheduler thread.

As long as `MaybeDiscourageAndDisconnect()` is called regularly for the
node, then the misbehavior score exceeding the 100 threshold will
eventually result in the peer being punished. It doesn't really matter
where that `MaybeDiscourageAndDisconnect()` happens, but it makes most
sense in `SendMessages()` which is where we do general peer
housekeeping/maintenance.

Therefore, remove the `MaybeDiscourageAndDisconnect()` call in
`ProcessMessages()` and move the `MaybeDiscourageAndDisconnect()` call
in `SendMessages()` to the top of the function. This moves it out of the
cs_main lock scope, so take that lock directly inside
`MaybeDiscourageAndDisconnect()`.

Historic note: `MaybeDiscourageAndDisconnect()` was previously
`SendRejectsAndCheckIfBanned()`, and before that was just sending
rejects.  All of those things required cs_main, which is why
`MaybeDiscourageAndDisconnect()` was called after the ping logic.
@jnewbery jnewbery force-pushed the 2020-07-tidyup-maybediscourage branch from a37331f to fbb7bab Compare July 11, 2020 06:06
@jonatack
Copy link
Member

re-ACK a37331f per git range-diff 505b4ed f54af5e a37331f

…g peer

If we don't disconnect a peer in MaybeDiscourageAndDisconnect because it
has NOBAN permissions or it's a manual connection, continue SendMessages
processing rather than exiting early.

The previous behaviour was that we'd miss the SendMessages processing on
this iteration of the MessageHandler loop. That's not a problem since
SendMessages() would just be called again on the next iteration, but it
was slightly inefficient and confusing.
@jnewbery jnewbery force-pushed the 2020-07-tidyup-maybediscourage branch from fbb7bab to 655b195 Compare July 11, 2020 06:13
@jonatack
Copy link
Member

re-ACK 655b195 per git range-diff 505b4ed f54af5e 655b195, code/commit messages review, a bit of code history, and debug build.

Maybe (only if you have to retouch again):

 bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)
 {
-    NodeId peer_id{pnode.GetId()};
+    const NodeId peer_id{pnode.GetId()};

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 655b195 only some style-nits 🚁

Show signature and timestamp

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

ACK 655b195747 only some style-nits 🚁
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUhTIwv/U4oykZ05nHVcYquBiztxhgGyKa9f8uaOypMsTYR6eEWe6L5jcOi7wDYc
0YHHVelm/057+8BOihx6BtYoiStnOzewawMsh2CNXYZPxs9fEJyP5lbTRVLpRN2S
GSwEnXTlrg6N8cZtncmfZRG6XSR9YwwLvS3FXwReTBr9DCE5SPeFz/kUzXin0z2Z
mRG58cYJW4FiYSfRqvd34c3gI8bbJJC9Fz55tc8ZdgXVM9evVeUGQCr1F/cOTbYi
cd0gAkMNbaUuEk3KzO0rK+1iK5oFNo+FCNcU8XG1i1au6tnd+fEBqWOxGJ/TA8VB
m4eI3GHoC4chH4jaOskIYPgSkC5v4ubMV6lImvK3rvud67+5gi/2ZILLxLwjpdCN
KAAKqJTQDGduGgcdt6Tezk1XGmwoJ+CV/sl2dH9INWrEWTsqy9ZQ2sr9uPRPSFva
znujKqFX9MhNkm+tS7XCsRNuJRr3zJN5x4jq9K/tjcCUnfOcCYOJUCmQ2wMpWyBo
2pMOadPJ
=ZRAY
-----END PGP SIGNATURE-----

Timestamp of file with hash cedbe81897bd1c68d1b2c18d961a7b55b239097a710335927096fe5482ada3fd -

Copy link
Contributor

@promag promag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 655b195.

@jnewbery
Copy link
Contributor Author

jnewbery commented Jul 24, 2020

Thanks for the review @MarcoFalke . Rather than invalidate the three ACKs on this branch, I've added a commit to the next branch that addresses your comments: https://github.com/jnewbery/bitcoin/tree/2020-07-tidy-misbehavior.

I think this PR may be ready for merge soon.

@jnewbery
Copy link
Contributor Author

@jonatack

Maybe (only if you have to retouch again):

bool PeerLogicValidation::MaybeDiscourageAndDisconnect(CNode& pnode)
...

Done in follow-up: https://github.com/jnewbery/bitcoin/commits/2020-07-tidy-misbehavior

Copy link

@ariard ariard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review ACK 655b195

@@ -3838,7 +3838,7 @@ class CompareInvMempoolOrder
bool PeerLogicValidation::SendMessages(CNode* pto)
{
const Consensus::Params& consensusParams = Params().GetConsensus();
{

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future work, do we still need cs_sendProcessing ? It only guards m_next_addr_send/m_next_local_addr_send, both of them only RW in SendMessages, inside ThreadMessageHandler.

Is this a relic from the past or do you see another reason ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed it from m_next_addr_send and m_next_local_addr_send in https://github.com/jnewbery/bitcoin/commits/2020-06-cs-main-split. It could theoretically also be removed from guarding SendMessages() since we only ever call that from the message handler thread.

@laanwj laanwj merged commit 40a0481 into bitcoin:master Jul 24, 2020
@jnewbery jnewbery deleted the 2020-07-tidyup-maybediscourage branch July 24, 2020 15:29
@jnewbery
Copy link
Contributor Author

Thanks for review everyone. The next PR is #19583 (including the suggested fixups by @jonatack and @MarcoFalke)

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 24, 2020
…scourageAndDisconnect()

655b195 [net processing] Continue SendMessages processing if not disconnecting peer (John Newbery)
a49781e [net processing] Only call MaybeDiscourageAndDisconnect from SendMessages (John Newbery)
a1d5a42 [net processing] Fix bad indentation in SendMessages() (John Newbery)
1a1c23f [net processing] Change cs_main TRY_LOCK to LOCK in SendMessages() (John Newbery)

Pull request description:

  The motivation for this PR is to reduce the scope of cs_main locking in misbehavior logic. It is the first set of commits from a larger branch to move the misbehavior data out of CNodeState and into a new struct that doesn't take cs_main.

  There are some very minor behavior changes in this branch, such as:

  - Not checking for discouragement/disconnect in `ProcessMessages()` (and instead relying on the following check in `SendMessages()`)
  - Checking for discouragement/disconnect as the first action in `SendMessages()` (and not doing ping message sending first)
  - Continuing through `SendMessages()` if `MaybeDiscourageAndDisconnect()` doesn't disconnect the peer (rather than dropping out of `SendMessages()`

ACKs for top commit:
  jonatack:
    re-ACK 655b195 per `git range-diff 505b4ed f54af5e 655b195`, code/commit messages review, a bit of code history, and debug build.
  MarcoFalke:
    ACK 655b195 only some style-nits 🚁
  promag:
    Code review ACK 655b195.
  ariard:
    Code Review ACK 655b195

Tree-SHA512: fd6d7bc6bb789f5fb7771fb6a45f61a8faba32af93b766554f562144f9631d15c9cc849a383e71743ef73e610b4ee14853666f6fbf08a3ae35176d48c76c65d3
@theuni
Copy link
Member

theuni commented Jul 24, 2020

I'm worried that we may end up spending a more significant amount of time in SendMessages now, whereas before it was opportunistic. Do you have any idea what the TRY_LOCK hit/miss ratio looked like for a busy node over a few days?

I'd also be curious to know what the cpu usage/profile of a busy node looks like before/after this change.

@jnewbery
Copy link
Contributor Author

Thanks for looking at this @theuni!

Do you have any idea what the TRY_LOCK hit/miss ratio looked like for a busy node over a few days?

I don't know exactly but I'd expect it to be very high (ie we almost always take the lock). What other threads are you expecting to be holding cs_main on a regular basis?

@maflcko
Copy link
Member

maflcko commented Jul 25, 2020

I guess this depends on the use case and what other modules are used that might aggressively poll cs_main. (GUI, wallet, indices, RPC, ...)

deadalnix pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Jan 6, 2021
Summary:
```
The motivation for this PR is to reduce the scope of cs_main locking in
misbehavior logic. It is the first set of commits from a larger branch
to move the misbehavior data out of CNodeState and into a new struct
that doesn't take cs_main.

There are some very minor behavior changes in this branch, such as:

    Not checking for discouragement/disconnect in ProcessMessages() (and
instead relying on the following check in SendMessages())
    Checking for discouragement/disconnect as the first action in
SendMessages() (and not doing ping message sending first)
    Continuing through SendMessages() if MaybeDiscourageAndDisconnect()
doesn't disconnect the peer (rather than dropping out of SendMessages()
```

Backport of core [[bitcoin/bitcoin#19472 | PR19472]].

Depends on D8782 and D8791.

Test Plan:
With Clang and debug:
  ninja all check-all

  ./contrib/teamcity/build-configurations.py build-tsan

Reviewers: #bitcoin_abc, majcosta

Reviewed By: #bitcoin_abc, majcosta

Subscribers: majcosta

Differential Revision: https://reviews.bitcoinabc.org/D8792
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants