-
Notifications
You must be signed in to change notification settings - Fork 37.8k
Split CNode::cs_vSend: message processing and message sending #9535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note that this commit was pulled out of #9419 as it stands alone and is by itself a big win. |
@@ -1887,7 +1883,7 @@ void CConnman::ThreadMessageHandler() | |||
|
|||
// Send messages | |||
{ | |||
TRY_LOCK(pnode->cs_vSend, lockSend); | |||
TRY_LOCK(pnode->cs_sendProcessing, lockSend); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the TRY_LOCK here. Death to 'em!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
utACK 11290734ca7261f732c9f43f152c69de3a42546c, I ran a nearly identical version for over a week with no issues. Shameless beg/plea for 0.14. This was originally intended for #9441, but left out because #9419 (which has since been dropped for 0.14) already included it. I understand that we all have deep review queues atm though, not everything can make it. |
cs_vSend is used for two purposes - to lock the datastructures used to queue messages to place on the wire and to only call SendMessages once at a time per-node. I believe SendMessages used to access some of the vSendMsg stuff, but it doesn't anymore, so these locks do not need to be on the same mutex, and also make deadlocking much more likely.
Technically cs_sendProcessing is entirely useless now because it is only ever taken on the one MessageHandler thread, but because there may be multiple of those in the future, it is left in place
1129073
to
376b3c2
Compare
Rebased after #9441 merge. |
utACK 376b3c2. Happy to see TRY_LOCKs go. |
lock pleb utACK 376b3c2 |
utACK 376b3c2 similar to @instagibbs, this is relatively new code to me, but it seems like it makes sense |
LOCK(pnode->cs_vSend); | ||
size_t nBytes = SocketSendData(pnode); | ||
if (nBytes) { | ||
RecordBytesSent(nBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RecordBytesSent() needs to be called from inside the lock? If so, I assume the same would be needed in CConnman::PushMessage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jtimon No, RecordBytesSent does not require the lock. It has its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checked, it looks like every variable accessed in RecordBytesSent is only done with cs_totalBytesSent (and even if this weren't true, this change wouldnt be a regression, we still call PushMessage from other functions without cs_vSend).
I was told by @TheBlueMatt and @theuni:
This is what my "manual review what's accessed under the locks before/after" contains at this point:
...so far the method seems to be working, but would be nice to hear a confirmation that that's what I was told, to confirm that I'm not making any wrong assumption. Only SendMessages() left and already listed functions/methods to review, but it's long, the tempation of doing a reduced version of #4646 always come back whenever asked to review SendMessages()...sorry. At this point I doubt I will make my review before 0.14 is forked, but I'm fully motivated to finish it before or after it is merged. @theuni can you detail the |
@jtimon looks good. Re #4646, something similar is definitely the plan. See theuni@1a6b10a. That's definitely the next step, it was just too much of a refactor for 0.14. As for the non-recursive part, I simply meant that it should now be quite easy to spot potential deadlocks since cs_vSend is now limited enough in scope that it can't recurse. In other words, the lock never leaves PushMessage/SocketSendData/RecordBytesSent (where RecordBytesSent isn't necessary and could be avoided). |
Alright, going through all pnode->cs_vSend was easy, but after going though all SendMessages (I updated the list in my comment above), looking only at the pto parameter for the first pass (which is the most relevant for this pr) and not being exhaustive about all potential globals and not going recursively though connman.PushMessage() IsInitialBlockDownload(), Broadcast(nTimeBestReceived, &connman), pto->PushInventory(), pto->pfilter->IsRelevantAndUpdate()...the only "conflict" so far is cs_inventory, but it's just a finer smaller lock, so seems fine as well. |
utACK 376b3c2 |
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
b954796 include missing atomic to make CMake linux happy. (furszy) c452676 Make the cs_sendProcessing a LOCK instead of a TRY_LOCK (Matt Corallo) ec23964 Split CNode::cs_vSend: message processing and message sending (Matt Corallo) f9c458a Remove double brackets in addrman (Matt Corallo) e48c0d3 Fix AddrMan locking (Matt Corallo) 2d71f05 Make fDisconnect an std::atomic (Matt Corallo) 0544cc6 net: fix a few cases where messages were sent rather than dropped upon disconnection (furszy) Pull request description: Running 4.2 branch with `enable-debug`. Fixing: ``` POTENTIAL DEADLOCK DETECTED Previous lock order was: pnode->cs_vRecvMsg net.cpp:1830 (TRY) (in thread ) (1) cs_main main.cpp:5275 (in thread ) (2) cs_vSend net.cpp:2465 (in thread ) Current lock order is: (2) pnode->cs_vSend net.cpp:1846 (TRY) (in thread ) (1) cs_main main.cpp:6018 (TRY) (in thread ) ``` Plus, added few more minor possible races fixes too. Back porting: bitcoin#8862 bitcoin#9225 bitcoin#9535 ACKs for top commit: Fuzzbawls: ACK b954796 random-zebra: ACK b954796 Tree-SHA512: e15bd81e51282771ca4cb40689229d54787d72d7ed0ebc5f6f0058d3e252cc6691d8ebfc4670207bcd835d32b7898863651fd2ac74704d6cb4fdd30554d8a83a
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock between cs_main and cs_vSend. cs_vSend was split into cs_vSend and cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a TRY_LOCK to a LOCK in the same PR). Since cs_vSend can no longer be taken before cs_main, revert this to a LOCK(). This commit leaves part of the code with bad indentation. That is fixed by the next (whitespace change only) commit.
cs_vSend is used for two purposes - to lock the datastructures used
to queue messages to place on the wire and to only call
SendMessages once at a time per-node. I believe SendMessages used
to access some of the vSendMsg stuff, but it doesn't anymore, so
these locks do not need to be on the same mutex, and also make
deadlocking much more likely.