Skip to content

Conversation

TheBlueMatt
Copy link
Contributor

cs_vSend is used for two purposes - to lock the datastructures used
to queue messages to place on the wire and to only call
SendMessages once at a time per-node. I believe SendMessages used
to access some of the vSendMsg stuff, but it doesn't anymore, so
these locks do not need to be on the same mutex, and also make
deadlocking much more likely.

@TheBlueMatt
Copy link
Contributor Author

Note that this commit was pulled out of #9419 as it stands alone and is by itself a big win.

@fanquake fanquake added the P2P label Jan 12, 2017
@@ -1887,7 +1883,7 @@ void CConnman::ThreadMessageHandler()

// Send messages
{
TRY_LOCK(pnode->cs_vSend, lockSend);
TRY_LOCK(pnode->cs_sendProcessing, lockSend);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the TRY_LOCK here. Death to 'em!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@theuni
Copy link
Member

theuni commented Jan 13, 2017

utACK 11290734ca7261f732c9f43f152c69de3a42546c, I ran a nearly identical version for over a week with no issues.

Shameless beg/plea for 0.14. This was originally intended for #9441, but left out because #9419 (which has since been dropped for 0.14) already included it. I understand that we all have deep review queues atm though, not everything can make it.

cs_vSend is used for two purposes - to lock the datastructures used
to queue messages to place on the wire and to only call
SendMessages once at a time per-node. I believe SendMessages used
to access some of the vSendMsg stuff, but it doesn't anymore, so
these locks do not need to be on the same mutex, and also make
deadlocking much more likely.
Technically cs_sendProcessing is entirely useless now because it
is only ever taken on the one MessageHandler thread, but because
there may be multiple of those in the future, it is left in place
@TheBlueMatt TheBlueMatt force-pushed the 2017-01-cs-vsend-split branch from 1129073 to 376b3c2 Compare January 13, 2017 18:34
@TheBlueMatt
Copy link
Contributor Author

Rebased after #9441 merge.

@sipa
Copy link
Member

sipa commented Jan 14, 2017

utACK 376b3c2. Happy to see TRY_LOCKs go.

@instagibbs
Copy link
Member

lock pleb utACK 376b3c2

@morcos
Copy link
Contributor

morcos commented Jan 17, 2017

utACK 376b3c2

similar to @instagibbs, this is relatively new code to me, but it seems like it makes sense

LOCK(pnode->cs_vSend);
size_t nBytes = SocketSendData(pnode);
if (nBytes) {
RecordBytesSent(nBytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RecordBytesSent() needs to be called from inside the lock? If so, I assume the same would be needed in CConnman::PushMessage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtimon No, RecordBytesSent does not require the lock. It has its own.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checked, it looks like every variable accessed in RecordBytesSent is only done with cs_totalBytesSent (and even if this weren't true, this change wouldnt be a regression, we still call PushMessage from other functions without cs_vSend).

@jtimon
Copy link
Contributor

jtimon commented Jan 18, 2017

I was told by @TheBlueMatt and @theuni:

<BlueMatt> well #9535 is just a lock split, so just going through each variable that is accessed in one side and making sure its not accessed on the other is a pretty good (though admittedly not 100% sufficient) review
<BlueMatt> and one one of the sides in this case there are relatively few variables accessed, so its not so hard
<cfields> BlueMatt: yep, will do after dinner
<cfields> jtimon: yea, it's sane, just needs manual review of what's accessed under the locks before/after
<cfields> (note that you can treat it as non-recursive, so very simple)

This is what my "manual review what's accessed under the locks before/after" contains at this point:

****** Variables accessed while LOCK(pnode->cs_vSend) or TRY_LOCK(pnode->cs_vSend, lockSend);
pnode->vSendMsg
pnode->nSendOffset
pnode->hSocket
pnode->nLastSend
pnode->nSendBytes
pnode->nSendSize
pnode->fPauseSend
CConnman::nSendBufferMaxSize
pnode->mapSendBytesPerMsgCmd
pnode->cs_inventory
****** Variables accessed while LOCK(pnode->cs_sendProcessing)
GetNodeSignals().SendMessages(pnode, *this, flagInterruptMsgProc);
pto->nVersion
pto->fDisconnect
pto->GetSendVersion()
pto->fPingQueued
pto->nPingNonceSent
pto->nPingUsecStart
connman.PushMessage(pto, msgMaker.Make(NetMsgType::PING, nonce));
pto->nPingNonceSent
pto->GetId()
pto->fWhitelisted
pto->fAddnode
pto->addr
AdvertiseLocal(pto
pto->nNextLocalAddrSend
pto->nNextAddrSend
pto->vAddrToSend
pto->fClient
pto->fOneShot
pto->id
pto->nStartingHeight
Reindex && !fImporting && !IsInitialBlockDownload()
GetMainSignals().Broadcast(nTimeBestReceived, &connman)
pto->cs_inventory
pto->vBlockHashesToAnnounce
pto->PushInventory(CInv(MSG_BLOCK, hashToAnnounce));
pto->vInventoryBlockToSend
pto->nNextInvSend
pto->fInbound
pto->cs_filter
pto->fRelayTxes
pto->fSendMempool
pto->cs_feeFilter
pto->minFeeFilter
pto->pfilter
pto->pfilter->IsRelevantAndUpdate(*txinfo.tx)
pto->filterInventoryKnown
pto->timeLastMempoolReq
GetFetchFlags(pto, pindex->pprev, consensusParams)
pto->mapAskFor
pto->nextSendTimeFeeFilter

...so far the method seems to be working, but would be nice to hear a confirmation that that's what I was told, to confirm that I'm not making any wrong assumption. Only SendMessages() left and already listed functions/methods to review, but it's long, the tempation of doing a reduced version of #4646 always come back whenever asked to review SendMessages()...sorry.

At this point I doubt I will make my review before 0.14 is forked, but I'm fully motivated to finish it before or after it is merged.

@theuni can you detail the note that you can treat it as non-recursive part?

@theuni
Copy link
Member

theuni commented Jan 18, 2017

@jtimon looks good.

Re #4646, something similar is definitely the plan. See theuni@1a6b10a. That's definitely the next step, it was just too much of a refactor for 0.14.

As for the non-recursive part, I simply meant that it should now be quite easy to spot potential deadlocks since cs_vSend is now limited enough in scope that it can't recurse. In other words, the lock never leaves PushMessage/SocketSendData/RecordBytesSent (where RecordBytesSent isn't necessary and could be avoided).

@jtimon
Copy link
Contributor

jtimon commented Jan 19, 2017

Alright, going through all pnode->cs_vSend was easy, but after going though all SendMessages (I updated the list in my comment above), looking only at the pto parameter for the first pass (which is the most relevant for this pr) and not being exhaustive about all potential globals and not going recursively though connman.PushMessage() IsInitialBlockDownload(), Broadcast(nTimeBestReceived, &connman), pto->PushInventory(), pto->pfilter->IsRelevantAndUpdate()...the only "conflict" so far is cs_inventory, but it's just a finer smaller lock, so seems fine as well.
I'm very sorry about not being able to give a full utACK for this one, I feel my ACK is worth way more than the simple concept ACK I was planning and I thank @TheBlueMatt and @theuni for pushing me to review further when I warned I'm not so familiar with the networking code and its concurrency.
Slow review ACK.

@jonasschnelli
Copy link
Contributor

utACK 376b3c2

@laanwj laanwj added this to the 0.14.0 milestone Jan 19, 2017
@laanwj laanwj merged commit 376b3c2 into bitcoin:master Jan 19, 2017
laanwj added a commit that referenced this pull request Jan 19, 2017
…nding

376b3c2 Make the cs_sendProcessing a LOCK instead of a TRY_LOCK (Matt Corallo)
d7c58ad Split CNode::cs_vSend: message processing and message sending (Matt Corallo)
codablock pushed a commit to codablock/dash that referenced this pull request Jan 21, 2018
…sage sending

376b3c2 Make the cs_sendProcessing a LOCK instead of a TRY_LOCK (Matt Corallo)
d7c58ad Split CNode::cs_vSend: message processing and message sending (Matt Corallo)
andvgal pushed a commit to energicryptocurrency/gen2-energi that referenced this pull request Jan 6, 2019
…sage sending

376b3c2 Make the cs_sendProcessing a LOCK instead of a TRY_LOCK (Matt Corallo)
d7c58ad Split CNode::cs_vSend: message processing and message sending (Matt Corallo)
CryptoCentric pushed a commit to absolute-community/absolute that referenced this pull request Feb 27, 2019
…sage sending

376b3c2 Make the cs_sendProcessing a LOCK instead of a TRY_LOCK (Matt Corallo)
d7c58ad Split CNode::cs_vSend: message processing and message sending (Matt Corallo)
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jun 15, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jun 18, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jun 18, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jun 19, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jun 19, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jul 7, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jul 8, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jul 8, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jul 8, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jul 10, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jul 11, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
jnewbery added a commit to jnewbery/bitcoin that referenced this pull request Jul 24, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request Aug 2, 2020
b954796 include missing atomic to make CMake linux happy. (furszy)
c452676 Make the cs_sendProcessing a LOCK instead of a TRY_LOCK (Matt Corallo)
ec23964 Split CNode::cs_vSend: message processing and message sending (Matt Corallo)
f9c458a Remove double brackets in addrman (Matt Corallo)
e48c0d3 Fix AddrMan locking (Matt Corallo)
2d71f05 Make fDisconnect an std::atomic (Matt Corallo)
0544cc6 net: fix a few cases where messages were sent rather than dropped upon disconnection (furszy)

Pull request description:

  Running 4.2 branch with `enable-debug`.
  Fixing:
  ```
  POTENTIAL DEADLOCK DETECTED
  Previous lock order was:
  pnode->cs_vRecvMsg net.cpp:1830 (TRY) (in thread )
  (1) cs_main main.cpp:5275 (in thread )
  (2) cs_vSend net.cpp:2465 (in thread )
  Current lock order is:
  (2) pnode->cs_vSend net.cpp:1846 (TRY) (in thread )
  (1) cs_main main.cpp:6018 (TRY) (in thread )
  ```

  Plus, added few more minor possible races fixes too.

  Back porting:
  bitcoin#8862
  bitcoin#9225
  bitcoin#9535

ACKs for top commit:
  Fuzzbawls:
    ACK b954796
  random-zebra:
    ACK b954796

Tree-SHA512: e15bd81e51282771ca4cb40689229d54787d72d7ed0ebc5f6f0058d3e252cc6691d8ebfc4670207bcd835d32b7898863651fd2ac74704d6cb4fdd30554d8a83a
Warchant pushed a commit to Warchant/bitcoin that referenced this pull request Aug 6, 2020
This was changed to TRY_LOCK in bitcoin#1117 to fix a potential deadlock
between cs_main and cs_vSend. cs_vSend was split into cs_vSend and
cs_sendProcessing in bitcoin#9535 (and cs_sendProcessing was changed from a
TRY_LOCK to a LOCK in the same PR).

Since cs_vSend can no longer be taken before cs_main, revert this to a
LOCK().

This commit leaves part of the code with bad indentation. That is fixed
by the next (whitespace change only) commit.
@str4d str4d mentioned this pull request Apr 12, 2021
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants