Fix potential network stalling bug #27981
Conversation
cc @psgreco, who pointed out the issue and helped test it.
Concept ACK 5e92378. This potential issue is not easy to trigger on demand in Bitcoin, but it's relatively easy to trigger in Elements when the node is roughly 20-24 hours behind. I tested a similar version of the patch in Elements and it does solve the stalling.
Wouldn't it be possible to trigger and test this with some functional test?
In theory it should be, but in our tests (mostly @lolhill's) a significant component of this situation is latency. I've never been able to replicate this between two local hosts, only with a host that's ~100ms away.
The core issue is that whenever our optimistic send fails to fully send a message, we subsequently do not even select() for receiving; if it then turns out that sending is not possible either, no progress is made at all. To address this, the solution used in this PR is to still select() for both sending and receiving when an optimistic send fails, but skip receiving if sending succeeded and (still) didn't fully drain the send queue.
AIUI (correct me if I'm wrong!) the backpressure we do is:

- `fPauseSend` -- we won't do `ProcessMessage` (which would probably cause us to generate new data to send) when this is set, which is when we have more than 1MB of data lined up. (However, we will do `SendMessages`, which generates relatively small messages like INVs and PING and GETDATA.)
- `fPauseRecv` -- we won't select the socket for reading any more once we've got more than 5MB in parsed messages queued up to process
- prefer sending over receiving -- if we've got data to send, we'll prioritise sending it, even if we're making no forward progress and could receive messages, to the point where we don't even check (via select/poll) to see if there's any data to receive when we've got data to send
This patch changes the latter logic to be:
- prefer sending over receiving -- always see if we can send/receive data, but don't actually try to receive data until either (a) we don't have data to send in the first place, (b) we can't send any data, or (c) we've successfully sent all the data we have.
This seems pretty sensible to me: this is a peer-to-peer protocol, so we should be making progress in parallel on sending and receiving whenever possible -- your send is my receive, after all.
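To make the before/after summary above concrete, here is a minimal sketch of the decision logic, not the actual `net.cpp` code: `PeerState`, `EventsToWaitFor`, `HandleReadyPeer`, `TrySend` and `TryReceive` are illustrative stand-ins for `CNode`, `GenerateWaitSockets`, `SocketHandler` and `SocketSendData`.

```cpp
#include <cstdint>
#include <deque>
#include <vector>

struct PeerState {
    std::deque<std::vector<uint8_t>> send_queue; // queued outgoing messages
    bool recv_paused = false;                    // analogue of fPauseRecv
};

// Stub: a real implementation would push bytes to the socket and pop
// fully-sent messages; it returns true if any bytes went out and reports
// whether anything is still queued.
bool TrySend(PeerState& peer, bool* data_left)
{
    // ... send as much of send_queue as the socket accepts ...
    *data_left = !peer.send_queue.empty();
    return false; // stub: pretend nothing was sent
}

// Stub: a real implementation would recv() into the peer's receive buffer.
void TryReceive(PeerState& peer) { (void)peer; }

struct WaitEvents { bool send = false; bool recv = false; };

// New behaviour: wait for readability as well as writability, instead of
// suppressing the receive side whenever data is queued to send.
WaitEvents EventsToWaitFor(const PeerState& peer)
{
    WaitEvents ev;
    ev.send = !peer.send_queue.empty();
    ev.recv = !peer.recv_paused;
    return ev;
}

void HandleReadyPeer(PeerState& peer, bool sendable, bool receivable)
{
    bool data_left = !peer.send_queue.empty();
    bool sent_something = false;
    if (sendable && data_left) {
        sent_something = TrySend(peer, &data_left);
    }
    // Skip receiving only when the send made progress but left data queued;
    // i.e. receive when (a) there was nothing to send, (b) nothing could be
    // sent, or (c) the send queue was fully drained.
    const bool skip_recv = sent_something && data_left;
    if (receivable && !skip_recv) {
        TryReceive(peer);
    }
}
```

Under the old behaviour described above, the receive side would not even be polled while data was queued to send, which is what lets two peers that are both trying to drain their send queues stall each other.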
Approach ACK.
```diff
-    LOCK(pnode->cs_vSend);
-    select_send = !pnode->vSendMsg.empty();
-}
+bool select_send = WITH_LOCK(pnode->cs_vSend, return !pnode->vSendMsg.empty());
```
Would it make sense to introduce a method `bool CNode::WantsToSend() const { return !vSendMsg.empty(); }` and use that here, and instead of returning a `pair<X, bool>` above?
That'd mean grabbing the lock twice, no? I added it to `SocketSendData` because the `cs_vSend` lock is already grabbed to call that.
Not if you make it `WantsToSend() EXCLUSIVE_LOCK_REQUIRED(cs_vSend)` and require the caller to have the lock?
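For illustration, a minimal sketch of the suggested pattern, assuming stub stand-ins for the thread-safety annotation macros (the stubs below expand to nothing, whereas the real macros expand to Clang thread-safety attributes that let the compiler check lock requirements); `CNodeSketch` and `Example` are hypothetical names, not the actual `CNode`.

```cpp
#include <deque>
#include <mutex>
#include <vector>

// Stand-ins so this compiles standalone.
#define GUARDED_BY(x)
#define EXCLUSIVE_LOCKS_REQUIRED(...)

class CNodeSketch {
public:
    std::mutex cs_vSend;
    std::deque<std::vector<unsigned char>> vSendMsg GUARDED_BY(cs_vSend);

    // Caller must already hold cs_vSend, so this adds no extra locking.
    bool WantsToSend() const EXCLUSIVE_LOCKS_REQUIRED(cs_vSend)
    {
        return !vSendMsg.empty();
    }
};

void Example(CNodeSketch& node)
{
    std::lock_guard<std::mutex> lock(node.cs_vSend); // take the lock once...
    if (node.WantsToSend()) {                        // ...and query under it
        // ... queue the node for sending ...
    }
}
```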
For your consideration: ajtowns@87c509c. I expanded the `WITH_LOCK` out in `GenerateWaitSockets` because I thought that was clearer than trying to make it an expression. Keeps the signature of `SocketSendData` the same, doesn't add any additional locking, and avoids the dummy `data_left` in `PushMessage`.
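For context on the `WITH_LOCK` being discussed: it evaluates an expression while holding the given mutex and returns its value. A rough, self-contained approximation of the two equivalent forms follows; the real macro in `sync.h` uses the `LOCK`/`UniqueLock` machinery plus extra checks, and `WITH_LOCK_SKETCH`, `NodeSketch`, `SelectSendExpression` and `SelectSendExpanded` are illustrative names.

```cpp
#include <deque>
#include <mutex>
#include <vector>

// Rough approximation of WITH_LOCK: run `code` in an immediately-invoked
// lambda while holding `cs`, and return whatever `code` returns.
#define WITH_LOCK_SKETCH(cs, code) \
    ([&]() -> decltype(auto) { std::lock_guard<std::mutex> lock(cs); code; }())

struct NodeSketch {
    std::mutex cs_vSend;
    std::deque<std::vector<unsigned char>> vSendMsg;
};

// Expression form, as in the diff hunk above.
bool SelectSendExpression(NodeSketch& node)
{
    return WITH_LOCK_SKETCH(node.cs_vSend, return !node.vSendMsg.empty());
}

// "Expanded out" form: the same thing with an explicit lock scope.
bool SelectSendExpanded(NodeSketch& node)
{
    bool select_send;
    {
        std::lock_guard<std::mutex> lock(node.cs_vSend);
        select_send = !node.vSendMsg.empty();
    }
    return select_send;
}
```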
Approach ACK
Co-authored-by: Anthony Towns <aj@erisian.com.au>
@psgreco See above; it turned out that what I intended to do here wasn't actually what was implemented (it was instead unconditionally preferring send over receive). Would you mind trying again to see whether this fixes the issue for you?
I wrote a functional test, see https://github.com/mzumsande/bitcoin/tree/test_sipa_netstalling (because of the 1st commit it's obviously not intended for merge, but it makes it possible to reproduce the problem). Unfortunately, the current branch doesn't appear to fix the problem completely; the test fails for me both here and on master. Not sure how to best fix this...
I think the scenario here is:

Maybe adding a debug-only `sendp2pmsg` rpc would be the easiest way to simulate this and be useful for debugging p2p things in general? If we do want to address
It still seems to fix the issue for me with the new changes, but the refactor I had to do to run it in Elements 22 (like Bitcoin 22) doesn't let me make a hard confirmation.
@ajtowns @mzumsande Thanks, so it appears there are actually two mostly-unrelated network buffer deadlock issues, and Martin's test is likely triggering both of them. I agree with AJ that we should still fix the network-side one, even if we can't (or don't want to) address the application-buffer-side one. Fixing the application-buffer-side one indeed seems a lot harder, and probably needs discussion beyond this PR. It would be good to have a test for the network-side one, without it also triggering the application-side one, to verify this PR actually fixes something -- especially as I don't understand @psgreco's earlier observation (where an older version of this PR unconditionally preferred sending, which shouldn't improve the situation at all) as a means to validate it.
So far, I haven't been able to trigger the original deadlock issue in my test when I run it on master - only the other one described above.
Tested ACK 3388e52
I now managed to reproduce the deadlock described in the OP by

1.) adding a debug-only `sendmsgtopeer` rpc as suggested by @ajtowns above (mzumsande@9c90e5d)
2.) creating a functional test that uses this rpc to simultaneously have two nodes send a large message (4MB) to each other (mzumsande@70e6752)

The added test creates the situation described above; it fails on master and succeeds on this branch.

If you want, feel free to include the commits from 202208_test_sendmsg - alternatively, if you'd prefer not to deal with test / rpc feedback here, I'd also be happy to open a separate PR that builds on this branch.
ACK 3388e52. Test case looks good; seems reasonable to do the
ACK 3388e52
We can follow up with the suggestions / changes:
Post-merge concept ACK. From my (very) limited understanding of sockets, this makes sense. Thanks @ajtowns for the description in #27981 (comment).
…evel deadlock situation

b3a93b4 test: add functional test for deadlock situation (Martin Zumsande)
3557aa4 test: add basic tests for sendmsgtopeer to rpc_net.py (Martin Zumsande)
a9a1d69 rpc: add test-only sendmsgtopeer rpc (Martin Zumsande)

Pull request description:

This adds a `sendmsgtopeer` rpc (for testing only) that allows a node to send a message (provided in hex) to a peer. While we would usually use a `p2p` object instead of a node for this in the test framework, that isn't possible in situations where this message needs to trigger an actual interaction of multiple nodes.

Use this rpc to add test coverage for the bug fixed in #27981 (that just got merged): The test lets two nodes (almost) simultaneously send a single large (4MB) p2p message to each other, which would have caused a deadlock previously (making this test fail), but succeeds now.

As can be seen from the discussion in #27981, it was not easy to reproduce this bug without `sendmsgtopeer`. I would imagine that `sendmsgtopeer` could also be helpful in various other test constellations.

ACKs for top commit:
  ajtowns: ACK b3a93b4
  sipa: ACK b3a93b4
  achow101: ACK b3a93b4

Tree-SHA512: 6e22e72402f3c4dd70cddb9e96ea988444720f7a164031df159fbdd48056c8ac77ac53def045d9208a3ca07437c7c8e34f8b4ebc7066c0a84d81cd53f2f4fa5f
Co-authored-by: Anthony Towns <aj@erisian.com.au>
Github-Pull: bitcoin#27981
Rebased-From: 3388e52
Also make `vReceivableNodes`, `vSendableNodes` and `vErrorNodes` `std::set`s so that bitcoin#27981 can remove a node from `vReceivableNodes` if there is data left to send (we already do this by checking against `vSendMsg` through `nSendMsgSize`, but this doesn't account for leftover data reported by `SocketSendData`, which the backport does).
Marking as partial as it should be revisited when bitcoin#24356 is backported
See ElementsProject/elements#1233. There, it has been observed that if both sides of a P2P connection have a significant amount of data to send, a stall can occur, where both try to drain their own send queue before trying to receive. The same issue seems to apply to the current Bitcoin Core codebase, though I don't know whether it's a frequent issue for us.
The core issue is that whenever our optimistic send fails to fully send a message, we subsequently do not even select() for receiving; if it then turns out that sending is not possible either, no progress is made at all. To address this, the solution used in this PR is to still select() for both sending and receiving when an optimistic send fails, but skip receiving if sending succeeded and (still) didn't fully drain the send queue.
This is a significant reduction in how aggressive the "receive pushback" mechanism is, because now it will only mildly push back while sending progress is made; if the other side stops receiving entirely, the pushback disappears. I don't think that's a serious problem though: