Conversation

sipa
Member

@sipa sipa commented Jun 26, 2023

See ElementsProject/elements#1233. There, it has been observed that if both sides of a P2P connection have a significant amount of data to send, a stall can occur in which both try to drain their own send queue before trying to receive. The same issue seems to apply to the current Bitcoin Core codebase, though I don't know whether it's a frequent problem for us.

The core issue is that whenever our optimistic send fails to fully send a message, we subsequently don't even select() for receiving; if it then turns out that sending is not possible either, no progress is made at all. To address this, the solution used in this PR is to still select() for both sending and receiving when an optimistic send fails, but to skip receiving if sending succeeded and (still) didn't fully drain the send queue.
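
In rough C++, the new rule amounts to the following (a minimal sketch; `SendResult` and `ShouldReceive` are illustrative names, not the actual interface, which lives in CConnman's socket handling loop):

```cpp
#include <cstddef>

// Minimal sketch of the rule described above. SendResult and
// ShouldReceive are illustrative names, not the actual interface.
struct SendResult {
    std::size_t bytes_sent; // bytes send() managed to push out this pass
    bool data_left;         // send queue still non-empty afterwards
};

// Skip receiving only when a send was attempted, made progress, and
// still left data queued; in every other case receive, so that progress
// is made even if the peer has stopped receiving.
bool ShouldReceive(bool send_attempted, const SendResult& res)
{
    if (!send_attempted) return true;     // nothing queued: receive freely
    if (res.bytes_sent == 0) return true; // could not send: try receiving
    return !res.data_left;                // queue drained: receive as well
}
```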

This is a significant reduction in how aggressive the "receive pushback" mechanism is: it now only mildly pushes back while sending progress is being made, and if the other side stops receiving entirely, the pushback disappears. I don't think that's a serious problem though:

  • We still have a pushback mechanism at the application buffer level (when the application receive buffer overflows, receiving is paused until messages in the buffer get processed; waiting on our own net_processing thread, not on the remote party).
  • There are cases where the existing mechanism is too aggressive; e.g. when the send queue is non-empty, but tiny, and can be sent with a single send() call. In that case, I think we'd prefer to still receive within the same processing loop of the network thread.

@DrahtBot
Contributor

DrahtBot commented Jun 26, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type | Reviewers
ACK | mzumsande, ajtowns, naumenkogs

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #28222 (Use shared_ptr for CNode inside CConnman by willcl-ark)
  • #28196 (BIP324 connection support by sipa)
  • #28165 (net: transport abstraction by sipa)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@sipa
Member Author

sipa commented Jun 26, 2023

cc @psgreco, who pointed out the issue and helped test it.

@psgreco
Contributor

psgreco commented Jun 26, 2023

Concept ACK 5e92378. This potential issue is not easy to trigger on demand in Bitcoin, but it's relatively easy to trigger in Elements when the node is roughly 20-24 hours behind. I tested a similar version of the patch in Elements; it does solve the stalling.

@kristapsk
Contributor

Wouldn't it be possible to trigger and test this with some functional test?

@psgreco
Contributor

psgreco commented Jun 27, 2023

In theory it should be, but in our tests (mostly @lolhill's) a big component of this situation is latency. I've never been able to replicate it between two local hosts, only with a host that's ~100ms away.

@glozow glozow added the P2P label Jun 29, 2023
Contributor

@ajtowns ajtowns left a comment


The core issue is that whenever our optimistic send fails to fully send a message, we subsequently don't even select() for receiving; if it then turns out that sending is not possible either, no progress is made at all. To address this, the solution used in this PR is to still select() for both sending and receiving when an optimistic send fails, but to skip receiving if sending succeeded and (still) didn't fully drain the send queue.

AIUI (correct me if I'm wrong!) the backpressure we do is:

  • fPauseSend -- we won't do ProcessMessage (which would probably cause us to generate new data to send) when this is set, which is when we have more than 1MB of data lined up (see the sketch after this list). (However, we will still do SendMessages, which generates relatively small messages like INV, PING, and GETDATA.)
  • fPauseRecv -- we won't select the socket for reading any more once we've got more than 5MB in parsed messages queued up to process
  • prefer sending over receiving -- if we've got data to send, we'll prioritise sending it, even if we're making no forward progress and could receive messages, to the point where we don't even check (via select/poll) to see if there's any data to receive when we've got data to send
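
A toy model of the first two mechanisms, using the thresholds from the bullets above (the struct and constants are illustrative; in the real code the flags live on CNode and the limits are configurable):

```cpp
#include <atomic>
#include <cstddef>

// Illustrative thresholds matching the description above; the real
// limits are configurable rather than hard-coded.
constexpr std::size_t SEND_PAUSE_BYTES = 1'000'000; // ~1MB queued to send
constexpr std::size_t RECV_PAUSE_BYTES = 5'000'000; // ~5MB parsed, unprocessed

struct NodeBuffers {
    std::atomic<bool> fPauseSend{false};
    std::atomic<bool> fPauseRecv{false};
    std::size_t send_queue_bytes{0};    // bytes queued to send (vSendMsg)
    std::size_t process_queue_bytes{0}; // parsed but unprocessed messages
};

// Recompute the pushback flags from the queue sizes (in the real code the
// flags are updated at the points where the queues grow or shrink).
void UpdatePushback(NodeBuffers& node)
{
    node.fPauseSend = node.send_queue_bytes > SEND_PAUSE_BYTES;
    node.fPauseRecv = node.process_queue_bytes > RECV_PAUSE_BYTES;
}
```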

This patch changes the latter logic to:

  • prefer sending over receiving -- always see if we can send/receive data, but don't actually try to receive data until either (a) we don't have data to send in the first place, (b) we can't send any data, or (c) we've successfully sent all the data we have.

This seems pretty sensible to me: this is a peer to peer protocol, so it seems to me we should be making progress in parallel on sending and receiving whenever possible -- your send is my receive after all.

Approach ACK.

```diff
-    LOCK(pnode->cs_vSend);
-    select_send = !pnode->vSendMsg.empty();
-}
+bool select_send = WITH_LOCK(pnode->cs_vSend, return !pnode->vSendMsg.empty());
```
Contributor


Would it make sense to introduce a method `bool CNode::WantsToSend() const { return !vSendMsg.empty(); }`, and use that both here and above instead of returning a `pair<X, bool>`?

Member Author

@sipa sipa Jul 20, 2023


That'd mean grabbing the lock twice, no? I added it to SocketSendData because the cs_vSend lock is already grabbed to call that.

Contributor


Not if you make it `WantsToSend() EXCLUSIVE_LOCKS_REQUIRED(cs_vSend)` and require the caller to already hold the lock?
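
For illustration, the helper could look roughly like this (a sketch assuming Bitcoin Core's Mutex type and Clang thread-safety annotation macros are in scope; types simplified):

```cpp
#include <deque>
#include <vector>

// Sketch only: assumes Bitcoin Core's Mutex type and the annotation
// macros GUARDED_BY / EXCLUSIVE_LOCKS_REQUIRED are available.
class CNode
{
public:
    Mutex cs_vSend;
    std::deque<std::vector<unsigned char>> vSendMsg GUARDED_BY(cs_vSend);

    // The annotation makes Clang's thread-safety analysis verify that
    // every caller already holds cs_vSend, so no second lock is taken.
    bool WantsToSend() const EXCLUSIVE_LOCKS_REQUIRED(cs_vSend)
    {
        return !vSendMsg.empty();
    }
};

// At a call site that already holds the lock:
//   LOCK(pnode->cs_vSend);
//   if (pnode->WantsToSend()) { /* keep selecting for send */ }
```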

Contributor


For your consideration: ajtowns@87c509c. I expanded the WITH_LOCK out in GenerateWaitSockets because I thought that was clearer than trying to make it an expression. It keeps the signature of SocketSendData the same, doesn't add any additional locking, and avoids the dummy data_left in PushMessage.

@naumenkogs
Member

Approach ACK

Co-authored-by: Anthony Towns <aj@erisian.com.au>
@sipa sipa force-pushed the 202306_pushback branch from 5e92378 to 3388e52 on July 20, 2023 14:36
@sipa
Member Author

sipa commented Jul 20, 2023

@psgreco See above; it turned out that what I intended to do here wasn't actually what was implemented (it was instead unconditionally preferring send over receive). Would you mind testing again to see whether this fixes the issue for you?

@mzumsande
Contributor

mzumsande commented Jul 21, 2023

I wrote a functional test, see https://github.com/mzumsande/bitcoin/tree/test_sipa_netstalling (because of the 1st commit it's obviously not intended for merge, but it makes it possible to reproduce the problem).
It works by mining a few large blocks, having two nodes exchange these blocks in both directions via repeated getblockfrompeer calls, and then checking whether the deadlock happened.

Unfortunately, the current branch doesn't appear to fix the problem completely; the test fails for me both here and on master:
When the situation is reached where we now select for both sending and receiving (because our peer doesn't receive any data), we try to resolve the deadlock by now also receiving.
This works for a little while. However, if our send buffer is full, fPauseSend will be set, and because of that we bail out early in ProcessMessages() and never call PollMessage(). Therefore the received data piles up without being cleared by net_processing. When pnode->m_msg_process_queue_size becomes too large (5MB), fPauseRecv is also set, and after that we are again in a deadlock situation where both peers are sending and neither is receiving. I could observe this with the added logging in the 3rd commit of my branch.
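
A toy model of that feedback loop (field and function names mirror the ones above; the logic is heavily simplified):

```cpp
#include <cstddef>

// Toy model of the second deadlock. Field names mirror the real ones;
// everything else is simplified for illustration.
struct Peer {
    bool fPauseSend{false};  // set once >1MB is queued to send
    bool fPauseRecv{false};  // set once >5MB is parsed but unprocessed
    std::size_t m_msg_process_queue_size{0};
};

// While fPauseSend is set, message processing bails out early, so
// nothing ever drains the process queue...
bool ProcessMessages(Peer& peer)
{
    if (peer.fPauseSend) return false; // the early return described above
    // ... PollMessage() and actual processing would happen here ...
    return true;
}

// ...and each newly received message grows the queue until the receive
// side pauses too, at which point both peers are trying to send and
// neither is receiving.
void OnMessageParsed(Peer& peer, std::size_t msg_bytes)
{
    peer.m_msg_process_queue_size += msg_bytes;
    if (peer.m_msg_process_queue_size > 5'000'000) peer.fPauseRecv = true;
}
```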

Not sure how to best fix this...

@ajtowns
Contributor

ajtowns commented Jul 21, 2023

I think fPauseSend getting set on both sides and causing a deadlock should probably be out of scope for this PR -- at least as I understand it, this fixes an issue where we get a deadlock even without fPauseSend triggering.

I think the scenario here is:

  • peer A sends a 2MB message to peer B. This fills up B's socket receive buffer (200kB?) and A's socket send buffer (200kB?) without completing. A still has 1.6MB to send to B, so stops reading from the socket.
  • peer B does the same thing at almost exactly the same time, with the same result.
  • A/B are deadlocked.

Maybe adding a debug-only sendp2pmsg rpc would be the easiest way to simulate this and be useful for debugging p2p things in general?

If we do want to address fPauseSend deadlocking, a few approaches come to mind:

  1. Easy: make fPauseSend a timestamp rather than a boolean, and if it's been set for >5 minutes, disconnect (see the sketch after this list). This doesn't prevent the deadlock, but at least frees up the connection slot and makes it possible to try again.
  2. Hard: rework net_processing so that we keep making as much progress as we can -- e.g., change fPauseSend to continue processing incoming block or tx messages, but to skip GETDATA messages and to defer sending out INV messages and the like, so that we're draining as much traffic as we can while limiting how much gets added.
  3. Impossible? Add more dynamic traffic throttling: if you're bandwidth-limited and getting too much TX traffic, perhaps you should be raising your feefilter level even if your mempool isn't full? I don't see how to generalise that if it's blocks or header messages or something else that causes the problem, though.
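
A sketch of option 1 (the names PeerSendPause, m_pause_send_since, and ShouldDisconnect are invented for illustration; the five-minute threshold is the one suggested above):

```cpp
#include <chrono>
#include <optional>

using namespace std::chrono_literals;

// Sketch of option 1: remember *when* send-side pushback started
// instead of just *whether* it is active.
struct PeerSendPause {
    std::optional<std::chrono::steady_clock::time_point> m_pause_send_since;

    // Called wherever fPauseSend would be toggled today.
    void SetPaused(bool paused)
    {
        if (paused && !m_pause_send_since) {
            m_pause_send_since = std::chrono::steady_clock::now();
        } else if (!paused) {
            m_pause_send_since.reset();
        }
    }

    // Checked periodically (e.g. alongside the existing inactivity
    // checks): a peer stuck in send pushback for over five minutes is
    // disconnected. This doesn't prevent the deadlock, but it frees
    // the connection slot so a fresh attempt can be made.
    bool ShouldDisconnect() const
    {
        return m_pause_send_since &&
               std::chrono::steady_clock::now() - *m_pause_send_since > 5min;
    }
};
```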

@psgreco
Contributor

psgreco commented Jul 21, 2023

@psgreco See above; it turned out that what I intended to do here wasn't actually what was implemented (it was instead unconditionally preferring send over receive). Would you mind trying again if this fixes the issue for you?

The new changes still seem to fix the issue for me, but the refactor I had to do to run it on Elements 22 (the equivalent of Bitcoin 22) doesn't let me make a hard confirmation.

@sipa
Member Author

sipa commented Jul 24, 2023

@ajtowns @mzumsande Thanks, so it appears there are actually two mostly-unrelated network buffer deadlock issues, and Martin's test is likely triggering both of them.

I agree with AJ that we should still fix the network-side one, even if we can't (or don't want to) address the application buffer side one. Fixing the application buffer side one indeed seems a lot harder, and probably needs discussion beyond this PR.

It would be good to have a test for the network-side one, without it also triggering the application-side one, to verify this PR actually fixes something; especially as I don't understand @psgreco's earlier observation (where an older version of this PR unconditionally preferred sending, which shouldn't have improved the situation at all) as a means to validate it.

@mzumsande
Contributor

Thanks, so it appears there are actually two mostly-unrelated network buffer deadlock issues, and Martin's test is likely triggering both of them.

So far, I haven't been able to trigger the original deadlock issue in my test when I run it on master - only the other one described above.

Contributor

@mzumsande mzumsande left a comment


Tested ACK 3388e52

I now managed to reproduce the deadlock described in the OP by

1) adding a debug-only sendmsgtopeer rpc as suggested by @ajtowns above (mzumsande@9c90e5d)

2) creating a functional test that uses this rpc to have two nodes simultaneously send a large message (4MB) to each other (mzumsande@70e6752)

The added test creates the situation described above; it fails on master and succeeds on this branch.

If you want, feel free to include the commits from 202208_test_sendmsg - alternatively, if you'd prefer not to deal with test / rpc feedback here, I'd also be happy to open a separate PR that builds on this branch.

@ajtowns
Contributor

ajtowns commented Aug 17, 2023

ACK 3388e52

Test case looks good; seems reasonable to do the sendmsgtopeer change in a separate PR though.

@naumenkogs
Member

ACK 3388e52

@Sjors
Member

Sjors commented Aug 18, 2023

Post-merge concept ACK. From my (very) limited understanding of sockets, this makes sense. Thanks @ajtowns for the description #27981 (comment).

achow101 added a commit that referenced this pull request Aug 24, 2023
…evel deadlock situation

b3a93b4 test: add functional test for deadlock situation (Martin Zumsande)
3557aa4 test: add basic tests for sendmsgtopeer to rpc_net.py (Martin Zumsande)
a9a1d69 rpc: add test-only sendmsgtopeer rpc (Martin Zumsande)

Pull request description:

  This adds a `sendmsgtopeer` rpc (for testing only) that allows a node to send a message (provided in hex) to a peer.
  While we would usually use a `p2p` object instead of a node for this in the test framework, that isn't possible in situations where this message needs to trigger an actual interaction of multiple nodes.

  Use this rpc to add test coverage for the bug fixed in #27981 (that just got merged):
  The test lets two nodes (almost) simultaneously send a single large (4MB) p2p message to each other, which would have caused a deadlock previously (making this test fail), but succeeds now.

  As can be seen from the discussion in #27981, it was not easy to reproduce this bug without `sendmsgtopeer`. I would imagine that `sendmsgtopeer` could also be helpful in various other test constellations.

ACKs for top commit:
  ajtowns:
    ACK b3a93b4
  sipa:
    ACK b3a93b4
  achow101:
    ACK b3a93b4

Tree-SHA512: 6e22e72402f3c4dd70cddb9e96ea988444720f7a164031df159fbdd48056c8ac77ac53def045d9208a3ca07437c7c8e34f8b4ebc7066c0a84d81cd53f2f4fa5f
luke-jr pushed a commit to luke-jr/bitcoin that referenced this pull request Aug 29, 2023
Co-authored-by: Anthony Towns <aj@erisian.com.au>

Github-Pull: bitcoin#27981
Rebased-From: 3388e52
Frank-GER pushed a commit to syscoin/syscoin that referenced this pull request Sep 8, 2023
…r net-level deadlock situation

kwvg added a commit to kwvg/dash that referenced this pull request Aug 4, 2024
marking as partial as it should be revisited when bitcoin#24356 is
backported
kwvg added a commit to kwvg/dash that referenced this pull request Aug 6, 2024
Also make `v`{`Receivable`, `Sendable`, `Error`}`Nodes` `std::set`s so
that bitcoin#27981 can remove a node from `vReceivableNodes` if there is
data left to send (we already do this by checking against `vSendMsg`
through `nSendMsgSize` but this doesn't account for leftover data
reported by `SocketSendData`, which the backport does).
@bitcoin bitcoin locked and limited conversation to collaborators Aug 17, 2024