Skip to content

Conversation

mzumsande
Copy link
Contributor

@mzumsande mzumsande commented Aug 17, 2023

This adds a sendmsgtopeer rpc (for testing only) that allows a node to send a message (provided in hex) to a peer.
While we would usually use a p2p object instead of a node for this in the test framework, that isn't possible in situations where this message needs to trigger an actual interaction of multiple nodes.

Use this rpc to add test coverage for the bug fixed in #27981 (that just got merged):
The test lets two nodes (almost) simultaneously send a single large (4MB) p2p message to each other, which would have caused a deadlock previously (making this test fail), but succeeds now.

As can be seen from the discussion in #27981, it was not easy to reproduce this bug without sendmsgtopeer. I would imagine that sendmsgtopeer could also be helpful in various other test constellations.

@DrahtBot
Copy link
Contributor

DrahtBot commented Aug 17, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK ajtowns, sipa, achow101
Concept ACK russeree
Stale ACK brunoerg

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #27511 (rpc: Add test-only RPC getaddrmaninfo for new/tried table address count by stratospher)
  • #26988 (cli: rework -addrinfo cli to use addresses which aren’t filtered for quality/recency by stratospher)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@brunoerg
Copy link
Contributor

Concept ACK

@ajtowns
Copy link
Contributor

ajtowns commented Aug 18, 2023

Concept ACK. Probably should add some basic functional tests to rpc_net.py with a P2PInterface checking that it works -- what happens if msg_type is 0 bytes, 13 bytes (> protocol.h:COMMAND_SIZE); what if msg is empty or more than 4MB (net.h:MAX_PROTOCOL_MESSAGE_LENGTH) or 32MB (serialize.h:MAX_SIZE)?

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Feel free to ignore the nits.

@maflcko
Copy link
Member

maflcko commented Aug 18, 2023

Error: RPC command "sendmsgtopeer" not found in RPC_COMMANDS_SAFE_FOR_FUZZING or RPC_COMMANDS_NOT_SAFE_FOR_FUZZING. Please update test/fuzz/rpc.cpp.

@portlandhodl
Copy link
Contributor

Concept ACK this would be useful.

@mzumsande mzumsande force-pushed the 202208_test_sendmsg branch 2 times, most recently from 61fc011 to d1404b3 Compare August 21, 2023 20:02
@mzumsande
Copy link
Contributor Author

mzumsande commented Aug 21, 2023

Thanks for the reviews! I pushed an update with changes due to feedback, haven't addressed @ajtowns suggestion yet:

Concept ACK. Probably should add some basic functional tests to rpc_net.py with a P2PInterface checking that it works -- what happens if msg_type is 0 bytes, 13 bytes (> protocol.h:COMMAND_SIZE); what if msg is empty or more than 4MB (net.h:MAX_PROTOCOL_MESSAGE_LENGTH) or 32MB (serialize.h:MAX_SIZE)?

Will add this to rpc_net.py soon. I think the best approach might be to allow these kinds of things things (e.g. empty msg_type, message body > 4MB) even if they don't make much sense - as long as they can be dealt with locally and don't trigger any asserts, like the oversized msg_type does.

Error: RPC command "sendmsgtopeer" not found in RPC_COMMANDS_SAFE_FOR_FUZZING or RPC_COMMANDS_NOT_SAFE_FOR_FUZZING. Please update test/fuzz/rpc.cpp.

updated with the same comment getblockfrompeer has, indicating that no msg will be sent when there are no peers.

Copy link
Contributor

@brunoerg brunoerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK d1404b3

This rpc can be used when we want a node to send a message, but
cannot use a python P2P object, for example for testing of low-level
net transport behavior.
@mzumsande mzumsande force-pushed the 202208_test_sendmsg branch from d1404b3 to f25b13a Compare August 22, 2023 17:30
@mzumsande
Copy link
Contributor Author

I added a commit for basic coverage in rpc_net.py and made some other minor changes.

what if msg is empty or more than 4MB (net.h:MAX_PROTOCOL_MESSAGE_LENGTH) or 32MB (serialize.h:MAX_SIZE)?

I added coverage for >4MB. I couldn't test the case >32MB, because already at ~16MB I get a failure passing a string that long via RPC (independent on the actual RPC, so this can't be handled within sendmsgtopeer).

@mzumsande mzumsande force-pushed the 202208_test_sendmsg branch from f25b13a to b3a93b4 Compare August 22, 2023 17:46
@ajtowns
Copy link
Contributor

ajtowns commented Aug 23, 2023

ACK b3a93b4

LGTM. bitcoin-cli sendmsgtopeer 0 mempool "" to a peer that has you whitelisted works as a poor man's importmempool (#27460) which is neat. Confirm that p2p_net_deadlock test catches the bug if #27981 is reverted (and running netstat -n confirms the tcp buffers are just sitting there being ignored while the test case is running).

@sipa
Copy link
Member

sipa commented Aug 24, 2023

ACK b3a93b4

Two non-blocking comments:

  • Maybe add a comment to the p2p_net_deadlock.py that explains what issue it is supposed to address (perhaps by linking to the PR that fixed it).
  • The RPC added does not check whether the msg_type consists of valid characters (see CMessageHeader::IsCommandValid()). Currently, such messages are ignored by the receiver. Does it make sense to add a check the RPC, or a test that sending such messages works?

@achow101
Copy link
Member

ACK b3a93b4

@achow101 achow101 merged commit c9273f6 into bitcoin:master Aug 24, 2023
Frank-GER pushed a commit to syscoin/syscoin that referenced this pull request Sep 8, 2023
…r net-level deadlock situation

b3a93b4 test: add functional test for deadlock situation (Martin Zumsande)
3557aa4 test: add basic tests for sendmsgtopeer to rpc_net.py (Martin Zumsande)
a9a1d69 rpc: add test-only sendmsgtopeer rpc (Martin Zumsande)

Pull request description:

  This adds a `sendmsgtopeer` rpc (for testing only) that allows a node to send a message (provided in hex) to a peer.
  While we would usually use a `p2p` object instead of a node for this in the test framework, that isn't possible in situations where this message needs to trigger an actual interaction of multiple nodes.

  Use this rpc to add test coverage for the bug fixed in bitcoin#27981 (that just got merged):
  The test lets two nodes (almost) simultaneously send a single large (4MB) p2p message to each other, which would have caused a deadlock previously (making this test fail), but succeeds now.

  As can be seen from the discussion in bitcoin#27981, it was not easy to reproduce this bug without `sendmsgtopeer`. I would imagine that `sendmsgtopeer` could also be helpful in various other test constellations.

ACKs for top commit:
  ajtowns:
    ACK b3a93b4
  sipa:
    ACK b3a93b4
  achow101:
    ACK b3a93b4

Tree-SHA512: 6e22e72402f3c4dd70cddb9e96ea988444720f7a164031df159fbdd48056c8ac77ac53def045d9208a3ca07437c7c8e34f8b4ebc7066c0a84d81cd53f2f4fa5f
@mzumsande mzumsande deleted the 202208_test_sendmsg branch November 7, 2023 20:31
kwvg added a commit to kwvg/dash that referenced this pull request Sep 17, 2024
… deadlock situation

`random_bytes()` is introduced in bitcoin#25625 but the function def
alone doesn't warrant a full backport, so we'll only implement the
section relevant to this PR.
kwvg added a commit to kwvg/dash that referenced this pull request Sep 17, 2024
… deadlock situation

`random_bytes()` is introduced in bitcoin#25625 but the function def
alone doesn't warrant a full backport, so we'll only implement the
section relevant to this PR.
kwvg added a commit to kwvg/dash that referenced this pull request Sep 22, 2024
… deadlock situation

`random_bytes()` is introduced in bitcoin#25625 but the function def
alone doesn't warrant a full backport, so we'll only implement the
section relevant to this PR.
kwvg added a commit to kwvg/dash that referenced this pull request Sep 24, 2024
… deadlock situation

`random_bytes()` is introduced in bitcoin#25625 but the function def
alone doesn't warrant a full backport, so we'll only implement the
section relevant to this PR.
kwvg added a commit to kwvg/dash that referenced this pull request Sep 26, 2024
… deadlock situation

`random_bytes()` is introduced in bitcoin#25625 but the function def
alone doesn't warrant a full backport, so we'll only implement the
section relevant to this PR.
kwvg added a commit to kwvg/dash that referenced this pull request Sep 30, 2024
The fix introduced in bitcoin#27981 (8c986d6, dash#6067) is not
validated to work until bitcoin#28287, an upcoming backport.

As the latter has identified the former backport didn't pass validation,
let's fix it so that the latter tests succeed when they're backported
in an upcoming commit.
kwvg added a commit to kwvg/dash that referenced this pull request Sep 30, 2024
… deadlock situation

`random_bytes()` is introduced in bitcoin#25625 but the function def
alone doesn't warrant a full backport, so we'll only implement the
section relevant to this PR.
kwvg added a commit to kwvg/dash that referenced this pull request Oct 1, 2024
The fix introduced in bitcoin#27981 (8c986d6, dash#6067) is not
validated to work until bitcoin#28287, an upcoming backport.

As the latter has identified the former backport didn't pass validation,
let's fix it so that the latter tests succeed when they're backported
in an upcoming commit.
kwvg added a commit to kwvg/dash that referenced this pull request Oct 1, 2024
… deadlock situation

`random_bytes()` is introduced in bitcoin#25625 but the function def
alone doesn't warrant a full backport, so we'll only implement the
section relevant to this PR.
kwvg added a commit to kwvg/dash that referenced this pull request Oct 2, 2024
The fix introduced in bitcoin#27981 (8c986d6, dash#6067) is not
validated to work until bitcoin#28287, an upcoming backport.

As the latter has identified the former backport didn't pass validation,
let's fix it so that the latter tests succeed when they're backported
in an upcoming commit.
kwvg added a commit to kwvg/dash that referenced this pull request Oct 2, 2024
… deadlock situation

`random_bytes()` is introduced in bitcoin#25625 but the function def
alone doesn't warrant a full backport, so we'll only implement the
section relevant to this PR.
kwvg added a commit to kwvg/dash that referenced this pull request Oct 2, 2024
… deadlock situation

`random_bytes()` is introduced in bitcoin#25625 but the function def
alone doesn't warrant a full backport, so we'll only implement the
section relevant to this PR.
PastaPastaPasta added a commit to dashpay/dash that referenced this pull request Oct 4, 2024
, bitcoin#23774, bitcoin#25443, bitcoin#26138, bitcoin#26854, bitcoin#27128, bitcoin#27761, bitcoin#27863, bitcoin#28287, bitcoin#30118, partial bitcoin#22778 (auxiliary backports: part 16)

e458adb merge bitcoin#30118: improve robustness of connect_nodes() (UdjinM6)
ac94de2 merge bitcoin#28287: add `sendmsgtopeer` rpc and a test for net-level deadlock situation (Kittywhiskers Van Gogh)
d1fce0b fix: ensure that deadlocks are actually resolved (Kittywhiskers Van Gogh)
19e7bf6 merge bitcoin#27863: do not break when addr is not from a distinct network group (Kittywhiskers Van Gogh)
1adb9a2 merge bitcoin#27761: Log addresses of stalling peers (Kittywhiskers Van Gogh)
2854a6a merge bitcoin#27128: fix intermittent issue in `p2p_disconnect_ban` (Kittywhiskers Van Gogh)
d4b0fae merge bitcoin#26854: Fix intermittent timeout in p2p_permissions.py (Kittywhiskers Van Gogh)
892e329 merge bitcoin#26138: Avoid race in disconnect_nodes helper (Kittywhiskers Van Gogh)
d6ce037 merge bitcoin#25443: Fail if connect_nodes fails (Kittywhiskers Van Gogh)
60b5392 partial bitcoin#22778: Reduce resource usage for inbound block-relay-only connections (Kittywhiskers Van Gogh)
85c4aef merge bitcoin#23774: Add missing assert_equal import to p2p_add_connections.py (Kittywhiskers Van Gogh)
0354417 merge bitcoin#22777: don't request tx relay on feeler connections (Kittywhiskers Van Gogh)
7229eb0 merge bitcoin#23042: Avoid logging AlreadyHaveTx when disconnecting misbehaving peer (Kittywhiskers Van Gogh)
05395ff merge bitcoin#22817: Avoid race after connect_nodes (Kittywhiskers Van Gogh)

Pull request description:

  ## Additional Information

  * Depends on #6286

  * Depends on #6287

  * Depends on #6289

  * When backporting [bitcoin#28287](bitcoin#28287), `p2p_net_deadlock.py` relies on the function, `random_bytes()`, that is introduced in [bitcoin#25625](bitcoin#25625). Backporting [bitcoin#25625](bitcoin#25625) would attract changes outside the scope of this PR.

    In the interest of brevity, the changes that introduce `random_bytes()` have been included in [bitcoin#28287](bitcoin#28287) instead.

  ## Breaking Changes

  None expected.

  ## Checklist:

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas **(note: N/A)**
  - [x] I have added or updated relevant unit/integration/functional/e2e tests
  - [x] I have made corresponding changes to the documentation **(note: N/A)**
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  UdjinM6:
    utACK e458adb
  PastaPastaPasta:
    utACK e458adb

Tree-SHA512: 48494004dddecb31c53f5e19ab0114b92ed7b4381c7977800fd49b7403222badbfdcfe46241e854f5b086c6f54a35f6483f91c6f047b7ac9b1e88e35bb32ad02
Fabcien pushed a commit to Bitcoin-ABC/bitcoin-abc that referenced this pull request Nov 19, 2024
Summary:
It has been observed that if both sides of a P2P connection have a significant amount of data to send, a stall can occur, where both try to drain their own send queue before trying to receive. The same issue seems to apply to the current Bitcoin Core codebase, though I don't know whether it's a frequent issue for us.

The core issue is that whenever our optimistic send fails to fully send a message, we do subsequently not even select() for receiving; if it then turns out that sending is not possible either, no progress is made at all. To address this, the solution used in this PR is to still select() for both sending and receiving when an optimistic send fails, but skip receiving if sending succeeded, and (still) doesn't fully drain the send queue.

This is a significant reduction in how aggressive the "receive pushback" mechanism is, because now it will only mildly push back while sending progress is made; if the other side stops receiving entirely, the pushback disappears. I don't think that's a serious problem though:

 - We still have a pushback mechanism at the application buffer level (when the application receive buffer overflows, receiving is paused until messages in the buffer get processed; waiting on our own net_processing thread, not on the remote party).
 - There are cases where the existing mechanism is too aggressive; e.g. when the send queue is non-empty, but tiny, and can be sent with a single send() call. In that case, I think we'd prefer to still receive within the same processing loop of the network thread.

Co-authored-by: Anthony Towns <aj@erisian.com.au>

core#28287:
> rpc: add test-only sendmsgtopeer rpc
>
> This rpc can be used when we want a node to send a message, but
> cannot use a python P2P object, for example for testing of low-level
> net transport behavior.

> test: add basic tests for sendmsgtopeer to rpc_net.py

> test: add functional test for deadlock situation

This is a backport of [[bitcoin/bitcoin#27981 | core#27981]] and  [[bitcoin/bitcoin#28287 | core#28287]]

Test Plan:
`ninja all check-all`

Test that this passes now and the same functional test sometimes fails  before this commit (>33% failure rate on my machine)
` for i in {1..100}; do ~/dev/bitcoin-abc/build/test/functional/test_runner.py p2p_net_deadlock; done`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Sintayew4, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D17138
roqqit pushed a commit to doged-io/doged that referenced this pull request Nov 21, 2024
Summary:
It has been observed that if both sides of a P2P connection have a significant amount of data to send, a stall can occur, where both try to drain their own send queue before trying to receive. The same issue seems to apply to the current Bitcoin Core codebase, though I don't know whether it's a frequent issue for us.

The core issue is that whenever our optimistic send fails to fully send a message, we do subsequently not even select() for receiving; if it then turns out that sending is not possible either, no progress is made at all. To address this, the solution used in this PR is to still select() for both sending and receiving when an optimistic send fails, but skip receiving if sending succeeded, and (still) doesn't fully drain the send queue.

This is a significant reduction in how aggressive the "receive pushback" mechanism is, because now it will only mildly push back while sending progress is made; if the other side stops receiving entirely, the pushback disappears. I don't think that's a serious problem though:

 - We still have a pushback mechanism at the application buffer level (when the application receive buffer overflows, receiving is paused until messages in the buffer get processed; waiting on our own net_processing thread, not on the remote party).
 - There are cases where the existing mechanism is too aggressive; e.g. when the send queue is non-empty, but tiny, and can be sent with a single send() call. In that case, I think we'd prefer to still receive within the same processing loop of the network thread.

Co-authored-by: Anthony Towns <aj@erisian.com.au>

core#28287:
> rpc: add test-only sendmsgtopeer rpc
>
> This rpc can be used when we want a node to send a message, but
> cannot use a python P2P object, for example for testing of low-level
> net transport behavior.

> test: add basic tests for sendmsgtopeer to rpc_net.py

> test: add functional test for deadlock situation

This is a backport of [[bitcoin/bitcoin#27981 | core#27981]] and  [[bitcoin/bitcoin#28287 | core#28287]]

Test Plan:
`ninja all check-all`

Test that this passes now and the same functional test sometimes fails  before this commit (>33% failure rate on my machine)
` for i in {1..100}; do ~/dev/bitcoin-abc/build/test/functional/test_runner.py p2p_net_deadlock; done`

Reviewers: #bitcoin_abc, Fabien

Reviewed By: #bitcoin_abc, Fabien

Subscribers: Sintayew4, Fabien

Differential Revision: https://reviews.bitcoinabc.org/D17138
@bitcoin bitcoin locked and limited conversation to collaborators Dec 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants