-
Notifications
You must be signed in to change notification settings - Fork 37.7k
test: fix intermittent failure in p2p_sendtxrcncl.py #26448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but I am not really a fan of writing racy tests unless racyness is the goal of the test.
test/functional/p2p_sendtxrcncl.py
Outdated
peer = self.nodes[0].add_outbound_p2p_connection( | ||
SendTxrcnclReceiver(), wait_for_verack=True, p2p_idx=1, connection_type="outbound-full-relay") | ||
SendTxrcnclReceiver(), wait_for_verack=True, p2p_idx=ob_peer_id, connection_type="outbound-full-relay") | ||
assert peer.sendtxrcncl_msg_received | ||
assert peer.sendtxrcncl_msg_received.initiator | ||
assert not peer.sendtxrcncl_msg_received.responder | ||
assert_equal(peer.sendtxrcncl_msg_received.version, 1) | ||
peer.peer_disconnect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: If there is no guarantee that this will disconnect the peer (due to races), it might be better to remove it, or replace it with disconnect_p2ps
, which would also avoid having to modify the index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I used disconnect_p2ps
everywhere instead of peer_disconnect
, and also changed p2p_idx=0
everywhere for consistency - this should also work.
b1cc46d
to
1e533dc
Compare
Using disconnect_p2ps instead of peer_disconnect makes the node wait for the disconnect to complete. As a result, we can reuse p2p_idx=0 in the add_outbound_p2p_connection calls.
1e533dc
to
74d9753
Compare
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, however this fixes one race, but introduces two more
|
||
self.log.info('SENDTXRCNCL if block-relay-only triggers a disconnect') | ||
peer = self.nodes[0].add_outbound_p2p_connection( | ||
PeerNoVerack(), wait_for_verack=False, p2p_idx=3, connection_type="block-relay-only") | ||
PeerNoVerack(), wait_for_verack=False, p2p_idx=0, connection_type="block-relay-only") | ||
with self.nodes[0].assert_debug_log(["we indicated no tx relay; disconnecting"]): | ||
peer.send_message(create_sendtxrcncl_msg(initiator=False)) | ||
peer.wait_for_disconnect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is racy, so you'd have to call self.nodes[0].disconnect_p2ps()
or use a unique index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why? We called disconnect_p2ps()
before add_outbound_p2p_connection
, so are guaranteed to have no peer when doing so. Within thewith
block we call wait_for_disconnect
(which waits until the disconnect initiated by the node is completed), so the subsequent subtest also shouldn't be able to run before the disconnect of the current one is completed. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right. As the disconnect happens on the side of the node, there shouldn't be a race where the node is not aware of the disconnect.
|
||
p2p_idx must be different for simultaneously connected peers. When reusing it for the next peer | ||
after disconnecting the previous one, it is necessary to wait for the disconnect to finish to avoid | ||
a race condition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention that disconnect_p2ps
avoids the race?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to describe it in more general terms because disconnect_p2ps
can only be used if there is no other, unrelated connection that we need to keep.
review ACK 74d9753 |
Post-merge ACK |
…l.py 74d9753 test: fix intermittent failure in p2p_sendtxrcncl.py (Martin Zumsande) Pull request description: `p2p_sendtxrcncl.py` currently fails intermittently in the CI, see e.g. https://cirrus-ci.com/task/5511952184115200?logs=ci#L4024 I believe that this is related to the reuse of the parameter `p2p_idx=2` of `add_outbound_p2p_connection` in this test: When we call `peer_disconnect`, we don't wait until the node has completed the disconnection. So there is a race between setting up the next connection (next `addconnection` RPC), and if the old one hasn't been removed and has an identical port like the new one (because we didn't increment `p2p_idx`), `CConnman::OpenNetworkConnection` just [returns](https://github.com/bitcoin/bitcoin/blob/5274f324375fd31cf8507531fbc612765d03092f/src/net.cpp#L1976) without establishing a connection, and the test fails. Fix this by using distinct `disconnect_p2ps` instead of `peer_disconnect`, which waits for the disconnect to complete. We can then use the same value for `p2p_idx` everywhere. ACKs for top commit: MarcoFalke: review ACK 74d9753 Tree-SHA512: f99f2550b6b320c0a2416a475c1cf189c009fce3a5abf1d4462486e1bfe309e2c3fd4228a4009b0ca38cb77465ce85e3d22298719eb07302fa0a72fbab0e0668
p2p_sendtxrcncl.py
currently fails intermittently in the CI, see e.g. https://cirrus-ci.com/task/5511952184115200?logs=ci#L4024I believe that this is related to the reuse of the parameter
p2p_idx=2
ofadd_outbound_p2p_connection
in this test: When we callpeer_disconnect
, we don't wait until the node has completed the disconnection. So there is a race between setting up the next connection (nextaddconnection
RPC), and if the old one hasn't been removed and has an identical port like the new one (because we didn't incrementp2p_idx
),CConnman::OpenNetworkConnection
just returns without establishing a connection, and the test fails.Fix this by using distinct
disconnect_p2ps
instead ofpeer_disconnect
, which waits for the disconnect to complete. We can then use the same value forp2p_idx
everywhere.