test: fix intermittent failure in p2p_sendtxrcncl.py #26448

mzumsande · 2022-11-03T17:06:07Z

p2p_sendtxrcncl.py currently fails intermittently in the CI, see e.g. https://cirrus-ci.com/task/5511952184115200?logs=ci#L4024

I believe that this is related to the reuse of the parameter p2p_idx=2 of add_outbound_p2p_connection in this test: when we call peer_disconnect, we don't wait until the node has completed the disconnection. So there is a race with setting up the next connection (the next addconnection RPC): if the old connection hasn't been removed yet and uses the same port as the new one (because we didn't increment p2p_idx), CConnman::OpenNetworkConnection just returns without establishing a connection, and the test fails.

Fix this by using disconnect_p2ps, which waits for the disconnect to complete, instead of peer_disconnect. We can then use the same value for p2p_idx everywhere.
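
The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyNode`, `port_for`, and the base port are hypothetical names made up for illustration and are not part of Bitcoin Core or its functional test framework. The model only captures the idea that a connection attempt is refused while a peer with the same port is still in the node's peer list.

```python
# Toy model only: ToyNode and port_for are hypothetical and are not part of
# Bitcoin Core or its functional test framework.
class ToyNode:
    """Tracks connected peer ports and removes them asynchronously."""

    def __init__(self):
        self.connected_ports = set()
        self.pending_removals = []

    def open_connection(self, port):
        # Mirrors the early return described above: refuse a second
        # connection to a port that is still in the peer list.
        if port in self.connected_ports:
            return False
        self.connected_ports.add(port)
        return True

    def request_disconnect(self, port):
        # The disconnect is only queued here; the peer stays listed
        # until process_disconnects() runs.
        self.pending_removals.append(port)

    def process_disconnects(self):
        for port in self.pending_removals:
            self.connected_ports.discard(port)
        self.pending_removals.clear()


def port_for(p2p_idx, base_port=11000):
    # Assumption for illustration: the port depends only on p2p_idx.
    return base_port + p2p_idx


node = ToyNode()
assert node.open_connection(port_for(2))

# Racy: reuse the index before the node has finished the disconnect.
node.request_disconnect(port_for(2))
racy_result = node.open_connection(port_for(2))

# Safe: let the disconnect complete first, then reuse the index.
node.process_disconnects()
safe_result = node.open_connection(port_for(2))

print(racy_result, safe_result)  # False True
```

In the real test the ordering is decided by thread timing rather than by an explicit `process_disconnects()` call, which is why the failure only shows up intermittently.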

maflcko

lgtm, but I am not really a fan of writing racy tests unless raciness is the goal of the test.

test/functional/test_framework/test_node.py

maflcko · 2022-11-03T17:17:42Z

test/functional/p2p_sendtxrcncl.py

        peer = self.nodes[0].add_outbound_p2p_connection(
-            SendTxrcnclReceiver(), wait_for_verack=True, p2p_idx=1, connection_type="outbound-full-relay")
+            SendTxrcnclReceiver(), wait_for_verack=True, p2p_idx=ob_peer_id, connection_type="outbound-full-relay")
        assert peer.sendtxrcncl_msg_received
        assert peer.sendtxrcncl_msg_received.initiator
        assert not peer.sendtxrcncl_msg_received.responder
        assert_equal(peer.sendtxrcncl_msg_received.version, 1)
        peer.peer_disconnect()


nit: If there is no guarantee that this will disconnect the peer (due to races), it might be better to remove it, or replace it with disconnect_p2ps, which would also avoid having to modify the index.

ok, I used disconnect_p2ps everywhere instead of peer_disconnect, and also changed to p2p_idx=0 everywhere for consistency - this should also work.

Using disconnect_p2ps instead of peer_disconnect makes the node wait for the disconnect to complete. As a result, we can reuse p2p_idx=0 in the add_outbound_p2p_connection calls.
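
As a rough illustration of what "wait for the disconnect to complete" means, here is a minimal polling sketch. It is not the test framework's implementation: `poll_until`, the `peers` set, and the background thread are hypothetical stand-ins, with the thread playing the role of a node that needs some time before it actually drops a peer.

```python
import threading
import time


def poll_until(predicate, timeout=5.0, poll_interval=0.05):
    """Poll predicate until it returns True; fail if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(poll_interval)
    raise AssertionError("predicate not true after %.1f seconds" % timeout)


# Hypothetical stand-in for the node's peer list.
peers = {"peer-0"}


def finish_disconnect_later():
    # The simulated node takes a moment before it removes the peer.
    time.sleep(0.2)
    peers.discard("peer-0")


worker = threading.Thread(target=finish_disconnect_later)
worker.start()

# Blocking variant: only continue once the peer list is empty, so a
# follow-up connection cannot collide with the old one.
poll_until(lambda: len(peers) == 0)
worker.join()
peers_after_wait = len(peers)
```

A fire-and-forget disconnect corresponds to skipping the `poll_until` call, in which case the next step may run while the old peer is still listed.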

DrahtBot · 2022-11-04T09:55:57Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#26359 (p2p, refactor: #23443 (Erlay support signaling) follow-ups by naumenkogs)
#26257 (script, test: python linter fixups and updates by jonatack)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

maflcko

thanks, however this fixes one race, but introduces two more

maflcko · 2022-11-04T11:10:49Z

test/functional/p2p_sendtxrcncl.py


        self.log.info('SENDTXRCNCL if block-relay-only triggers a disconnect')
        peer = self.nodes[0].add_outbound_p2p_connection(
-            PeerNoVerack(), wait_for_verack=False, p2p_idx=3, connection_type="block-relay-only")
+            PeerNoVerack(), wait_for_verack=False, p2p_idx=0, connection_type="block-relay-only")
        with self.nodes[0].assert_debug_log(["we indicated no tx relay; disconnecting"]):
            peer.send_message(create_sendtxrcncl_msg(initiator=False))
            peer.wait_for_disconnect()


I think this is racy, so you'd have to call self.nodes[0].disconnect_p2ps() or use a unique index

Can you explain why? We called disconnect_p2ps() before add_outbound_p2p_connection, so we are guaranteed to have no peer when doing so. Within the with block we call wait_for_disconnect (which waits until the disconnect initiated by the node is completed), so the subsequent subtest also shouldn't be able to run before the disconnect of the current one is completed. What am I missing?

Oh, right. As the disconnect happens on the side of the node, there shouldn't be a race where the node is not aware of the disconnect.

maflcko · 2022-11-04T11:11:38Z

test/functional/test_framework/test_node.py

+
+        p2p_idx must be different for simultaneously connected peers. When reusing it for the next peer
+        after disconnecting the previous one, it is necessary to wait for the disconnect to finish to avoid
+        a race condition.


Maybe mention that disconnect_p2ps avoids the race?

I tried to describe it in more general terms because disconnect_p2ps can only be used if there is no other, unrelated connection that we need to keep.
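
For the more general case mentioned here, where other connections must stay open, one option is to wait for one specific peer to disappear instead of waiting for all of them. The sketch below shows only the predicate. It assumes peer entries shaped like dictionaries with an "addr" field of the form host:port; `peer_is_gone` and the sample addresses are hypothetical.

```python
def peer_is_gone(peer_info, addr):
    """Return True if no entry in peer_info has the given address."""
    return all(entry["addr"] != addr for entry in peer_info)


# Hypothetical snapshots of a peer list, taken before and after the node
# has finished disconnecting the peer on port 11000.
before = [{"addr": "127.0.0.1:11000"}, {"addr": "127.0.0.1:11001"}]
after = [{"addr": "127.0.0.1:11001"}]

gone_before = peer_is_gone(before, "127.0.0.1:11000")
gone_after = peer_is_gone(after, "127.0.0.1:11000")
print(gone_before, gone_after)  # False True
```

A test could poll such a predicate until it is true before reusing the same p2p_idx, while the unrelated peer on port 11001 stays connected.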

maflcko · 2022-11-04T15:02:25Z

review ACK 74d9753

jonatack · 2022-11-04T20:55:53Z

Post-merge ACK

…l.py

74d9753 test: fix intermittent failure in p2p_sendtxrcncl.py (Martin Zumsande)

Pull request description:

`p2p_sendtxrcncl.py` currently fails intermittently in the CI, see e.g. https://cirrus-ci.com/task/5511952184115200?logs=ci#L4024

I believe that this is related to the reuse of the parameter `p2p_idx=2` of `add_outbound_p2p_connection` in this test: When we call `peer_disconnect`, we don't wait until the node has completed the disconnection. So there is a race between setting up the next connection (next `addconnection` RPC), and if the old one hasn't been removed and has an identical port like the new one (because we didn't increment `p2p_idx`), `CConnman::OpenNetworkConnection` just [returns](https://github.com/bitcoin/bitcoin/blob/5274f324375fd31cf8507531fbc612765d03092f/src/net.cpp#L1976) without establishing a connection, and the test fails.

Fix this by using distinct `disconnect_p2ps` instead of `peer_disconnect`, which waits for the disconnect to complete. We can then use the same value for `p2p_idx` everywhere.

ACKs for top commit:

MarcoFalke: review ACK 74d9753

Tree-SHA512: f99f2550b6b320c0a2416a475c1cf189c009fce3a5abf1d4462486e1bfe309e2c3fd4228a4009b0ca38cb77465ce85e3d22298719eb07302fa0a72fbab0e0668

mzumsande mentioned this pull request Nov 3, 2022

net: Avoid SetTxRelay for feeler connections #26396

Merged

fanquake added the Tests label Nov 3, 2022

mzumsande mentioned this pull request Nov 3, 2022

Intermitted failure in p2p_sendtxrcncl.py #26364

Closed

maflcko approved these changes Nov 3, 2022

mzumsande force-pushed the 202211_fix_sendtxrcncl branch from b1cc46d to 1e533dc Compare November 3, 2022 20:35

test: fix intermittent failure in p2p_sendtxrcncl.py

74d9753

Using disconnect_p2ps instead of peer_disconnect makes the node wait for the disconnect to complete. As a result, we can reuse p2p_idx=0 in the add_outbound_p2p_connection calls.

mzumsande force-pushed the 202211_fix_sendtxrcncl branch from 1e533dc to 74d9753 Compare November 3, 2022 20:42

maflcko approved these changes Nov 4, 2022

DrahtBot mentioned this pull request Nov 4, 2022

p2p: Erlay support signaling follow-ups #26359

Merged

maflcko merged commit e42ba13 into bitcoin:master Nov 4, 2022

mzumsande deleted the 202211_fix_sendtxrcncl branch November 4, 2022 17:31

DrahtBot mentioned this pull request Nov 4, 2022

script, test: python linter flake8 E275 fixup, update dependencies #26257

Merged

bitcoin locked and limited conversation to collaborators Nov 4, 2023
