Skip to content

Conversation

mzumsande
Copy link
Contributor

The test fails intermittently, see https://cirrus-ci.com/task/6403578080788480?logs=ci#L3521 and #24748 (comment).
I think it's because of a race between the python NetworkThread and the actual
test, which will both call initiate_v2_handshake. I could reproduce it by adding a sleep into initiate_v2_handshake after the line self.sent_garbage = random.randbytes(garbage_len).

Fix this by waiting for the first initiate_v2_handshake to have finished before calling it a second time.

@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 30, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK theStack, stratospher, achow101

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29358 (test: use v2 everywhere for P2PConnection if --v2transport is enabled by mzumsande)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Copy link
Contributor

@stratospher stratospher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clever find! will ACK after this is fixed.

@@ -59,6 +59,8 @@ def data_received(self, t):
# check that data can be received on recvbuf only when mismatch from V1_PREFIX happens (send_net_magic = False)
assert self.v2_state.can_data_be_received and not self.v2_state.send_net_magic

def on_open(self):
self.connection_made = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1d04d58: i think we should call it a different name - self.connected or something like that? i'm still getting a crash with the sleep statement because python is confusing connection_made variable and callback function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, renamed to connection_opened

@fanquake fanquake added this to the 27.0 milestone Jan 31, 2024
Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

I think you need to also initialize the newly introduced variable in the PeerEarlyKey constructor, otherwise the following can happen in the wait_until loop:

AttributeError: 'PeerEarlyKey' object has no attribute 'connection_made2'

This fixes a possible race between the python NetworkThread and the actual
test, which will both call initiate_v2_handshake.
@mzumsande mzumsande force-pushed the 202401_fix_flaky_earlykey branch from 1d04d58 to 9642aef Compare January 31, 2024 15:27
@mzumsande
Copy link
Contributor Author

Fixed both issues, thanks!

Copy link
Member

@sr-gi sr-gi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we will not better off having a simpler initiate_v2_handshake in this case and avoiding sync + async logic. Given we end up having to manually send raw messages and accessing peer1 internal state anyway, I think it would be simpler to send everything manually:

class TestEncryptedP2PState(EncryptedP2PState):
    def __init__(self):
        super().__init__(initiating=True, net='regtest')
        self.magic_sent = False

    def initiate_v2_handshake(self):
        self.privkey_ours, self.ellswift_ours = ellswift_create()
        self.sent_garbage = random.randbytes(random.randrange(4096))
        # We'll take care to do the handshake manually
        return b""

class PeerEarlyKey(P2PInterface):
    """Custom implementation of P2PInterface which uses modified v2 P2P protocol functions for testing purposes."""
    def __init__(self):
        super().__init__()
        self.v2_state = None

    def connection_made(self, transport):
        """64 bytes ellswift is sent in 2 parts during `initial_v2_handshake()`"""
        self.v2_state = TestEncryptedP2PState()
        super().connection_made(transport)

    def data_received(self, t):
        # check that data can be received on recvbuf only when mismatch from V1_PREFIX happens
        assert self.v2_state.magic_sent


class P2PEarlyKey(BitcoinTestFramework):
    def set_test_params(self):
        self.num_nodes = 1
        self.extra_args = [["-v2transport=1", "-peertimeout=3"]]

    def run_test(self):
        self.log.info('Sending ellswift bytes in parts to ensure that response from responder is received only when')
        self.log.info('ellswift bytes have a mismatch from the 16 bytes(network magic followed by "version\\x00\\x00\\x00\\x00\\x00")')
        node0 = self.nodes[0]
        self.log.info('Sending first 4 bytes of ellswift which match network magic')
        self.log.info('If a response is received, assertion failure would happen in our custom data_received() function')
        peer1 = node0.add_p2p_connection(PeerEarlyKey(), wait_for_verack=False, send_version=False, supports_v2_p2p=True)
        # Send only the network magic first
        peer1.send_raw_message(b"\xfa\xbf\xb5\xda")
        peer1.v2_state.magic_sent = True
        self.log.info('Sending remaining ellswift and garbage which are different from V1_PREFIX. Since a response is')
        self.log.info('expected now, our custom data_received() function wouldn\'t result in assertion failure')
        # Send the remaining bytes, if any bytes were received before this point `data_received` would have resulted in an assertion fail
        peer1.send_raw_message(peer1.v2_state.ellswift_ours[4:] + peer1.v2_state.sent_garbage)
        peer1.wait_for_disconnect(timeout=5)
        self.log.info('successful disconnection when MITM happens in the key exchange phase')

if __name__ == '__main__':
    P2PEarlyKey().main()

Copy link
Contributor

@theStack theStack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested ACK 9642aef

Reproduced the issue and verified its fix by adding an artificial delay in initiate_v2_handshake, as suggested in the PR description.
(With artificial delays of 3 seconds or more, the test fails again with an IOError('Not connected'), but this is expected as the TestNode closes the connection if there is no activity: ... [InactivityCheck] [net] socket no message in first 3 seconds ...)

@DrahtBot DrahtBot requested a review from stratospher January 31, 2024 15:56
@mzumsande
Copy link
Contributor Author

I wonder if we will not better off having a simpler initiate_v2_handshake in this case and avoiding sync + async logic. Given we end up having to manually send raw messages and accessing peer1 internal state anyway, I think it would be simpler to send everything manually:

Sounds reasonable to me, but I'd like to refer that to @stratospher, who I believe has plans to extend / rewrite the test anyway (see the WIP branch linked here), and just fix the CI here straightforwardly.

@stratospher
Copy link
Contributor

tested ACK 9642aef.

Sounds reasonable to me, but I'd like to refer that to @stratospher, who I believe has plans to extend / rewrite the test anyway (see the WIP branch linked #24748 (comment)), and just fix the CI here straightforwardly.

yes, will look into it! extending this test for more v2 disconnection scenarios in this WIP branch

@DrahtBot DrahtBot removed the request for review from stratospher January 31, 2024 17:25
@achow101
Copy link
Member

ACK 9642aef

@achow101 achow101 merged commit 4b66877 into bitcoin:master Jan 31, 2024
@mzumsande mzumsande deleted the 202401_fix_flaky_earlykey branch February 1, 2024 18:15
kwvg added a commit to kwvg/dash that referenced this pull request Oct 15, 2024
kwvg added a commit to kwvg/dash that referenced this pull request Oct 23, 2024
kwvg added a commit to kwvg/dash that referenced this pull request Oct 23, 2024
kwvg added a commit to kwvg/dash that referenced this pull request Oct 23, 2024
kwvg added a commit to kwvg/dash that referenced this pull request Oct 24, 2024
kwvg added a commit to kwvg/dash that referenced this pull request Oct 24, 2024
PastaPastaPasta added a commit to dashpay/dash that referenced this pull request Oct 24, 2024
, bitcoin#29372, bitcoin#29460, bitcoin#29358, bitcoin#29511, bitcoin#29390, bitcoin#29431, bitcoin-core/gui#788 (BIP324 backports: part 4)

4735b82 merge bitcoin#29431: disconnection scenarios during v2 handshake (Kittywhiskers Van Gogh)
cc6b88e merge bitcoin-core/gui#788: update session ID tooltip (Kittywhiskers Van Gogh)
2455862 merge bitcoin#29390: speedup bip324_cipher.py unit test (Kittywhiskers Van Gogh)
062aaf1 merge bitcoin#29511: Fix intermittent failure in rpc_net.py --v2transport (Kittywhiskers Van Gogh)
54972e8 merge bitcoin#29358: use v2 everywhere for P2PConnection if --v2transport is enabled (Kittywhiskers Van Gogh)
4cce72f test: add missing debug log assertion in `p2p_invalid_messages.py` (Kittywhiskers Van Gogh)
bd2fe61 merge bitcoin#29460: assert rpc error for addnode v2transport not enabled (Kittywhiskers Van Gogh)
5ee15fa merge bitcoin#29372: fix intermittent failure in `rpc_setban.py --v2transport` (Kittywhiskers Van Gogh)
e278818 merge bitcoin#29352: fix intermittent failure in p2p_v2_earlykeyresponse (Kittywhiskers Van Gogh)
6b2a8b5 merge bitcoin#24748: functional tests for v2 P2P encryption (Kittywhiskers Van Gogh)
32500f2 merge bitcoin#27653: add unit test coverage for Python ECDSA implementation (Kittywhiskers Van Gogh)
9f476c6 net: add Dash network message short IDs, allocate range 128 onwards (Kittywhiskers Van Gogh)

Pull request description:

  ## Additional Information

  * Depends on #6329

  * Dash-specific P2P messages have been allocated short IDs after 128 based on a prior suggestion ([source](#6280 (comment))) as there are 255 potential short IDs (ID `0` is reserved for V1 fallback, [source](https://github.com/dashpay/dash/blob/a7bbcc823d800b293e9ec54dff518fa9929c763c/src/net.cpp#L1019)) and upstream uses 28 short IDs (though Dash has left ID `5` blank as we do not implement the `FEEFILTER` message, [source](https://github.com/dashpay/dash/blob/a7bbcc823d800b293e9ec54dff518fa9929c763c/src/net.cpp#L1024)).

    As it is unlikely that upstream will utilize more than 127 short IDs (and the spec refers to IDs after 32 as "undefined", [source](https://github.com/bitcoin/bips/blob/52894e1aa7af27f4ae7836a391643124c71e2639/bip-0324.mediawiki#v2-bitcoin-p2p-message-structure)), there shouldn't be an adverse effect to utilizing IDs >=128. The unified array of short IDs are accessible through `V2ShortIDs()`.

    * As there are checks to see if an ID *can* be valid by checking against the array size (which wouldn't work here as we create an array of 256 entries combining both upstream and Dash's allocated short IDs, filling the rest with blank values and we cannot ignore blank values to know if a value is unallocated as the blanking could also signal a reservation, [source](https://github.com/dashpay/dash/blob/a7bbcc823d800b293e9ec54dff518fa9929c763c/src/net.cpp#L1048-L1052)), such a check needs to be done by using `IsValidV2ShortID()`.
    * `V2ShortIDs()` isn't as elegant as desired as `std::fill` and `std::copy` are not `constexpr` until C++20 ([source](https://en.cppreference.com/w/cpp/algorithm/fill), [source](https://en.cppreference.com/w/cpp/algorithm/copy)) and until we drop C++17 support, we have to be mindful of that.

  * `masternode connect` will now _attempt_ to establish a P2Pv2 connection if the node *initiating* the connection has opted-in using the new argument (`v2transport`) and the node was started with P2Pv2 enabled (using the launch argument, `-v2transport`).

    This mirrors changes to behavior to `addconnection` introduced in [bitcoin#24748](bitcoin#24748)

  * The oversized payload test in `p2p_invalid_messages.py` will expect an excessively large message of size of `3145729` bytes (and in P2Pv2, `3145742` bytes), as opposed to upstream's `4000001` and `4000014` bytes respectively as Dash has a lower protocol limit of 3MiB ([source](https://github.com/dashpay/dash/blob/a7bbcc823d800b293e9ec54dff518fa9929c763c/src/net.h#L80-L81)) vs Bitcoin's 4MB ([source](https://github.com/bitcoin/bitcoin/blob/225718eda89d65a7041c33e1994d3bf7bd71bbdd/src/net.h#L62-L63))

  ## Breaking Changes

  None expected.

  ## Checklist:

  - [x] I have performed a self-review of my own code
  - [x] I have commented my code, particularly in hard-to-understand areas
  - [x] I have added or updated relevant unit/integration/functional/e2e tests
  - [x] I have made corresponding changes to the documentation **(note: N/A)**
  - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  PastaPastaPasta:
    utACK 4735b82
  UdjinM6:
    light ACK 4735b82

Tree-SHA512: 4a9d586133633c109e6a8f20a6c6c5e4b24185fb616bcd8568e546b6e9f886e7a8707e811fd070bbe32c40df269f89a56343a24b67242c6147f9df27275af599
@bitcoin bitcoin locked and limited conversation to collaborators Jan 31, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants