Skip to content

Conversation

amitiuttarwar
Copy link
Contributor

@amitiuttarwar amitiuttarwar commented Sep 4, 2020

small improvements to clarify logic in the functional tests

  1. have test logic in rpc_net.py match run order of the test
  2. remove connect_nodes calls that are redundant with the automatic test setup executed by the test framework

Noticed when I was trying to debug a test for #19725. Small changes but imo very helpful, because they initially confused me.

@DrahtBot DrahtBot added the Tests label Sep 4, 2020
@robot-dreams
Copy link
Contributor

Concept ACK, thanks for clarifying! After trying to trace the code, I agree with you that addnode should add connections in both directions.

  • Would it also make sense to change the "Connect nodes both ways" on line 147 below?
  • The wait_for_helper calls in the connect_nodes implementation wait until the node has received a version and a verack. However, it doesn't wait until the peer has received a verack. Could this introduce some sort of extremely rare race condition?

Copy link

@epson121 epson121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

As @robot-dreams mentioned, these changes could be applied in L147.

Also, similar behaviour can be seen in tests\functional\p2p_disconnect_ban.py L21 and L80, e.g.

    def run_test(self):
        self.log.info("Connect nodes both way")
        connect_nodes(self.nodes[0], 1)
        connect_nodes(self.nodes[1], 0)
        ...

so updating code there as well could be an option.

As a reference, original change was introduced in this PR: https://github.com/bitcoin/bitcoin/pull/16898/files

@amitiuttarwar
Copy link
Contributor Author

thanks for the reviews @ipriver, @robot-dreams & @epson121 !
updated to improve another "connect nodes both ways" instance in p2p_disconnect_ban.py
PR is ready for review!

thoughts around review suggestions / questions :

Would it also make sense to change the "Connect nodes both ways" on line 147 below?

good suggestion but I've looked into it & have found that those connections are important to leave in. Here is why: at the start of the test, the test framework creates the initial node connection behind the scenes, so we only need one additional connect_nodes call to get a two way connection. However in test_getnetworkinfo it sets the networkactive flag to false, which disconnects everything. So, we need to explicitly connect_nodes both ways in order to restore the two way connection between nodes 0 & 1.

Let me know if this doesn't make sense / if there are further questions. You can test it out by commenting out either/both of the lines & seeing the following assertions fail.

@robot-dreams

The wait_for_helper calls in the connect_nodes implementation wait until the node has received a version and a verack. However, it doesn't wait until the peer has received a verack. Could this introduce some sort of extremely rare race condition?

hm, good observation. let's think this through...

  • as one data point, this function is used a lot (every time we set up a test with more than one node) so I'd imagine we'd see some sporadic failures in misc tests if this was the case. that said, we do have some spurious test failures on CI, so insufficient to rule out entirely.
  • I'm trying to think of what kind of problems this could cause. message processing in bitcoin core is single threaded, so if the peer isn't processing the VERACK we've sent, could it mean a deeper problem where all messages aren't being processed? or taking forever or something..
  • which leads me to wonder.. what could cause an issue?.. latency should be low, esp since the nodes are local. what are other elements that could cause long time lags?

I think overall, if a test wanted to do something like "send another message, check that message was received", it would want to use a wait_until for that assertion to be reliable, which should allow enough flexibility in any delay processing the VERACK. What do you think?

@epson121

Also, similar behaviour can be seen in tests\functional\p2p_disconnect_ban.py L21 and L80, e.g.

nice catch! I've updated the setup on L21. but with L80, we need to leave it in for a similar reason to what I explained earlier for the rpc_net.py test.

@amitiuttarwar amitiuttarwar changed the title [test] Clarify rpc_net functional test [test] clarify rpc_net & p2p_disconnect_ban functional tests Sep 11, 2020
@michaelfolkson
Copy link

Concept ACK.

Are there any edge cases where the P2P behavior of a one-sided connection is different to a two-sided connection? Do we want to test both or is a two-sided connection sufficient?

@amitiuttarwar
Copy link
Contributor Author

@michaelfolkson not sure what you mean... I think you're asking about a subset of the tests?

@robot-dreams
Copy link
Contributor

ACK 00ea3fc

Thanks for clarifying! I previously had a brain error—for some reason I thought a single connect_nodes call adds connections in both directions. But that's not the case.

I tested locally to make sure that the connect_nodes calls you removed were indeed no-ops:

Screen Shot 2020-09-16 at 2 36 59 PM

In addition, the connect_nodes call you kept (on line 147) was actually doing something:

Screen Shot 2020-09-16 at 2 44 14 PM

As for my verack question, thanks for thinking this through! Your suggestions seem very reasonable, and in retrospect I'd be very surprised if the timing actually causes test flakiness. Although I don't fully understand what's going on here, I do think further discussion is out of scope for this PR.

@DrahtBot
Copy link
Contributor

DrahtBot commented Sep 19, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@amitiuttarwar
Copy link
Contributor Author

rebased

Since the test framework automatically sets up a connection between the nodes,
the second connect_nodes call was a no-op. Remove the redundant call & add
comments to explain the expected topology.
@amitiuttarwar
Copy link
Contributor Author

rebased

@laanwj
Copy link
Member

laanwj commented Oct 28, 2020

ACK 47ff509

@laanwj laanwj merged commit db26eeb into bitcoin:master Oct 28, 2020
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 28, 2020
…tional tests

47ff509 [test] Clarify setup of node topology. (Amiti Uttarwar)
0672522 [move-only, test]: Match test order with run order (Amiti Uttarwar)

Pull request description:

  small improvements to clarify logic in the functional tests
  1. have test logic in `rpc_net.py` match run order of the test
  2. remove `connect_nodes` calls that are redundant with the automatic test setup executed by the test framework

  Noticed when I was trying to debug a test for bitcoin#19725. Small changes but imo very helpful, because they initially confused me.

ACKs for top commit:
  laanwj:
    ACK 47ff509

Tree-SHA512: 2843da2c0b4f06b2600b3adb97900a62be7bb2228770abd67d86f2a65c58079af22c7c20957474a98c17da85f40a958a6f05cb8198aa0c56a58adc1c31100492
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants