test: Fix multiprocess CI timeout in p2p_tx_download #29926
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage: for detailed information about the code coverage, see the test coverage report. Reviews: see the guideline for information on the review process.
Force-pushed from 7882a43 to c5b2d2c
Hm, still seems to be flaky at least; giving it one more shot with a longer bump.
node.bumpmocktime(MAX_GETDATA_INBOUND_WAIT)
node.setmocktime(int(time.time()))
node.bumpmocktime(MAX_GETDATA_INBOUND_WAIT + 300)
I haven't been able to reproduce this issue and am mostly guessing here, but if it's a setmocktime problem, perhaps just do a node.setmocktime(0) to reset it a few lines up (it shouldn't impact the test, except maybe making it take ~2 seconds longer)?
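For illustration only, here is a minimal stand-in for the mocktime semantics under discussion, assuming setmocktime(t) pins the node clock to t (with 0 reverting to system time) and bumpmocktime(n) advances the pinned clock by n seconds. MockClock and the MAX_GETDATA_INBOUND_WAIT value below are hypothetical placeholders, not Bitcoin Core's implementation:

```python
import time

class MockClock:
    """Simplified, hypothetical stand-in for a node's mocktime behaviour."""
    def __init__(self):
        self.mocktime = 0  # 0 means "use the real system clock"

    def setmocktime(self, t):
        # t == 0 resets to the system clock, mirroring node.setmocktime(0)
        self.mocktime = t

    def bumpmocktime(self, n):
        # Advancing only makes sense once the clock has been pinned.
        assert self.mocktime != 0, "bumpmocktime requires mocktime to be set"
        self.mocktime += n

    def now(self):
        return self.mocktime if self.mocktime else int(time.time())

# The pattern suggested in the review: reset first, then pin and bump.
MAX_GETDATA_INBOUND_WAIT = 60  # placeholder value for illustration

clock = MockClock()
clock.setmocktime(0)                          # reset to real time first
clock.setmocktime(int(time.time()))           # pin the clock to "now"
clock.bumpmocktime(MAX_GETDATA_INBOUND_WAIT)  # fast-forward past the timer
assert clock.now() >= int(time.time()) + MAX_GETDATA_INBOUND_WAIT - 2
```

The point of the reset-first step is that any mocktime left over from an earlier part of the test is discarded before the clock is pinned again.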
I'm not convinced that the +300 would make a difference, since the timing for getdata isn't variable the way it is for announcements (which was the case in #28321).
I initially used only the node.setmocktime(0) in that place, and that succeeded when I temporarily pushed it in an unrelated PR, but it failed here when I opened this one. After adding the +300 it succeeded, but yeah, this may have just been luck again. I will push a version without the +300 and with the node.setmocktime(0) moved higher up.
I just saw that the Re-Run button also appears for CI jobs that succeeded, so I will try to re-run this version a few times manually so we can have a bit more confidence before merging.
Had another failure, so I will add the +300 back and see if that is a permanent fix.
Force-pushed from c5b2d2c to edbfead
This addresses a timeout error in the multiprocess CI job.
Force-pushed from edbfead to fb154e3
If I download https://api.cirrus-ci.com/v1/task/5622109341220864/logs/ci.log, I get [traceback elided], which indicates that the problem is during shutdown, after the test, not in the test itself. So I don't think adding mocktime will fix it. Maybe you were running into a different issue, or are trying to fix a different bug?
See also the current test failure, which remains: https://github.com/bitcoin/bitcoin/pull/29926/checks?check_run_id=24105987877
Hm, I didn't see [quoted message elided].
This will be printed after the network thread is closed. However, the network thread not closing is the underlying bug. You can see this in the traceback I posted above.
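To make that failure mode concrete: if a background thread never exits, any join performed during shutdown hangs until its timeout. A minimal sketch in plain Python threading, assuming nothing about Bitcoin Core's actual network-thread code (network_thread below is a hypothetical stand-in):

```python
import threading

stop = threading.Event()

def network_thread():
    # Stand-in for a network loop: blocks until told to stop.
    stop.wait()

t = threading.Thread(target=network_thread)
t.start()

# Shutdown attempt without signalling the thread: the join times out,
# mirroring the "network thread not closing" hang seen in the CI log.
t.join(timeout=0.2)
print(t.is_alive())  # True: shutdown would hang at this point

# Signalling the stop condition lets shutdown complete cleanly.
stop.set()
t.join(timeout=2)
print(t.is_alive())  # False
```

This is why a mocktime tweak inside the test cannot help if the hang happens after the test body, during teardown.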
This addresses multiprocess CI failures in p2p_tx_download.py, likely introduced by #29827. Example failure: https://cirrus-ci.com/task/5622109341220864
I was having a hard time reproducing or rationalizing the root cause of the issue, but it seemed very likely that the mock time wasn't working as expected without another reset. I got a successful run with the change when I temporarily introduced it to another PR I am working on: https://cirrus-ci.com/task/5109555795853312