test: Fix multiprocess CI timeout in p2p_tx_download #29926

fjahr · 2024-04-20T23:30:44Z

This addresses multiprocess CI failures in p2p_tx_download.py, likely introduced by #29827.

Example failure: https://cirrus-ci.com/task/5622109341220864

I had a hard time reproducing or rationalizing the root cause of the issue, but it seemed very likely that the mock time wasn't working as expected without another reset. I got a successful run with the reset when I temporarily introduced it in another PR I am working on: https://cirrus-ci.com/task/5109555795853312

DrahtBot · 2024-04-20T23:30:48Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

fjahr · 2024-04-21T00:03:33Z

Hm, it still seems to be flaky at least. Giving it one more shot with a longer bump.

fjahr · 2024-04-21T00:34:39Z

#28313 / #28321 may be related

glozow

(cc @theStack, indeed looks like it's from #29827)

glozow · 2024-04-22T10:31:35Z

test/functional/p2p_tx_download.py

-        node.bumpmocktime(MAX_GETDATA_INBOUND_WAIT)
+        node.setmocktime(int(time.time()))
+        node.bumpmocktime(MAX_GETDATA_INBOUND_WAIT + 300)


I haven't been able to reproduce this issue and am mostly guessing here, but if it's a setmocktime problem, perhaps just do a node.setmocktime(0) to reset it a few lines up (shouldn't impact the test except maybe make it take 2sec longer)?

I'm not convinced that the +300 would make a difference, since the timing for getdata isn't variable like it is for announcements (which was the case in #28321).
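To make the reset-versus-bump distinction in this suggestion concrete, here is a toy model of a mockable clock. It is only a sketch: `MockClock` is a hypothetical stand-in rather than Bitcoin Core or test framework code, and the value given to `MAX_GETDATA_INBOUND_WAIT` is a placeholder. The assumption being illustrated is that setting the mock time to 0 switches back to the system clock, while a bump advances a previously set mock time by a fixed number of seconds.

```python
import time


class MockClock:
    """Hypothetical stand-in for a node's mockable clock (not Bitcoin Core code)."""

    def __init__(self):
        self.mock = 0  # 0 means "mock time disabled, use the system clock"

    def setmocktime(self, timestamp):
        # Setting 0 resets the clock to system time in this toy model.
        self.mock = timestamp

    def bumpmocktime(self, seconds):
        # A bump is relative, so it needs a mock time that was set earlier.
        if self.mock == 0:
            raise RuntimeError("mock time is not set")
        self.mock += seconds

    def now(self):
        return self.mock if self.mock else int(time.time())


MAX_GETDATA_INBOUND_WAIT = 62  # placeholder value, for illustration only

clock = MockClock()
start = int(time.time())
clock.setmocktime(start)                      # pin the clock to a known point
clock.bumpmocktime(MAX_GETDATA_INBOUND_WAIT)  # jump forward deterministically
elapsed = clock.now() - start

clock.setmocktime(0)                          # reset: back to the system clock
after_reset = clock.now()
```

In this model the bump always moves the clock by exactly the requested amount, which is why a larger bump only helps if the wait being skipped is itself variable.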

I initially had used only the node.setmocktime(0) in that place, and that succeeded when I temporarily pushed it in an unrelated PR, but then it failed here when I opened this one. After adding the +300 it succeeded, but yeah, this may have been just luck again. I will push a version without the +300 and with the node.setmocktime(0) moved higher up.

I just saw that the Re-Run button also appears for CI jobs that succeeded. I will try to re-run this version a few times manually so we can have a bit more confidence before merging.

Had another failure, so I will add the +300 back and see if that is a permanent fix.

This addresses a timeout error in the multiprocess CI job.

maflcko · 2024-04-22T14:15:33Z

Example failure: https://cirrus-ci.com/task/5622109341220864

If I download https://api.cirrus-ci.com/v1/task/5622109341220864/logs/ci.log, I get

2024-04-21T00:10:09.801000Z TestFramework.utils (ERROR): wait_until() failed. Predicate: ''''
        wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
'''
[node 1] Cleaning up leftover process
[node 0] Cleaning up leftover process


stderr:
Traceback (most recent call last):
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/p2p_tx_download.py", line 302, in <module>
    TxDownloadTest().main()
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 155, in main
    exit_code = self.shutdown()
                ^^^^^^^^^^^^^^^
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 314, in shutdown
    self.network_thread.close()
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/p2p.py", line 732, in close
    wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
  File "/ci_container_base/ci/scratch/build/bitcoin-i686-pc-linux-gnu/test/functional/test_framework/util.py", line 293, in wait_until_helper_internal
    raise AssertionError("Predicate {} not true after {} seconds".format(predicate_source, timeout))
AssertionError: Predicate ''''
        wait_until_helper_internal(lambda: not self.network_event_loop.is_running(), timeout=timeout)
''' not true after 10.0 seconds

Which indicates that the problem is during shutdown, after the test, not in the test.

So I don't think adding mocktime will fix it. Maybe you were running into a different issue, or are trying to fix a different bug?

maflcko · 2024-04-22T15:10:31Z

See also the current test failure, which remains: https://github.com/bitcoin/bitcoin/pull/29926/checks?check_run_id=24105987877

fjahr · 2024-04-22T15:20:09Z

Which indicates that the problem is during shutdown, after the test, not in the test.

So I don't think adding mocktime will fix it. Maybe you were running into a different issue, or are trying to fix a different bug?

Hm, I didn't see "Stopping nodes" being printed, so I am not convinced it's during shutdown, but maybe I am just not familiar enough with the test framework. I am closing since it seems that even with the +300 it's still failing, and it seems the duplicate #29933 might have a better approach.

maflcko · 2024-04-22T15:23:42Z

Hm, I didn't see Stopping nodes being printed so I am not convinced it's during shutdown

This will be printed after the network thread is closed. However, the network thread not closing is the underlying bug.

You can see this in the traceback I posted above. (self.network_thread.close() is line 314, and "Stopping nodes" is line 316)
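The ordering described here can be sketched in a few lines. This is a simplified illustration, not the test framework's code: `wait_until` and `shutdown` are hypothetical stand-ins, and the 0.2 second timeout is chosen only to keep the example fast. It shows why a hang while closing the network thread means the "Stopping nodes" step is never reached.

```python
import time


def wait_until(predicate, timeout, poll_interval=0.05):
    """Poll predicate until it is true or timeout seconds pass (simplified stand-in)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(poll_interval)
    raise AssertionError("Predicate not true after {} seconds".format(timeout))


def shutdown(events, loop_is_running):
    # Step 1: close the network thread; raises if the event loop never stops.
    wait_until(lambda: not loop_is_running(), timeout=0.2)
    # Step 2: only reached when step 1 succeeded.
    events.append("Stopping nodes")


# Case A: the event loop never stops, so the log line is never produced.
stuck_events = []
try:
    shutdown(stuck_events, loop_is_running=lambda: True)
except AssertionError as exc:
    stuck_events.append("failed: {}".format(exc))

# Case B: the event loop has already stopped, so shutdown proceeds normally.
clean_events = []
shutdown(clean_events, loop_is_running=lambda: False)
```

In case A the only recorded event is the failure, which matches the observation that a missing "Stopping nodes" line does not rule out a shutdown problem.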

DrahtBot added the Tests label Apr 20, 2024

fjahr mentioned this pull request Apr 20, 2024

refactor: Use our own implementation of urlDecode #29904

Merged

fjahr force-pushed the 2024-04-p2p-tx-multiprocess-fail branch from 7882a43 to c5b2d2c Compare April 21, 2024 00:02

DrahtBot added the CI failed label Apr 21, 2024

DrahtBot removed the CI failed label Apr 21, 2024

glozow reviewed Apr 22, 2024

View reviewed changes

fjahr force-pushed the 2024-04-p2p-tx-multiprocess-fail branch from c5b2d2c to edbfead Compare April 22, 2024 13:01

fjahr mentioned this pull request Apr 22, 2024

test: Fix intermittent timeout in p2p_tx_download.py #29933

Merged

test: Add mocktime reset in p2p_tx_download

fb154e3

This addresses a timeout error in the multiprocess CI job.

fjahr force-pushed the 2024-04-p2p-tx-multiprocess-fail branch from edbfead to fb154e3 Compare April 22, 2024 14:04

DrahtBot added the CI failed label Apr 22, 2024

fjahr closed this Apr 22, 2024

bitcoin locked and limited conversation to collaborators Apr 22, 2025
