test: Let test_runner.py start multiple jobs per timeslot #23799

sipa · 2021-12-16T20:25:19Z

test_runner.py currently only checks every 0.5s whether any job has finished, and if so, starts at most one new job. At higher parallelism it becomes increasingly likely that multiple jobs have finished at the same time. Fix this by always noticing all finished jobs every timeslot, and starting as many new ones.
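A minimal sketch of the polling scheme described above, assuming a simplified queue of shell commands. The class `PollingJobQueue`, its fields and its `(command, returncode)` result format are hypothetical and are not the actual test_runner.py code; the point is only that each poll collects every finished process and refills every freed slot.

```python
import subprocess
import sys
import time


class PollingJobQueue:
    """Hypothetical, simplified job queue (not the real test_runner.py class).

    Runs at most `jobs` commands at once. Every `poll_interval` seconds it
    checks all running processes and returns every one that has finished.
    """

    def __init__(self, commands, jobs, poll_interval=0.5):
        self.pending = list(commands)  # commands not yet started
        self.running = []              # list of (command, Popen) pairs
        self.jobs = jobs
        self.poll_interval = poll_interval

    def _fill(self):
        # Start as many new jobs as there are free slots.
        while self.pending and len(self.running) < self.jobs:
            cmd = self.pending.pop(0)
            self.running.append((cmd, subprocess.Popen(cmd)))

    def get_next(self):
        """Block until at least one job finishes; return all finished jobs
        as a list of (command, returncode) pairs."""
        if not self.pending and not self.running:
            raise RuntimeError("no jobs left to wait for")
        while True:
            self._fill()
            # poll() sets returncode on every process that has exited.
            finished = [(cmd, p.returncode) for cmd, p in self.running
                        if p.poll() is not None]
            if finished:
                self.running = [(cmd, p) for cmd, p in self.running
                                if p.returncode is None]
                self._fill()  # reuse every freed slot right away
                return finished
            time.sleep(self.poll_interval)


if __name__ == "__main__":
    cmds = [[sys.executable, "-c", "pass"] for _ in range(6)]
    queue = PollingJobQueue(cmds, jobs=3, poll_interval=0.1)
    done = 0
    while done < len(cmds):
        batch = queue.get_next()
        done += len(batch)
        print(f"{len(batch)} job(s) finished in this timeslot")
```

Because `get_next()` can now return several jobs at once, the caller has to count results rather than loop once per call.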

sipa · 2021-12-16T20:27:27Z

At -j32 it speeds things up by a few seconds of wall-clock time for me, though the runtime is mostly dominated by a few very long-running jobs. Splitting those up could significantly increase the gain.
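The point about long-running jobs can be made concrete with the standard lower bound on parallel wall-clock time: no schedule finishes before its longest single job, nor before the total work divided evenly over all slots. The durations below are made up for illustration only.

```python
def wall_clock_lower_bound(durations, jobs):
    """Lower bound on wall-clock time for running independent jobs with the
    given durations on `jobs` parallel slots: the larger of the longest
    single job and the total work spread evenly over all slots."""
    return max(max(durations), sum(durations) / jobs)


# Hypothetical durations in seconds: one 300 s test plus 62 tests of 10 s.
print(wall_clock_lower_bound([300] + [10] * 62, jobs=32))      # 300
# The same total work with the long test split into ten 30 s pieces.
print(wall_clock_lower_bound([30] * 10 + [10] * 62, jobs=32))  # 30
```

The bound only says what splitting makes possible; actually reaching it also requires that freed slots are refilled promptly, which is what this PR improves.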

maflcko · 2021-12-16T20:29:09Z

"Duplicate" of #13384?

sipa · 2021-12-16T20:31:50Z

Eh, not really - this still polls every 0.5s. It's just not limited to detecting at most one completed task per 0.5s.
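A toy discrete-time model of the difference, assuming one tick equals one 0.5s poll and that job durations are whole numbers of ticks. It compares noticing at most one finished job per tick (the previous behaviour) with noticing all of them (this PR); both policies still poll, so neither starts a job between ticks.

```python
def simulate(durations, jobs, all_per_tick):
    """Return the tick at which the last job is noticed as finished.

    durations: job lengths in ticks (one tick = one poll interval, assumed).
    all_per_tick: True notices every finished job per tick (this PR);
                  False notices at most one per tick (previous behaviour).
    """
    pending = list(durations)
    running = []  # finish ticks of the jobs currently occupying a slot
    tick = 0
    noticed = 0
    while noticed < len(durations):
        # Fill free slots at the current tick.
        while pending and len(running) < jobs:
            running.append(tick + pending.pop(0))
        tick += 1  # sleep one poll interval
        done = sorted(t for t in running if t <= tick)
        if not all_per_tick:
            done = done[:1]  # only one finished job is noticed
        for t in done:
            running.remove(t)
        noticed += len(done)
    return tick


print(simulate([1] * 8, jobs=4, all_per_tick=True))   # 2
print(simulate([1] * 8, jobs=4, all_per_tick=False))  # 8
```

With many short jobs and high parallelism, the one-per-tick policy leaves slots idle while finished jobs wait to be noticed.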

laanwj · 2021-12-17T15:44:39Z

The approach in #13384 would be the most efficient, to start a new test as soon as another stops. But this is an improvement. Concept ACK.
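For comparison, a minimal sketch of an event-driven runner that starts the next test as soon as a slot frees up, with no fixed polling interval. It uses `concurrent.futures` from the standard library and is only an illustration of the idea; it is not the code from #13384, and `run_all` is a hypothetical name.

```python
import concurrent.futures
import subprocess
import sys


def run_all(commands, jobs):
    """Run commands with at most `jobs` in parallel and yield
    (command, returncode) as each one finishes. A worker thread that
    finishes picks up the next queued command immediately."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = {pool.submit(subprocess.run, cmd): cmd for cmd in commands}
        for fut in concurrent.futures.as_completed(futures):
            yield futures[fut], fut.result().returncode


if __name__ == "__main__":
    cmds = [[sys.executable, "-c", f"import sys; sys.exit({i})"] for i in range(4)]
    for cmd, rc in run_all(cmds, jobs=2):
        print(rc)
```

A design caveat relevant to the Ctrl-C concern raised later in this thread: on any exception, including `KeyboardInterrupt`, leaving the `with` block calls `shutdown(wait=True)`, which also waits for commands still queued, so an interrupt would not stop the run promptly. On Python 3.9+, `shutdown(cancel_futures=True)` can cancel queued work.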

sipa · 2021-12-17T16:08:55Z

I'm perfectly fine with the approach in #13384 as well. This PR is just an obvious incremental improvement to the current code with no impact apart from making things a bit faster, so I'd expect it to be perhaps less controversial.

ghost · 2021-12-17T16:32:53Z

test/functional/test_runner.py

+    i = 0
+    while i < test_count:
+        for test_result, testdir, stdout, stderr in job_queue.get_next():
+            test_results.append(test_result)
+            i += 1


I looked at this code, in which `for i in range(test_count)` is replaced by `while i < test_count`.

I read the answers at https://stackoverflow.com/questions/869229/why-is-looping-over-range-in-python-faster-than-using-a-while-loop and I am not sure why this would be faster.

Ignore the review if it does not make sense or I am missing something important.

That's not what makes it faster.

The speedup is due to get_next() now returning all finished jobs, instead of one finished job (and then sleeping 0.5s between calls). The change here is just to deal with the fact that the returned value is now a list of jobs instead of a single one.
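A simplified sketch of why the loop shape changed, assuming `get_next()` returns a list that may hold several finished jobs. A `for i in range(test_count)` loop would call `get_next()` once per test and so could not account for batches; counting results in an inner loop can. The function name and the plain-string results are hypothetical stand-ins for the real `(test_result, testdir, stdout, stderr)` tuples.

```python
def collect_results(get_next, test_count):
    """Consume batches from get_next() until test_count results are seen.

    get_next is assumed to return a (possibly multi-element) list of
    finished jobs per call, as after this change."""
    results = []
    i = 0
    while i < test_count:
        for test_result in get_next():
            results.append(test_result)
            i += 1
    return results


batches = iter([["a"], ["b", "c"], ["d"]])
print(collect_results(lambda: next(batches), 4))  # ['a', 'b', 'c', 'd']
```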

ghost

ACK 975097f

laanwj · 2021-12-17T18:39:53Z

perhaps less controversial.

Sure. Though I don't think it was controversial. I definitely didn't mean my comment like that at the time; it was far from a NACK. I just had a question about the dot-printing, and not being able to exit with Ctrl-C was slightly annoying. I'm sure those could be solved.

maflcko · 2021-12-18T13:16:07Z

Yeah, I am wondering if the CTRL+C needs to be captured and translated into a kill?

laanwj · 2022-01-05T16:18:48Z

Code review and lightly tested ACK 975097f

Yeah, I am wondering if the CTRL+C needs to be captured and translated into a kill?

Maybe. Usually it's a result of accidentally ignoring KeyboardInterrupt exceptions (for example in threads). But I haven't checked it.
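One way to make Ctrl-C reliable, sketched under assumptions: Python raises `KeyboardInterrupt` only in the main thread, so the main waiting loop can catch it, kill the child processes that are still running, reap them, and re-raise. The function `wait_all` and its `wait_step` hook are hypothetical; in real use `wait_step` would be something like `lambda: time.sleep(0.5)`, and Ctrl-C would surface as `KeyboardInterrupt` inside it.

```python
import subprocess
import sys


def wait_all(procs, wait_step):
    """Wait for all procs; if interrupted, kill those still running and
    re-raise so the interrupt is not swallowed."""
    try:
        while any(p.poll() is None for p in procs):
            wait_step()
    except KeyboardInterrupt:
        for p in procs:
            if p.poll() is None:
                p.kill()
        for p in procs:
            p.wait()  # reap the killed children
        raise


if __name__ == "__main__":
    import time
    procs = [subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
             for _ in range(2)]
    wait_all(procs, lambda: time.sleep(0.5))  # press Ctrl-C to stop early
```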
I'm going ahead and merge this for now. I guess someone interested in the other PR can pick it up.

… timeslot

975097f Let test_runner.py start multiple jobs per timeslot (Pieter Wuille)

Pull request description:

  test_runner.py currently only checks every 0.5s whether any job has finished, and if so, starts at most one new job. At higher parallelism it becomes increasingly likely that multiple jobs have finished at the same time. Fix this by always noticing *all* finished jobs every timeslot, and starting as many new ones.

ACKs for top commit:
  laanwj: Code review and lightly tested ACK 975097f
  prayank23: ACK bitcoin@975097f

Tree-SHA512: b70c51f05efcde9bc25475c192b86e86b4c399495b42dee20576af3e6b99e8298be8b9e82146abdabbaedb24a13ee158a7c8947518b16fc4f33a3b434935b550

maflcko · 2022-01-06T14:05:20Z

test/functional/test_runner.py

-                break
+    i = 0
+    while i < test_count:
+        for test_result, testdir, stdout, stderr in job_queue.get_next():


...............
Remaining jobs: [wallet_import_rescan.py --legacy-wallet, p2p_node_network_limited.py]
..............................................................................................................................................................................................................................
Remaining jobs: [wallet_import_rescan.py --legacy-wallet]
......................................................................................................................................................................................
----------------------------------------------------------------------
Ran 10 tests in 0.753s

OK
Traceback (most recent call last):
  File "test/functional/test_runner.py", line 816, in <module>
    main()
  File "test/functional/test_runner.py", line 460, in main
    run_tests(
  File "test/functional/test_runner.py", line 535, in run_tests
    for test_result, testdir, stdout, stderr in job_queue.get_next():
  File "test/functional/test_runner.py", line 647, in get_next
    raise IndexError('pop from empty list')
IndexError: pop from empty list

https://cirrus-ci.com/task/6633293054476288?logs=ci#L6037

Hmm, I don't understand how this is possible. It requires get_next() to be called when self.jobs and self.test_list are empty, while i < test_count in run_tests(). The latter should imply that self.test_list is not empty.
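A toy reproduction of one way the condition described above can arise, based on the later diagnosis in #24195 that a `failfast` abort only broke out of the inner loop. If the inner `for` exits with `break` after the queue has been emptied, the outer `while i < test_count` calls `get_next()` again on an empty queue. Whether this is exactly what happened in the CI run above is not established here; `ToyQueue` and `run_tests_buggy` are hypothetical, and the error message simply mirrors the traceback.

```python
class ToyQueue:
    """Hypothetical stand-in for the job queue: get_next() returns the next
    queued result and fails once nothing is left."""

    def __init__(self, results):
        self.results = list(results)

    def get_next(self):
        if not self.results:
            raise IndexError('pop from empty list')
        return [self.results.pop(0)]

    def kill_and_clear(self):
        self.results.clear()


def run_tests_buggy(queue, test_count, failfast):
    i = 0
    while i < test_count:
        for result in queue.get_next():
            i += 1
            if failfast and result == "failed":
                queue.kill_and_clear()
                break  # leaves only the inner for-loop; the while continues


try:
    run_tests_buggy(ToyQueue(["passed", "failed", "passed"]), 3, failfast=True)
except IndexError as e:
    print(e)  # pop from empty list
```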

See #23995 (comment) for a possible explanation.

0967622354toon

Duplicate of #

0967622354toon · 2022-01-06T17:29:20Z

H

a036358 test: Repair failfast option for test runner (Martin Zumsande)

Pull request description:

  Fixes #23990

  After #23799, the `--failfast` option in the test runner for the functional tests stopped working, because a second outer loop was introduced, which would have needed a `break` too for the test runner to fail immediately. This also led to the errors reported in #23990.
  This provides a straightforward fix for that.
  There is also #23995, which is a larger refactor, but that hasn't been updated in a while to fix the failfast issue.

ACKs for top commit:
  pg156: Tested ACK a036358. I agree adding the `all_passed` flag to break out of the outer loop when needed makes sense. The "failfast" option works after this change.

Tree-SHA512: 3e2f775e36c13d180d32a05cd1cfe0883274e8615cdbbd4e069a9899e9b9ea1091066cf085e93f1c5326bd8ecc6ff524e0dad7c638f60dfdb169fefcdb26ee52
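A simplified sketch of the nested loop with an `all_passed` flag in the spirit of the fix described above, so that a failfast abort leaves both the inner `for` and the outer `while`. Only the flag name comes from the fix; `run_tests_fixed`, its parameters and the status strings are hypothetical.

```python
def run_tests_fixed(get_next, kill_all, test_count, failfast):
    """Collect statuses batch by batch; on the first failure with failfast
    set, kill remaining jobs and exit both loops."""
    results = []
    all_passed = True
    i = 0
    while i < test_count:
        for status in get_next():
            results.append(status)
            i += 1
            if failfast and status == "Failed":
                kill_all()
                all_passed = False
                break  # leave the inner for-loop...
        if not all_passed:
            break      # ...and the outer while-loop as well
    return results


batches = iter([["Passed"], ["Failed", "Passed"], ["Passed"]])
print(run_tests_fixed(lambda: next(batches), lambda: None, 4, failfast=True))
# ['Passed', 'Failed']
```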


Let test_runner.py start multiple jobs per timeslot

975097f

DrahtBot added the Tests label Dec 16, 2021

ghost reviewed Dec 17, 2021

ghost approved these changes Dec 17, 2021

laanwj changed the title ~~Let test_runner.py start multiple jobs per timeslot~~ test: Let test_runner.py start multiple jobs per timeslot Jan 5, 2022

laanwj merged commit 121d47a into bitcoin:master Jan 5, 2022

maflcko reviewed Jan 6, 2022

0967622354toon reviewed Jan 6, 2022

sipa mentioned this pull request Jan 6, 2022

Simplify test_runner.py a bit #23995

Closed

softminus mentioned this pull request Jan 26, 2022

Add "test groups" to rpc-tests.py zcash/zcash#5497

Open

mzumsande mentioned this pull request Jan 28, 2022

test: Fix failfast option for functional test runner #24195

Merged

bitcoin locked and limited conversation to collaborators Jan 7, 2023
