test: Let test_runner.py start multiple jobs per timeslot #23799
Conversation
At -j32 it speeds things up by a few seconds of wall-clock time for me. Though the runtime is mostly dominated by a few very long-running jobs; splitting those up could significantly increase the gain.
"Duplicate" of #13384?
Eh, not really - this still polls every 0.5s. It's just not limited to detecting at most one completed task per 0.5s.
The approach in #13384 would be the most efficient, to start a new test as soon as another stops. But this is an improvement. Concept ACK.
I'm perfectly fine with the approach in #13384 as well. This PR is just an obvious incremental improvement to the current code with no impact apart from making things a bit faster, so I'd expect it to be perhaps less controversial.
i = 0
while i < test_count:
    for test_result, testdir, stdout, stderr in job_queue.get_next():
        test_results.append(test_result)
        i += 1
Looked at this code, in which `for i in range(test_count):` is replaced by `while i < test_count:`. Read the answers here https://stackoverflow.com/questions/869229/why-is-looping-over-range-in-python-faster-than-using-a-while-loop and I am not sure why this would be faster. Ignore the review if it does not make sense or I am missing something important.
That's not what makes it faster. The speedup is due to `get_next()` now returning all finished jobs, instead of one finished job (and then sleeping 0.5s between calls). The change here is just to deal with the fact that the returned value is now a list of jobs instead of a single one.
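To make the distinction concrete, here is a rough sketch of the two return shapes. This is hedged: it paraphrases rather than quotes the real `TestHandler.get_next()`, and the helper methods are made up for illustration.

```python
import time

# Hedged sketch of the shape change, not the real TestHandler.get_next();
# _one_finished_job() and _all_finished_jobs() are hypothetical helpers.
class JobQueueSketch:
    def get_next_old(self):
        while True:
            job = self._one_finished_job()
            if job is not None:
                return job               # exactly one (result, dir, stdout, stderr) per call
            time.sleep(0.5)              # wait one timeslot before polling again

    def get_next_new(self):
        while True:
            finished = self._all_finished_jobs()
            if finished:
                return finished          # list of every job that finished this timeslot
            time.sleep(0.5)
```

With the new shape, the caller's loop above advances `i` once per entry in the returned list, so several completions can be consumed in a single timeslot.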
ACK 975097f
Sure. Though I don't think it was controversial. I definitely didn't mean my comment like that at the time, it was far from a NACK. I just had a question about the dot-printing, and not being able to exit with Ctrl-C was mildly annoying. I'm sure those could be solved.
Yeah, I am wondering if the Ctrl-C needs to be captured and translated into a kill?
Code review and lightly tested ACK 975097f
Maybe. Usually it's a result of accidentally ignoring
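For the Ctrl-C point, one way such handling could look is sketched below. This is a minimal sketch, assuming the runner holds a list of `subprocess.Popen` handles for its jobs; none of these names come from test_runner.py itself.

```python
import signal

def wait_killing_on_interrupt(procs):
    """Wait for all child processes; on Ctrl-C, terminate them instead of leaving them running."""
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        for proc in procs:
            proc.send_signal(signal.SIGTERM)   # translate Ctrl-C into a kill of the children
        for proc in procs:
            proc.wait()                        # reap them so nothing is left behind
        raise                                  # still let the runner exit with an error
```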
975097f Let test_runner.py start multiple jobs per timeslot (Pieter Wuille)

Pull request description:

  test_runner.py currently only checks every 0.5s whether any job has finished, and if so, starts at most one new job. At higher parallelism it becomes increasingly likely that multiple jobs have finished at the same time. Fix this by always noticing *all* finished jobs every timeslot, and starting as many new ones.

ACKs for top commit:
  laanwj: Code review and lightly tested ACK 975097f
  prayank23: ACK bitcoin@975097f

Tree-SHA512: b70c51f05efcde9bc25475c192b86e86b4c399495b42dee20576af3e6b99e8298be8b9e82146abdabbaedb24a13ee158a7c8947518b16fc4f33a3b434935b550
break

i = 0
while i < test_count:
    for test_result, testdir, stdout, stderr in job_queue.get_next():
...............
Remaining jobs: [wallet_import_rescan.py --legacy-wallet, p2p_node_network_limited.py]
..............................................................................................................................................................................................................................
Remaining jobs: [wallet_import_rescan.py --legacy-wallet]
......................................................................................................................................................................................
----------------------------------------------------------------------
Ran 10 tests in 0.753s
OK
Traceback (most recent call last):
File "test/functional/test_runner.py", line 816, in <module>
main()
File "test/functional/test_runner.py", line 460, in main
run_tests(
File "test/functional/test_runner.py", line 535, in run_tests
for test_result, testdir, stdout, stderr in job_queue.get_next():
File "test/functional/test_runner.py", line 647, in get_next
raise IndexError('pop from empty list')
IndexError: pop from empty list
Hmm, I don't understand how this is possible. It requires `get_next()` to be called when `self.jobs` and `self.test_list` are empty, while `i < test_count` in `run_tests()`. The latter should imply that `self.test_list` is not empty.
See #23995
See #23995 (comment) for a possible explanation.
a036358 test: Repair failfast option for test runner (Martin Zumsande)

Pull request description:

  Fixes #23990

  After #23799, the `--failfast` option in the test runner for the functional tests stopped working, because a second outer loop was introduced, which would have needed a `break` too for the test runner to fail immediately. This also led to the errors reported in #23990. This provides a straightforward fix for that. There is also #23995, which is a larger refactor, but that hasn't been updated in a while to fix the failfast issue.

ACKs for top commit:
  pg156: Tested ACK a036358. I agree adding the `all_passed` flag to break out of the outer loop when needed makes sense. The "failfast" option works after this change.

Tree-SHA512: 3e2f775e36c13d180d32a05cd1cfe0883274e8615cdbbd4e069a9899e9b9ea1091066cf085e93f1c5326bd8ecc6ff524e0dad7c638f60dfdb169fefcdb26ee52
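A hedged sketch of the fix described above, reusing the loop shape from the diff earlier in this thread; the `was_successful` attribute and exact variable names are assumptions, not a quote of commit a036358.

```python
def run_tests_sketch(job_queue, test_count, failfast):
    """The new all_passed flag lets the outer loop stop, since break only exits the inner for-loop."""
    test_results = []
    i = 0
    all_passed = True
    while i < test_count and all_passed:        # the flag now gates the outer loop too
        for test_result, testdir, stdout, stderr in job_queue.get_next():
            test_results.append(test_result)
            i += 1
            if test_result.was_successful:      # was_successful: assumed attribute
                continue
            all_passed = False
            if failfast:
                break                           # inner loop only; the flag does the rest
    return test_results
```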
test_runner.py currently only checks every 0.5s whether any job has finished, and if so, starts at most one new job. At higher parallelism it becomes increasingly likely that multiple jobs have finished at the same time. Fix this by always noticing all finished jobs every timeslot, and starting as many new ones.
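To close, a minimal sketch of the timeslot behaviour this description refers to: collect every finished job each 0.5s and refill every freed slot. The names here (`start_job`, the `(name, proc)` tuples) are illustrative assumptions, not the real `TestHandler`.

```python
import time

def get_next_sketch(running, queued, num_jobs, start_job):
    """running: list of (name, subprocess.Popen); queued: pending test names; start_job: launcher."""
    while True:
        # Fill every free slot before sleeping, not just one per timeslot.
        while queued and len(running) < num_jobs:
            running.append(start_job(queued.pop(0)))
        # Notice *all* jobs that have finished, however many there are.
        finished = [job for job in running if job[1].poll() is not None]
        if finished:
            for job in finished:
                running.remove(job)
            return finished
        time.sleep(0.5)   # one polling timeslot
```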