-
Notifications
You must be signed in to change notification settings - Fork 37.7k
test: Implicitly sync after generate*, unless opted out #23300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Concept ACK |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
fa2b65f
to
fa7962b
Compare
fa7962b
to
facc352
Compare
def no_op(self): | ||
pass | ||
|
||
def generate(self, generator, *args, sync_fun=None, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest making it a required parameter, otherwise there's a high risk of backports breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll backport any major test changes to 22.x (or earlier) at this point, and hopefully this pull is merged before the next major release, so backports to 23.x aren't an issue.
If reviewers insist that this should be done, I suggest doing it in a follow up as a scripted-diff to make it easy to review and keep the conflicts in this pull low.
Concept ACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK facc352 on Ubuntu 20.04
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tACK facc352 on MacOS 11.6
Nice. It solves the failure I was getting when running test_runner
. I think there are some other tests you could apply it like rpc_packages.py
.
Can you explain this a bit more? Or maybe create a follow-up yourself? |
FWIW, was testing this PR, and with the following patch: diff --git a/.cirrus.yml b/.cirrus.yml
index f7962e951..93866de06 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -164,7 +164,7 @@ task:
- netsh int ipv4 set dynamicport tcp start=1025 num=64511
- netsh int ipv6 set dynamicport tcp start=1025 num=64511
# Exclude feature_dbcrash for now due to timeout
- - python test\functional\test_runner.py --nocleanup --ci --quiet --combinedlogslen=4000 --jobs=4 --timeout-factor=8 --failfast --extended --exclude feature_dbcrash
+ - python test\functional\test_runner.py --nocleanup --ci --quiet --combinedlogslen=4000 --jobs=4 --failfast --extended --exclude feature_dbcrash
task:
name: 'ARM [unit tests, no functional tests] [bullseye]'
diff --git a/ci/test/00_setup_env.sh b/ci/test/00_setup_env.sh
index 8a9d808f5..0685d1f0c 100755
--- a/ci/test/00_setup_env.sh
+++ b/ci/test/00_setup_env.sh
@@ -41,7 +41,7 @@ export RUN_SECURITY_TESTS=${RUN_SECURITY_TESTS:-false}
# By how much to scale the test_runner timeouts (option --timeout-factor).
# This is needed because some ci machines have slow CPU or disk, so sanitizers
# might be slow or a reindex might be waiting on disk IO.
-export TEST_RUNNER_TIMEOUT_FACTOR=${TEST_RUNNER_TIMEOUT_FACTOR:-40}
+export TEST_RUNNER_TIMEOUT_FACTOR=${TEST_RUNNER_TIMEOUT_FACTOR:-1}
export TEST_RUNNER_ENV=${TEST_RUNNER_ENV:-}
export RUN_FUZZ_TESTS=${RUN_FUZZ_TESTS:-false}
export EXPECTED_TESTS_DURATION_IN_SECONDS=${EXPECTED_TESTS_DURATION_IN_SECONDS:-1000} all tasks, except for the native Windows one, pass. In the Windows task two tests fail due timeouts:
|
Thanks for testing. Do you have logs for the two timeouts? |
https://api.cirrus-ci.com/v1/task/4512194625798144/logs/functional_tests.log |
The pruning one has a reorg and a call to
The rpc_misc one looks like an unrelated issue. |
Could be easy tuned: @@ -84,7 +84,7 @@ class RpcMiscTest(BitcoinTestFramework):
# Restart the node with indices and wait for them to sync
self.restart_node(0, ["-txindex", "-blockfilterindex", "-coinstatsindex"])
- self.wait_until(lambda: all(i["synced"] for i in node.getindexinfo().values()))
+ self.wait_until(lambda: all(i["synced"] for i in node.getindexinfo().values()), timeout=120)
# Returns a list of all running indices by default
values = {"synced": True, "best_block_height": 200} |
The sync threads all exit within a second, so I don't understand why raising the timeout would fix this issue. |
That is not the case in CI environment. See my demo branch, and its CI log:
|
Oh you are right. Any idea why |
Have no idea. Locally, on Windows 10 Pro 21H1, it takes about 30 seconds. |
…on native Windows CI task 23c3dcb ci: Drop --failfast in functional tests on native Windows CI task (Hennadii Stepanov) Pull request description: As it was [discussed](bitcoin/bitcoin#22980 (comment)) in bitcoin/bitcoin#22980: > seeing _all_ of the failed tests can give us a useful hint for debugging (was thinking about that while reviewing and testing bitcoin/bitcoin#23300). There is a [concern](bitcoin/bitcoin#22980 (comment)) about such approach: > If there is a CI failure, it will be good to know the result as early as possible after opening the pull request. But, [OTOH](bitcoin/bitcoin#22980 (comment)): > the average amount of saved time for such an approach [using `--failfast`] is less significant than it could appear. ACKs for top commit: MarcoFalke: cr ACK 23c3dcb seems fine to give this a try Tree-SHA512: d28f5712b4edfdbcef48b0633017da9172cef1835bcea51eaeeabf15c133f6bb6888227afc130279c3899365a4fd0f200fb9b0c4cb1ff80f21cbc766b8907764
The most frequent failure in functional tests are intermittent races. Fixing such bugs is cumbersome because it involves:
self.sync_all()
where it was forgottenAlso, writing a linter to catch those is not possible, nor is review effective in finding these bugs prior to merge.
Fix all future intermittent races caused by a missing sync_block call by calling
sync_all
implicitly after eachgenerate*
, unless opted out. This ensures that the code is race-free (with regards to blocks) when the tests pass once, instead of our current approach where the code can never be guaranteed to be race-free.There are some scripted-diff cleanups (see #22567), but they will be submitted in a follow-up to reduce the conflicts in this pull.