Release Summary:
Description of changes:
As an alternative to f217c85, we could instead retry the interop tests when they fail. As long as we have remaining attempts, we will not report the failure via qns-status-report.
I modeled this solution off of https://github.com/orgs/community/discussions/67654#discussioncomment-8038649.
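The retry-then-report behavior described above can be sketched as a small decision function. This is only an illustrative model, not the workflow itself: `next_action`, `simulate`, and `MAX_ATTEMPTS` are hypothetical names, and the limit of 3 is an assumption taken from the three attempts observed in the failure test below.

```python
# Hypothetical model of the retry decision; not the actual workflow code.
MAX_ATTEMPTS = 3  # assumption: matches the 3 attempts seen in failure testing


def next_action(run_attempt: int, tests_passed: bool,
                max_attempts: int = MAX_ATTEMPTS) -> str:
    """Decide what happens after one attempt of the workflow.

    Returns "retry" when the tests failed and attempts remain (the status
    report is skipped), otherwise "report" (the status report runs).
    """
    if not tests_passed and run_attempt < max_attempts:
        return "retry"
    return "report"


def simulate(outcomes):
    """Walk through per-attempt outcomes (True = tests passed).

    Attempt numbers start at 1. Stops at the first attempt that reports.
    """
    log = []
    for attempt, passed in enumerate(outcomes, start=1):
        action = next_action(attempt, passed)
        log.append((attempt, action))
        if action == "report":
            break
    return log


if __name__ == "__main__":
    print(simulate([False, False, False]))  # failure on every attempt
    print(simulate([False, True]))          # success on the second attempt
```

In this model, only the final entry of each run reports, which mirrors "qns-status-report" being skipped on every attempt except the last.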
Call-outs:
Currently, I'm retrying the entire "qns" workflow. I could probably limit the retry to just the interop tests by introducing a new "interop-trigger" job that "interop" depends on, and then rerunning just that job. However, GitHub counts "attempts" per workflow, not per job, so retrying a subset of jobs still increments the counter we rely on to check for retries for all jobs. I think that could be confusing, so retrying the entire workflow seemed more straightforward and clear.
Testing:
How was this change tested? With difficulty.
Running the workflow at all
You can't use "gh workflow run" on workflows that don't exist on the default branch yet, meaning workflows that haven't been committed to main. So this change can't actually be tested as-is. Here's an issue about that problem: www.github.com/cli/cli/issues/9781
I worked around this by hijacking the tshark workflow, which already exists on main and even has a single workflow_dispatch input (the workflow_dispatch inputs in the modified spec also have to match what's on main). I overwrote the tshark workflow with my new retry workflow and tested with that. Comparing the two files:
Failure testing
Here's the PR I opened to test: #2762
I made a few other changes to 1) let me test with a PR and 2) ensure the interop tests fail. See the "enable testing" commit: 6cb209a.
Note that qns has 3 attempts, and the retries were triggered automatically. "qns-status-report" was skipped on all but the last attempt. See https://github.com/aws/s2n-quic/actions/runs/17124819438/job/48579132466
Success testing
I then modified my forced failure to succeed on the second attempt. See the "enable testing success" commit: b902d1a
Note that qns then has only 2 attempts and succeeds on the second one. The first attempt skipped qns-status-report. See https://github.com/aws/s2n-quic/actions/runs/17133219460/job/48602891524?pr=2762
Accidental real failure testing
While I was doing the success testing, the handshakeloss interop test also failed for real :) That attempt was supposed to succeed, but it failed instead and a third attempt was made. See https://github.com/aws/s2n-quic/actions/runs/17133219460/job/48607253793?pr=2762
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.