Skip to content

Conversation

stevenhorsman
Copy link
Member

With the code I originally did I think there is potentially a case where we can get a failure due to timing of steps. Before this change the build-asset-shim-v2
job could start the get-artifacts step and concurrently remove-rootfs-binary-artifacts could run and delete the artifact during the download and result in the error. In this commit, I try to resolve this by making sure that the shim build waits for the artifact deletes to complete before starting.

@stevenhorsman
Copy link
Member Author

This code was tested in https://github.com/kata-containers/kata-containers/actions/runs/12164116090 and didn't hit the problem, but it might be a low chance of hitting it as my previous branch test didn't either

@stevenhorsman stevenhorsman force-pushed the topic/fix-remove-artifact-ordering branch from 1d1cceb to 599df4f Compare December 9, 2024 09:36
@stevenhorsman
Copy link
Member Author

stevenhorsman commented Dec 9, 2024

I've re-tested this on the topic branch (https://github.com/kata-containers/kata-containers/actions/runs/12232815654) after rebase and all the test passed except the agent-api and SNP tests that have known issues, so I believe this is good.

@stevenhorsman stevenhorsman force-pushed the topic/fix-remove-artifact-ordering branch from 599df4f to d372680 Compare December 12, 2024 14:11
@katacontainersbot katacontainersbot added the size/small Small and simple task label Dec 12, 2024
@stevenhorsman
Copy link
Member Author

stevenhorsman commented Dec 12, 2024

I've updated this PR to make the remove artefacts only done in the release stage, so re-testing this in https://github.com/kata-containers/kata-containers/actions/runs/12298151219. do-not-merge added until we can check these results.

@stevenhorsman stevenhorsman added the do-not-merge PR has problems or depends on another label Dec 12, 2024
With the code I originally did I think there is potentially
a case where we can get a failure due to timing of steps.
Before this change the `build-asset-shim-v2`
job could start the `get-artifacts` step and concurrently
`remove-rootfs-binary-artifacts` could run and delete the artifact
during the download and result in the error. In this commit, I
try to resolve this by making sure that the shim build waits
for the artifact deletes to complete before starting.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
@stevenhorsman stevenhorsman force-pushed the topic/fix-remove-artifact-ordering branch 2 times, most recently from c375fb0 to 32e62db Compare December 12, 2024 16:56
@stevenhorsman
Copy link
Member Author

There was an issue in that the shim's needs on the skipped step meant that it was also skipped, so I've added always to hopefully prevent that. Re-running the tests now: https://github.com/kata-containers/kata-containers/actions/runs/12299067589

remove-rootfs-binary-artifacts:
if: ${{ inputs.stage == 'release' }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: how about we rather put this condition in the step (line 215), so that we don't need the always() condition in build-asset-shim-v2?

Copy link
Member Author

@stevenhorsman stevenhorsman Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a better idea. It's a bit ugly with a no-op job, might is nicer that the always. Thanks!

Due to the agent-api tests requiring the agent to be deployed in the
CI by the tarball, so in the short-term lets only do this on the release
stage, so that both kata-manager works with the release and the
agent-api tests work with the other CI builds.

In the longer term we need to re-evaluate what is in our tarballs
(issue #10619), but want to unblock the tests in the short-term.

Fixes: #10630
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
@stevenhorsman stevenhorsman force-pushed the topic/fix-remove-artifact-ordering branch from 32e62db to cf8b827 Compare December 12, 2024 17:38
Copy link
Contributor

@sprt sprt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick fix @stevenhorsman!

@stevenhorsman stevenhorsman removed the do-not-merge PR has problems or depends on another label Dec 13, 2024
@stevenhorsman
Copy link
Member Author

With this update the agent-apis test passing on the manual CI run: https://github.com/kata-containers/kata-containers/actions/runs/12301800878

Copy link
Member

@BbolroC BbolroC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for taking care of this issue @stevenhorsman !

@zvonkok zvonkok merged commit f2d7287 into main Dec 18, 2024
144 of 167 checks passed
@stevenhorsman stevenhorsman deleted the topic/fix-remove-artifact-ordering branch December 19, 2024 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/small Small and simple task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants