-
Notifications
You must be signed in to change notification settings - Fork 198
feat!(publish): add retry logic to publish operations #4064
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
✅ Deploy Preview for zarf-docs ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Codecov Report❌ Patch coverage is
🚀 New features to boost your workflow:
|
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff! Some design consideration and nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small items. Also before merging this I would like to ask the user if they still run into this when --oci-concurrency=1
. I'll make a comment in the issue
User feedback is always good. I'd probably wager we're solving for a separate constraint here than what oci-concurrency offers. |
I think there's a fair likelihood it's related in a majority network flakes. I know that when we switched to oras from crane we enabled the zarf/src/internal/packager/images/push.go Line 151 in 8067c8b
There are differences between the usual flows though so oci-concurrency could be unrelated. Usually image pushes will go through a Kubernetes port-forward whereas usually package publishes will not. The registry pod also occasionally is rolled so multiple attempts by default is especially important in that case. |
I agree! I interpret this request as more of a roboustness feature as opposed to single-root-cause. Zarf should expect to run in network constrained environments. Having the ability to retry operations that rely on the network should likely be a pattern we replicate more broadly. |
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
@brandtkeller That's fair, and I might be too stringent on the request that we must know what the issue is in the user case, but I do think it helps inform how we design the fallback mechanism. I'm a bit torn on if this should be a feature, but I do see the argument, especially if there is only one attempt by default |
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments based on the UX for this
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Signed-off-by: Brandt Keller <brandt.keller@defenseunicorns.com> Co-authored-by: Maciej Szulik <maciej@defenseunicorns.com> Signed-off-by: Cade Thomas <cadethomas23@gmail.com>
Description
Adding support for retry logic to the package publish operations (package, skeleton, oci). This allows for more robust/resilient handling of publish operations - that may take time on larger packages - to allow retrying.
Proposing the retry logic live within the
zoci
pkg.For local package/skeleton - opting to include all remote interactions during the
PushPackage()
logic as it makes the operation more resilient to network flakiness - something that may very well be a consistent Zarf constraint.For OCI to OCI copy - updating the logic to utilize
oras
directly and create logic that mirrors thePush
operation. Willing to discuss this further and revert if we see a reason.Operations that push an artifact to a remote registry should return the descriptor to enable follow-on operations.
Related Issue
Fixes #3787
Checklist before merging