-
Notifications
You must be signed in to change notification settings - Fork 3.7k
pushWriter: correctly propagate errors #7985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @jedevc. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
9e3e6ad
to
197a317
Compare
/cc @dmcgowan Also, can we get an ok-to-test on this PR |
/ok-to-test |
Could this change be simplified by checking the error channel on Write? The errors don't propagate synchronously either way since the request is done in a go routine. Using the error channel, then you don't need to change the side which closes with error since the |
Oh good call 🎉 The logic that prevented this working properly is the behavior that called I think that means we don't even need to call |
I think |
Wouldn't we then still have a race where the pipe close error could be returned then?
The only way to avoid this is if we can avoid the channel entirely, and directly Edit: sorry I think I misread. After getting a |
In the refactor from 926b9c7, the error handling was substantially reworked, and changed the types of errors returned. Notably, in the case of a network error, instead of propogating the error through to return from pushWriter.Write (as previously), it would be propagated through to pushWriter.Commit - however, this is too late, since we've already closed the io.Pipe by the time we would have reached this function. Therefore, we get the generic error message "io: read/write on closed pipe" for *every network error*. This patch corrects this behavior to ensure that the correct error object is always returned as early as possible, by checking the error result after writing and detecting a closed pipe. Additionally, we do some additional hardening - specifically we prevent falling through when resetting the content or detecting errors, and update the tests to explicitly check for the ErrReset message. Signed-off-by: Justin Chadwell <me@jedevc.com>
197a317
to
9f6058d
Compare
@dmcgowan have reworked the patch as you suggested, it's a lot simpler. I also tested this with BuildKit again, and yup, this does appear to fix our downstream issue still. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Awesome 🎉 should I open a separate cherry-pick PR? |
Thanks to you all for such quick resolution 🥳 |
…/main Update fork-external/main with upstream containerd/containerd/main at commit hash 3d32da8 Related work items: containerd#5674, containerd#7129, containerd#7393, containerd#7661, containerd#7685, containerd#7810, containerd#7850, containerd#7861, containerd#7882, containerd#7883, containerd#7886, containerd#7891, containerd#7892, containerd#7893, containerd#7903, containerd#7904, containerd#7905, containerd#7906, containerd#7907, containerd#7908, containerd#7911, containerd#7913, containerd#7914, containerd#7917, containerd#7925, containerd#7927, containerd#7928, containerd#7929, containerd#7932, containerd#7935, containerd#7943, containerd#7946, containerd#7948, containerd#7957, containerd#7958, containerd#7959, containerd#7960, containerd#7963, containerd#7968, containerd#7969, containerd#7970, containerd#7973, containerd#7985, containerd#7987, containerd#7994, containerd#8005
🐛 Fixes #7972
In the refactor from #6995, the error handling was substantially reworked, and changed the types of errors returned - this caused issues in retry logic downstream in BuildKit (see docker/build-push-action#761). BuildKit uses this error result from this function to determine whether to retry the push or not.
Notably, in the case of a network error, instead of propagating the error through to return from
pushWriter.Write
(as previously), it would be propagated through topushWriter.Commit
- however, this is too late, since we've already closed theio.Pipe
by the time we would have reached this function. Therefore, we get the generic error message"io: read/write on closed pipe"
for every network error - this seems to be the issue in #7972, likely there is some other underlying network error, but it is not shown.This patch corrects this behavior to ensure that the correct error object is always returned as early as possible:
io.PipeReader
for thepushWriter
-- on any error, weCloseWithError
theReader
, to ensure that the nextWrite
orCommit
to theWriter
returns the desired error. This applies for both network errors fromrequest.doWithRetries
, as well as for theErrReset
s.CloseWithError
function.I've also slightly refactored the test to work with the new symbols, and to explicitly test with the commit function, since it took me a while to manually verify that I hadn't broken the logic of the test.
Finally, I've verified that this fix works with BuildKit:
I'm not 100% confident in this code, so hopefully some experienced maintainers can weigh in ❤️. I think while the changes are complex and difficult to verify, I think they do need to be cherry-picked across since the previous changes break heavily on the v1.6 branch