pushWriter: correctly propagate errors #7985

jedevc · 2023-01-23T15:12:07Z

🐛 Fixes #7972

In the refactor from #6995, the error handling was substantially reworked, which changed the types of errors returned. This caused issues in retry logic downstream in BuildKit (see docker/build-push-action#761), which uses the error returned from this function to decide whether to retry the push.

Notably, in the case of a network error, instead of propagating the error through to return from pushWriter.Write (as previously), it would be propagated through to pushWriter.Commit. However, this is too late, since we've already closed the io.Pipe by the time we reach that function. Therefore, we get the generic error message "io: read/write on closed pipe" for every network error. This seems to be the issue in #7972: there is likely some other underlying network error, but it is not shown.

This patch corrects this behavior to ensure that the correct error object is always returned as early as possible:

Track the corresponding io.PipeReader for the pushWriter. On any error, we CloseWithError the Reader, to ensure that the next Write or Commit to the Writer returns the desired error. This applies both to network errors from request.doWithRetries and to the ErrResets.
Remove the error channel. We don't need it any more, since errors are instead clearly tracked through the CloseWithError function.
Always reset the content, even if the offset is already 0. The previous patch could fall through out of the switch statement and attempt to access a nil response object.

I've also slightly refactored the test to work with the new symbols, and to explicitly test with the commit function, since it took me a while to manually verify that I hadn't broken the logic of the test.

Finally, I've verified that this fix works with BuildKit:

The BuildKit vendoring commit: moby/buildkit@master...jedevc:buildkit:containerd-fix-push-error-propogate (using the commit cherry-picked to v1.6.15, which is what BuildKit uses)
The green CI run! https://github.com/jedevc/buildkit-actions-testing/actions/runs/3987862482/jobs/6838242268 - Compare this to the previous flaky CI runs https://github.com/jedevc/buildkit-actions-testing/actions/runs/3985045036

I'm not 100% confident in this code, so hopefully some experienced maintainers can weigh in ❤️. While the changes are complex and difficult to verify, I think they do need to be cherry-picked across, since the previous changes break heavily on the v1.6 branch.

k8s-ci-robot · 2023-01-23T15:12:18Z

Hi @jedevc. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

akhilerm · 2023-01-23T15:54:07Z

/cc @dmcgowan

Also, can we get an ok-to-test on this PR?

estesp · 2023-01-23T16:11:36Z

/ok-to-test

remotes/docker/pusher.go

dmcgowan · 2023-01-23T21:09:53Z

Could this change be simplified by checking the error channel on Write? The errors don't propagate synchronously either way, since the request is done in a goroutine. If you use the error channel, you don't need to change the side which closes with error, since the Write operation can fail before that point. The current code does a setError followed by a close, so it is possible to check for an error on Write when a closed pipe error is hit.

jedevc · 2023-01-23T22:44:11Z

Could this change be simplified by checking the error channel on Write?

Oh good call 🎉 The logic that prevented this working properly is the behavior that called .Close after .setError - if we don't call .Close on a network error, then we can still propagate correctly.

I think that means we don't even need to call CloseWithError? Will take a closer look.

dmcgowan · 2023-01-23T22:52:52Z

if we don't call .Close on a network error, then we can still propagate correctly

I think .Close still needs to be called to prevent the pipe write from hanging after a request error. Since we are wrapping the reader in NopCloser, the writer should close explicitly after the request returns an error. I think it is more about using the right error: a closed pipe is almost always a case where another, more meaningful error occurred.

jedevc · 2023-01-23T22:57:40Z

Wouldn't we then still have a race where the pipe close error could be returned?

Write checks error, no error found
Network error occurs, sent to error channel (since error channel is non-blocking), pipe closed
Write calls pipe.Write, gets io.ErrClosedPipe

The only way to avoid this is if we can avoid the channel entirely, and directly CloseWithError the pipe - and I'm not sure what the error channel is doing then.

Edit: sorry, I think I misread. After getting an io.ErrClosedPipe from writing to the pipe, we should pull from the error channel. If we have a "better" error from there, return that; otherwise, return the io.ErrClosedPipe.

In the refactor from 926b9c7, the error handling was substantially reworked, and changed the types of errors returned.

Notably, in the case of a network error, instead of propagating the error through to return from pushWriter.Write (as previously), it would be propagated through to pushWriter.Commit - however, this is too late, since we've already closed the io.Pipe by the time we would have reached this function. Therefore, we get the generic error message "io: read/write on closed pipe" for *every network error*.

This patch corrects this behavior to ensure that the correct error object is always returned as early as possible, by checking the error result after writing and detecting a closed pipe.

Additionally, we do some additional hardening - specifically we prevent falling through when resetting the content or detecting errors, and update the tests to explicitly check for the ErrReset message.

Signed-off-by: Justin Chadwell <me@jedevc.com>

jedevc · 2023-01-24T11:41:52Z

@dmcgowan I have reworked the patch as you suggested; it's a lot simpler.

I also tested this with BuildKit again, and yup, this does appear to fix our downstream issue still.

estesp

LGTM

jedevc · 2023-01-24T18:47:23Z

Awesome 🎉 should I open a separate cherry-pick PR?

shawaj · 2023-01-24T18:51:00Z

Thanks to you all for such quick resolution 🥳

…/main Update fork-external/main with upstream containerd/containerd/main at commit hash 3d32da8 Related work items: containerd#5674, containerd#7129, containerd#7393, containerd#7661, containerd#7685, containerd#7810, containerd#7850, containerd#7861, containerd#7882, containerd#7883, containerd#7886, containerd#7891, containerd#7892, containerd#7893, containerd#7903, containerd#7904, containerd#7905, containerd#7906, containerd#7907, containerd#7908, containerd#7911, containerd#7913, containerd#7914, containerd#7917, containerd#7925, containerd#7927, containerd#7928, containerd#7929, containerd#7932, containerd#7935, containerd#7943, containerd#7946, containerd#7948, containerd#7957, containerd#7958, containerd#7959, containerd#7960, containerd#7963, containerd#7968, containerd#7969, containerd#7970, containerd#7973, containerd#7985, containerd#7987, containerd#7994, containerd#8005

k8s-ci-robot added the needs-ok-to-test label Jan 23, 2023

jedevc force-pushed the fix-push-error-propagate branch from 9e3e6ad to 197a317 Compare January 23, 2023 15:14

k8s-ci-robot requested a review from dmcgowan January 23, 2023 15:54

k8s-ci-robot added ok-to-test and removed needs-ok-to-test labels Jan 23, 2023

dmcgowan reviewed Jan 23, 2023

remotes/docker/pusher.go (outdated, resolved)

dmcgowan reviewed Jan 23, 2023

remotes/docker/pusher.go (outdated, resolved)

AkihiroSuda added the cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch label Jan 24, 2023

jedevc force-pushed the fix-push-error-propagate branch from 197a317 to 9f6058d Compare January 24, 2023 11:38

estesp approved these changes Jan 24, 2023

dmcgowan approved these changes Jan 24, 2023

dmcgowan merged commit c873647 into containerd:main Jan 24, 2023

thaJeztah mentioned this pull request Jan 24, 2023

[release/1.6 backport] pushWriter: correctly propagate errors #7990

Merged

thaJeztah added cherry-picked/1.6.x PR commits are cherry-picked into release/1.6 branch and removed cherry-pick/1.6.x Change to be cherry picked to release/1.6 branch labels Jan 24, 2023

karl-johan-grahn mentioned this pull request Jan 24, 2023

404 page stakater/stakater-docs#111

Merged

thaJeztah mentioned this pull request Jan 24, 2023

[release/1.5 backport] pushWriter: correctly propagate errors #7998

Merged

thaJeztah added the cherry-picked/1.5.x PR commits are cherry-picked into release/1.5 branch label Jan 24, 2023

shivamkm07 mentioned this pull request Jan 25, 2023

Dapr workflow fails due to docker buildx failure dapr/dapr#5805

Closed

crazy-max mentioned this pull request Apr 9, 2023

imagetools create should propagate the original error docker/buildx#1726

Closed

jedevc mentioned this pull request Apr 13, 2023

Fix various timing issues with docker pusher #8379

Merged
