Skip to content

Conversation

twaugh
Copy link
Contributor

@twaugh twaugh commented Jul 27, 2016

When fetching layer content, we want to ensure that we get back the content as-is with no modification in order to verify its checksum.

Go's http.Transport provides transparent decompression by adding 'x-gzip' to the list of accepted encodings, then decompressing it before returning it to the caller.

This can be turned off using the DisableCompression flag.

Doing so fixes #17595.

To verify this, store V2 API responses on e.g. an Apache httpd web server with default configuration, and try 'docker pull' from it.

@GordonTheTurtle GordonTheTurtle added status/0-triage dco/no Automatically set by a bot when one of the commits lacks proper signature labels Jul 27, 2016
@twaugh twaugh force-pushed the 17595-no-transparent-decompression branch from ad0bd0d to fefc19a Compare July 27, 2016 11:12
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Jul 27, 2016
When fetching layer content, we want to ensure that we get back the
content as-is with no modification in order to verify its checksum.

Go's http.Transport provides transparent decompression by adding
'x-gzip' to the list of accepted encodings, then decompressing it
before returning it to the caller.

This can be turned off using the DisableCompression flag.

Doing so fixes moby#17595.

Signed-off-by: Tim Waugh <twaugh@redhat.com>
@twaugh twaugh force-pushed the 17595-no-transparent-decompression branch from fefc19a to 71976af Compare July 27, 2016 11:21
@thaJeztah
Copy link
Member

thanks!

ping @aaronlehmann @stevvooe ptal

@aaronlehmann
Copy link
Contributor

Took a look at the linked issue to refresh my memory. In that issue, I said:

I think it's fundamentally wrong for the HTTP server to be adding this header. It tells the client that the HTTP stream is compressed. In this case, it's not the HTTP stream that's compressed, but the actual payload. The distinction matters a lot.

But I also said:

It seems like turning on DisableCompression wouldn't hurt. Feel free to submit a PR to do this.

So I guess this looks good to me :)

The tradeoff is that this flag will disable compression in a few cases where it's not harmful, like transparent compression of manifests. I don't think there are any situations where this makes a significant difference, so it seems okay to disable compression since it works around a fairly common misconfiguration.

I'd like to hear what @stevvooe and @dmcgowan think, though.

@stevvooe
Copy link
Contributor

Apache HTTPd's behavior here is broken and simply incorrect. It assumes that the client will immediately decompress compressed data, which isn't always true, especially in the case of layers.

Disabling inline transport compression for everyone because of Apache's configuration isn't right. Note that this can be disabled in the Apache configuration.

If we must, we can probably hide this behind an environment variable.

@cgwalters
Copy link
Contributor

But since the layers are already compressed, it's not useful for the webserver to try to compress them again regardless, right?

@stevvooe
Copy link
Contributor

But since the layers are already compressed, it's not useful for the webserver to try to compress them again regardless, right?

No, it is not. It can be useful to compress manifests but I'm not sure about the affect on performance.

My big worry in doing this is that in the future we do need compression but enabling will break misconfigured proxies. Should we take on technical debt due to an inflexible proxy configuration?

Apache's behavior is tampering with content by instructing the client to decompress before it can be verified. I can see if this was from a file on disk, but this is a response, sent over a proxy connection.

It makes total sense to compress uncompressed content inline but detecting compressed content and then adding a header pretending that transport compression is in use is simply wrong. If the upstream meant to use transport compression, it would set the header itself. The feature should be disabled when proxying registry responses.

Is there a header we can add to registry responses to disable this behavior in Apache?

@RalphCorderoy
Copy link

(AIUI if an explicit Accept-Encoding header is added by net/http's user then net/http doesn't add its own one requesting gzip, and it also stops its unwanted-initiative decompression. If it would be useful, this means the client could still get the benefit of compression if there are any non-compressed resources by adding its own "Accept-Encoding: gzip" rather than setting DisableCompression.)

@stevvooe
Copy link
Contributor

@RalphCorderoy Thanks for the pointer!

That led me to find Accept-Encoding: identity.

@twaugh @cgwalters Could we set this on requests for layers? Will Apache honor this?

@cpuguy83
Copy link
Member

cpuguy83 commented Sep 8, 2016

ping @twaugh @cgwalters

@thaJeztah
Copy link
Member

I'll close this for lack of activity, but happy to reopen if you have time to work on it again

@thaJeztah thaJeztah closed this Oct 21, 2016
@stevvooe
Copy link
Contributor

@thaJeztah Need to file an issue to set Accept: identity on requests for layers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

docker doesn't understand 'x-gzip' encoding and thus fails to pull images
8 participants