-
Notifications
You must be signed in to change notification settings - Fork 3.6k
support gzip and zstd HTTP transport compression to fetch remote resources #7563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @ndeloof. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
39cd85b
to
808db06
Compare
@@ -152,6 +154,7 @@ func (r dockerFetcher) Fetch(ctx context.Context, desc ocispec.Descriptor) (io.R | |||
|
|||
func (r dockerFetcher) open(ctx context.Context, req *request, mediatype string, offset int64) (_ io.ReadCloser, retErr error) { | |||
req.header.Set("Accept", strings.Join([]string{mediatype, `*/*`}, ", ")) | |||
req.header.Set("Accept-Encoding", "zstd, gzip") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to;
- specify weight when using multiple options? (e.g. prefer
zstd
overgzip
)? (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding) - add
deflate
as well? - not sure if
*
is "best practice" to add (maybe it is, also because that effectively was the "previous")
zstd;q=1.0, gzip;q=0.8, deflate;q=0.5, *;q=0.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that effectively was the "previous"
nope, net/http automatically set Accept-Encoding: gzip
if you don't set this header on your own
About weight, I don't think fetcher has any preference? Registry probably has, because it might already host/cache pre-compressed entities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, net/http automatically set Accept-Encoding: gzip if you don't set this header on your own
Ah, thanks!
About weight, I don't think fetcher has any preference? Registry probably has, because it might already host/cache pre-compressed entities
Yes, so for these
- the Client sends a list of what it accepts, and a preferred order (so in this case, may be
zstd
) - if the server supports multiple of them, it will then pick the one that the client indicated to be the preferred one.
I think if no weights are set, that it's mostly "undefined" (so we may get gzip
or we may get zstd
, even if the registry would support both); https://developer.mozilla.org/en-US/docs/Glossary/Quality_values
Quality values, or q-values and q-factors, are used to describe the order of priority of values in a comma-separated list
...
If there is no priority defined for the first two values, the order in the list is irrelevant.
remotes/docker/fetcher.go
Outdated
default: | ||
return nil, errors.New("unsupported Content-Encoding: " + encoding) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like Content-Encoding
can return multiple values (comma-separated) if multiple are applied; https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding
So we may want to either "log" this was a warning, and "assume" the old behavior, or handle those (not exactly sure how to handle "multiple")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiple is indeed possible according to the HTTP RFC, but I can't imagine a practical usage for it. This could be implemented by applying algorithms in reverse order, but do we need such complexity for a pure theoretical corner case? Then returning error here should be enough, or would you like an additional log for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't know how common it is; perhaps others know? Wondering if (e.g.) a CDN could apply compression to already compressed streams, in which case perhaps multiple compressions happen.
Thought I'd at least call it out so that it doesn't go overlooked. We could add a comment describing that that's something that's possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added support for multiple encoding by e4cd281
remotes/docker/fetcher.go
Outdated
encoding := strings.Split(resp.Header.Get("Content-Encoding"), ", ") | ||
for i := len(encoding) - 1; i >= 0; i-- { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like these should be treated case-insensitive, and whitespace to be trimmed; if the list has multiple items, then empty values in the list should be ignored (although that seems like a really corner-case).
https://www.rfc-editor.org/rfc/rfc9110.html#section-8.4.1
All content codings are case-insensitive and ought to be registered within the
"HTTP Content Coding Registry", as described in Section 16.6
Whitespace is not part of the value (so should be trimmerd); https://www.rfc-editor.org/rfc/rfc9110.html#name-field-values
A field value does not include leading or trailing whitespace. When a specific
version of HTTP allows such whitespace to appear in a message, a field parsing
implementation MUST exclude such whitespace prior to evaluating the field value.
Empty values inside the list should be ignored; https://www.rfc-editor.org/rfc/rfc9110.html#section-5.6.1.2-6
Empty elements do not contribute to the count of elements present. A recipient
MUST parse and ignore a reasonable number of empty list elements: enough to
handle common mistakes by senders that merge values, but not so much that they
could be used as a denial-of-service mechanism.
Could use strings.FieldFunc()
to split it; https://go.dev/play/p/Z8ed_yjGQS-
package main
import (
"fmt"
"strings"
)
func main() {
values := strings.FieldsFunc(" hello, world,and,more,words ", func(r rune) bool {
return r == ' ' || r == ','
})
fmt.Println(strings.Join(values, "|"))
}
if err != nil { | ||
return nil, err | ||
} | ||
body = r.IOReadCloser() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to break the loop here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, when response was encoded using multiple algorithm, we need to apply them all, in reverse order
Probably never happens, but as suggested by @thaJeztah, better strictly conform with HTTP RFC to avoid future issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Yes, I think it's good to at least have the code in this PR.
Whether or not all things should be enabled by default, or if this should be a configurable option (which compressions to support), can still be discussed
e4cd281
to
7726bda
Compare
/ok-to-test |
…urces Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
c2c30ef
to
6326db0
Compare
return resp.Body, nil | ||
body := resp.Body | ||
encoding := strings.FieldsFunc(resp.Header.Get("Content-Encoding"), func(r rune) bool { | ||
return unicode.IsSpace(r) || r == ',' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return unicode.IsSpace(r) || r == ',' | |
return r == '' " || r == ' \t' || r == ',' |
To avoid matching U+3000, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ndeloof Can you take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not actively working on this topic, feel free to adopt/revisit/reimplement this PR
Are there known public registries that support zstd already? |
@kzys AWS uses zstd for fargate. Registry has a PR open for assing support for zstd |
Golang
net/http
already transparently supports gzip compression.This adds support for zstd compression