Skip to content

Conversation

ndeloof
Copy link
Contributor

@ndeloof ndeloof commented Oct 19, 2022

Golang net/http already transparently supports gzip compression.
This adds support for zstd compression

@k8s-ci-robot
Copy link

Hi @ndeloof. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -152,6 +154,7 @@ func (r dockerFetcher) Fetch(ctx context.Context, desc ocispec.Descriptor) (io.R

func (r dockerFetcher) open(ctx context.Context, req *request, mediatype string, offset int64) (_ io.ReadCloser, retErr error) {
req.header.Set("Accept", strings.Join([]string{mediatype, `*/*`}, ", "))
req.header.Set("Accept-Encoding", "zstd, gzip")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to;

zstd;q=1.0, gzip;q=0.8, deflate;q=0.5, *;q=0.1

Copy link
Contributor Author

@ndeloof ndeloof Oct 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that effectively was the "previous"

nope, net/http automatically set Accept-Encoding: gzip if you don't set this header on your own

About weight, I don't think fetcher has any preference? Registry probably has, because it might already host/cache pre-compressed entities

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, net/http automatically set Accept-Encoding: gzip if you don't set this header on your own

Ah, thanks!

About weight, I don't think fetcher has any preference? Registry probably has, because it might already host/cache pre-compressed entities

Yes, so for these

  • the Client sends a list of what it accepts, and a preferred order (so in this case, may be zstd )
  • if the server supports multiple of them, it will then pick the one that the client indicated to be the preferred one.

I think if no weights are set, that it's mostly "undefined" (so we may get gzip or we may get zstd, even if the registry would support both); https://developer.mozilla.org/en-US/docs/Glossary/Quality_values

Quality values, or q-values and q-factors, are used to describe the order of priority of values in a comma-separated list
...
If there is no priority defined for the first two values, the order in the list is irrelevant.

Comment on lines 232 to 233
default:
return nil, errors.New("unsupported Content-Encoding: " + encoding)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Content-Encoding can return multiple values (comma-separated) if multiple are applied; https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding

So we may want to either "log" this was a warning, and "assume" the old behavior, or handle those (not exactly sure how to handle "multiple")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple is indeed possible according to the HTTP RFC, but I can't imagine a practical usage for it. This could be implemented by applying algorithms in reverse order, but do we need such complexity for a pure theoretical corner case? Then returning error here should be enough, or would you like an additional log for it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't know how common it is; perhaps others know? Wondering if (e.g.) a CDN could apply compression to already compressed streams, in which case perhaps multiple compressions happen.

Thought I'd at least call it out so that it doesn't go overlooked. We could add a comment describing that that's something that's possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added support for multiple encoding by e4cd281

Comment on lines 218 to 321
encoding := strings.Split(resp.Header.Get("Content-Encoding"), ", ")
for i := len(encoding) - 1; i >= 0; i-- {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these should be treated case-insensitive, and whitespace to be trimmed; if the list has multiple items, then empty values in the list should be ignored (although that seems like a really corner-case).

https://www.rfc-editor.org/rfc/rfc9110.html#section-8.4.1

All content codings are case-insensitive and ought to be registered within the
"HTTP Content Coding Registry", as described in Section 16.6

Whitespace is not part of the value (so should be trimmerd); https://www.rfc-editor.org/rfc/rfc9110.html#name-field-values

A field value does not include leading or trailing whitespace. When a specific
version of HTTP allows such whitespace to appear in a message, a field parsing
implementation MUST exclude such whitespace prior to evaluating the field value.

Empty values inside the list should be ignored; https://www.rfc-editor.org/rfc/rfc9110.html#section-5.6.1.2-6

Empty elements do not contribute to the count of elements present. A recipient
MUST parse and ignore a reasonable number of empty list elements: enough to
handle common mistakes by senders that merge values, but not so much that they
could be used as a denial-of-service mechanism.

Could use strings.FieldFunc() to split it; https://go.dev/play/p/Z8ed_yjGQS-

package main

import (
	"fmt"
	"strings"
)

func main() {
	values := strings.FieldsFunc("    hello,   world,and,more,words ", func(r rune) bool {
		return r == ' ' || r == ','
	})

	fmt.Println(strings.Join(values, "|"))
}

if err != nil {
return nil, err
}
body = r.IOReadCloser()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to break the loop here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, when response was encoded using multiple algorithm, we need to apply them all, in reverse order
Probably never happens, but as suggested by @thaJeztah, better strictly conform with HTTP RFC to avoid future issues

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yes, I think it's good to at least have the code in this PR.

Whether or not all things should be enabled by default, or if this should be a configurable option (which compressions to support), can still be discussed

@samuelkarp
Copy link
Member

/ok-to-test

…urces

Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
@ndeloof ndeloof force-pushed the accept-encoding branch 4 times, most recently from c2c30ef to 6326db0 Compare November 10, 2022 14:42
return resp.Body, nil
body := resp.Body
encoding := strings.FieldsFunc(resp.Header.Get("Content-Encoding"), func(r rune) bool {
return unicode.IsSpace(r) || r == ','
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return unicode.IsSpace(r) || r == ','
return r == '' " || r == ' \t' || r == ','

To avoid matching U+3000, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndeloof Can you take a look?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not actively working on this topic, feel free to adopt/revisit/reimplement this PR

@kzys
Copy link
Member

kzys commented Feb 15, 2023

Are there known public registries that support zstd already?

@gaby
Copy link

gaby commented Apr 18, 2023

@kzys AWS uses zstd for fargate. Registry has a PR open for assing support for zstd

@ndeloof ndeloof closed this May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

7 participants