replace strings.Split(N) for strings.Cut() or alternatives #3769

thaJeztah · 2022-11-03T11:58:28Z

Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:

func BenchmarkSplit(b *testing.B) {
        b.ReportAllocs()
        data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
        for i := 0; i < b.N; i++ {
                for _, s := range data {
                        _ = strings.SplitN(s, "=", 2)[0]
                }
        }
}

func BenchmarkCut(b *testing.B) {
        b.ReportAllocs()
        data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
        for i := 0; i < b.N; i++ {
                for _, s := range data {
                        _, _, _ = strings.Cut(s, "=")
                }
        }
}

BenchmarkSplit
BenchmarkSplit-10            8244206               128.0 ns/op           128 B/op          4 allocs/op
BenchmarkCut
BenchmarkCut-10             54411998                21.80 ns/op            0 B/op          0 allocs/op

While looking at occurrences of strings.Split(), I also updated some for alternatives,
or added some constraints;

for cases where an specific number of items is expected, I used strings.SplitN()
with a suitable limit. This prevents (theoretical) unlimited splits.
in some cases it we were using strings.Split(), but actually were trying to match
a prefix; for those I replaced the code to just match (and/or strip) the prefix.

thaJeztah · 2022-11-03T12:09:01Z

Oh! Interesting failure; either I broke something, or we previously didn't validate properly

=== RUN   TestHTTPSink
2022/11/03 11:59:35 http: TLS handshake error from 127.0.0.1:49560: remote error: tls: bad certificate
    http_test.go:57: error parsing status: strconv.Atoi: parsing "": invalid syntax
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=200, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=200, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=200, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=200, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=307, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=400, fail=true, error=false
    http_test.go:184: write error: httpSink{http://127.0.0.1:45429/?status=400}: response status 400 Bad Request unaccepted
    http_test.go:172: testcase: [http://127.0.0.1:42205?status=0](http://127.0.0.1:42205/?status=0), fail=false, error=true
    http_test.go:184: write error: httpSink{[http://127.0.0.1:42205?status=0](http://127.0.0.1:42205/?status=0)}: error posting: Post "[http://127.0.0.1:42205?status=0](http://127.0.0.1:42205/?status=0)": EOF

thaJeztah · 2022-11-03T12:11:55Z

context/http.go

 	if prior := r.Header.Get("X-Forwarded-For"); prior != "" {
-		proxies := strings.Split(prior, ",")
-		if len(proxies) > 0 {
-			remoteAddr := strings.Trim(proxies[0], " ")
-			if parseIP(remoteAddr) != nil {
-				return remoteAddr
-			}
+		remoteAddr, _, _ := strings.Cut(prior, ",")
+		remoteAddr = strings.Trim(remoteAddr, " ")
+		if parseIP(remoteAddr) != nil {
+			return remoteAddr
 		}


My first suspect for the failure may be related to this change, and I was actually looking at that;

Old code would see if X-Forwarded-For header is set (and non-empty)

But if there's no "comma" in the header, it would be just ignored

If there was a comma, we would pick the first value, but ignore the following ones

This felt a bit odd (but maybe there was a specific reason for this); why ignore the Header if there's only 1 "X-Forwarded-For"?

edit: format is: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-For

X-Forwarded-For: <client>, <proxy1>, <proxy2>

If a request goes through multiple proxies, the IP addresses of each successive proxy is listed. This means that, given well-behaved client and proxies, the rightmost IP address is the IP address of the most recent proxy and the leftmost IP address is the IP address of the originating client.

And the examples on that page (obviously, MDN is not the canonical source of truth) shows examples with only the <client> IP, so this should be valid?

But if there's no "comma" in the header, it would be just ignored

Are you sure about this? Maybe I'm missing something but https://pkg.go.dev/strings#Split

if s does not contain sep and sep is not empty, Split returns a slice of length 1 whose only element is s.

Oh, 🙈 you're right; guess I had a mental "off by one" moment?

so the length check was at least redundant ☺️

thaJeztah · 2022-11-03T12:25:48Z

🤦 that's not the failure; those error messages are "expected", and the test is actually passing 😂

--- PASS: TestHTTPSink (0.02s)

Actual failure is this one;

2022-11-03T11:59:37.7237953Z === RUN   TestParseRepositoryInfo
2022-11-03T11:59:37.7238900Z     normalize_test.go:279: Invalid normalized reference for "docker.io/library/foo/bar". Expected "library/foo/bar", got "foo/bar"
2022-11-03T11:59:37.7239474Z --- FAIL: TestParseRepositoryInfo (0.00s)

thaJeztah · 2022-11-03T12:32:50Z

So, the assumption there is that the official images namespace (docker.io/library/) only allows for a single level to be underneath (docker.io/library/ubuntu:latest) and no nested namespaces (docker.io/library/distros/ubuntu:latest).

While that's the case currently, (Docker Hub doesn't use nested namespaces), I wonder if that assumption is correct. I don't think there's plans (currently) to do nested namespaces under library/ but technically Docker Hub could do this, and it would still be part of the library/ namespace.

I can fix the code, but perhaps I should add a TODO to discuss this further.

thaJeztah · 2022-11-08T22:36:22Z

@milosgajdos @corhere ptal (I realized this one also touches the reference package)

- use strings.HasPrefix() to check for the prefix we're interested in instead of doing a strings.Split() without limits. This makes the code both easier to read, and prevents potential situations where we end up with a long slice. - use consts for defaults; these should never be modified, so better to use consts for them to indicate they're fixed values. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

milosgajdos

TIL: strings.Cut. Never used it before, but seems using cleans up some code real nice 👍

milosgajdos · 2022-11-09T20:33:58Z

@thaJeztah can you squash the commits?

thaJeztah · 2022-11-09T20:42:06Z

Yes, it's a new feature and I like it as well; cleaner (in many cases), and more performant, which is a good combination.

can you squash the commits?

I tried to split the commits into logical units. The "reference" one I split separate (with in mind that we may be splitting that package to a separate repo, so would give a cleaner commit history after splitting)

corhere · 2022-11-09T23:04:29Z

context/http.go

-		}
-
-		switch parts[2] {
+		switch strings.TrimPrefix(keyStr, "http.request.") {


Since we already know (from line 194) that keyStr has "http.request." as its prefix, unconditionally slicing the string by a constant would be more optimal.

Suggested change

switch strings.TrimPrefix(keyStr, "http.request.") {

switch keyStr[len("http.request."):] {

The compiler is smart enough to evaluate len(constant) at compile time.

the thing about slicing the strings like this always makes me feel uneasy because Go slices strings by bytes not by runes, so I'm happy with Trimming in this case because it avoids "bad habits"

Yes, I must admit I went for "readability" here.

Actually, we can completely drop the strings.HasPrefix(). strings.TrimPrefix() already handles that, and if there's no match in the switch, we'll continue with the fallback.

Changing that also makes the code easier to read, and we don't have to use the ugly goto fallback

diff --git a/context/http.go b/context/http.go index 7471e572..7fb75790 100644 --- a/context/http.go +++ b/context/http.go @@ -191,10 +191,6 @@ func (ctx *httpRequestContext) Value(key interface{}) interface{} { return ctx.r } - if !strings.HasPrefix(keyStr, "http.request.") { - goto fallback - } - switch strings.TrimPrefix(keyStr, "http.request.") { case "uri": return ctx.r.RequestURI @@ -220,10 +216,11 @@ func (ctx *httpRequestContext) Value(key interface{}) interface{} { if ct != "" { return ct } + default: + // no match; fall back to standard behavior below } } -fallback: return ctx.Context.Value(key) } @@ -296,10 +293,6 @@ func (irw *instrumentedResponseWriter) Value(key interface{}) interface{} { return irw } - if !strings.HasPrefix(keyStr, "http.response.") { - goto fallback - } - irw.mu.Lock() defer irw.mu.Unlock() @@ -313,9 +306,10 @@ func (irw *instrumentedResponseWriter) Value(key interface{}) interface{} { if contentType != "" { return contentType } + default: + // no match; fall back to standard behavior below } } -fallback: return irw.Context.Value(key) }

Actually, we can completely drop the strings.HasPrefix(). strings.TrimPrefix() already handles that, and if there's no match in the switch, we'll continue with the fallback.

Not really: strings.TrimPrefix returns the input string unchanged if there is no match, so the switch would match both keyStr == "uri" and "http.request.uri" if we drop the strings.HasPrefix() check.

Updated to use the full string in the switch (as we discussed) 👍

context/http.go

corhere · 2022-11-09T23:24:50Z

registry/api/v2/urls.go

-			hosts := strings.SplitN(forwardedHost, ",", 2)
-			host = strings.TrimSpace(hosts[0])
+			host, _, _ = strings.Cut(forwardedHost, ",")
+			host = strings.TrimSpace(host)


Aside: this should probably be changed to textproto.TrimString at some point, and the rest of the module audited for inappropriate uses of unicode functions in HTTP-related code.

Good one; TIL about that function (or at least, I don't recall I've used it before). Probably needs to be checked with punicode domains etc 🤔

corhere · 2022-11-09T23:32:59Z

configuration/parser.go

-	minorPart := strings.Split(string(version), ".")[1]
+	_, minorPart, _ := strings.Cut(string(version), ".")


This is technically a functional change: Version("").minor() will return strconv.ErrSyntax instead of panicking. I just want to make sure you are aware.

Let me check on that on in context (checking out the branch locally)

Ah, caught up; yes, indeed, previously it would panic. I'm assuming "panic" is not very desirable, so would consider it a bug fix? 😅

context/http.go

thaJeztah · 2022-11-10T15:20:09Z

@milosgajdos @corhere updated; PTAL (but pay attention to #3769 (comment)

Go 1.18 and up now provides a strings.Cut() which is better suited for splitting key/value pairs (and similar constructs), and performs better: ```go func BenchmarkSplit(b *testing.B) { b.ReportAllocs() data := []string{"12hello=world", "12hello=", "12=hello", "12hello"} for i := 0; i < b.N; i++ { for _, s := range data { _ = strings.SplitN(s, "=", 2)[0] } } } func BenchmarkCut(b *testing.B) { b.ReportAllocs() data := []string{"12hello=world", "12hello=", "12=hello", "12hello"} for i := 0; i < b.N; i++ { for _, s := range data { _, _, _ = strings.Cut(s, "=") } } } ``` BenchmarkSplit BenchmarkSplit-10 8244206 128.0 ns/op 128 B/op 4 allocs/op BenchmarkCut BenchmarkCut-10 54411998 21.80 ns/op 0 B/op 0 allocs/op While looking at occurrences of `strings.Split()`, I also updated some for alternatives, or added some constraints; - for cases where an specific number of items is expected, I used `strings.SplitN()` with a suitable limit. This prevents (theoretical) unlimited splits. - in some cases it we were using `strings.Split()`, but _actually_ were trying to match a prefix; for those I replaced the code to just match (and/or strip) the prefix. Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

Minor optimization :) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

thaJeztah · 2022-11-10T21:39:50Z

context/http.go

+		// TODO(thaJeztah): this considers "vars.FOO" and "FOO" to be equal.
+		// We need to check if that's intentional (could be a bug).
+		if v, ok := ctx.vars[strings.TrimPrefix(keyStr, "vars.")]; ok {


So, I suspect this may be a bug as well here (noticed this after your last comment); i.e. vars.FOO and FOO would be considered equal. This was already in place, and I'd have to check history, so I left this as a TODO for now.

thaJeztah · 2022-11-10T21:43:17Z

context/http.go

+			irw.mu.Lock()
+			defer irw.mu.Unlock()
 			return irw.status


Looked a bit at the code, and the Mutex appeared to be to protect the status and written fields, so I moved the locks to those specific cases. This allowed us to have a simple switch with all options, and prevents acquiring a lock where not needed (which could always be a risk).

corhere

LGTM!

thaJeztah · 2022-11-10T23:50:33Z

@milosgajdos still LGTY?

thaJeztah commented Nov 3, 2022

View reviewed changes

thaJeztah force-pushed the strings_cut branch 2 times, most recently from 1561720 to 4a4d112 Compare November 4, 2022 22:50

thaJeztah mentioned this pull request Nov 9, 2022

reference: use consts for defaults #3782

Closed

thaJeztah force-pushed the strings_cut branch from 4a4d112 to 2b34b7e Compare November 9, 2022 14:23

milosgajdos approved these changes Nov 9, 2022

View reviewed changes

milosgajdos requested review from wy65701436 and deleteriousEffect November 9, 2022 20:44

corhere approved these changes Nov 9, 2022

View reviewed changes

thaJeztah force-pushed the strings_cut branch from 2b34b7e to 7179b4e Compare November 10, 2022 15:15

thaJeztah commented Nov 10, 2022

View reviewed changes

context/http.go Outdated Show resolved Hide resolved

thaJeztah force-pushed the strings_cut branch from 7179b4e to d23ed8a Compare November 10, 2022 21:33

thaJeztah added 2 commits November 10, 2022 22:38

cloudfront: use strings.Equalfold()

842d4c0

Minor optimization :) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

thaJeztah force-pushed the strings_cut branch from d23ed8a to 842d4c0 Compare November 10, 2022 21:38

thaJeztah commented Nov 10, 2022

View reviewed changes

corhere approved these changes Nov 10, 2022

View reviewed changes

milosgajdos merged commit e67e40b into distribution:main Nov 11, 2022

thaJeztah deleted the strings_cut branch November 11, 2022 15:23

thaJeztah mentioned this pull request Dec 3, 2022

[release/2.8] bump up golang version #3812

Closed

dylanrhysscott pushed a commit to digitalocean/docker-distribution that referenced this pull request Jan 5, 2023

Fixed spelling of 'exammple' (distribution#3769)

c9b69ce

	switch strings.TrimPrefix(keyStr, "http.request.") {
	switch keyStr[len("http.request."):] {

		minorPart := strings.Split(string(version), ".")[1]
		_, minorPart, _ := strings.Cut(string(version), ".")

replace strings.Split(N) for strings.Cut() or alternatives #3769

replace strings.Split(N) for strings.Cut() or alternatives #3769

Uh oh!

Conversation

thaJeztah commented Nov 3, 2022

Uh oh!

thaJeztah commented Nov 3, 2022

Uh oh!

thaJeztah Nov 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thaJeztah commented Nov 3, 2022

Uh oh!

thaJeztah commented Nov 3, 2022

Uh oh!

thaJeztah commented Nov 8, 2022

Uh oh!

milosgajdos left a comment

Choose a reason for hiding this comment

Uh oh!

milosgajdos commented Nov 9, 2022

Uh oh!

thaJeztah commented Nov 9, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thaJeztah Nov 10, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

thaJeztah commented Nov 10, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

corhere left a comment

Choose a reason for hiding this comment

Uh oh!

thaJeztah commented Nov 10, 2022

Uh oh!

Uh oh!

thaJeztah Nov 3, 2022 •

edited

Loading

thaJeztah Nov 10, 2022 •

edited

Loading