Skip to content

Conversation

thaJeztah
Copy link
Member

Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:

func BenchmarkSplit(b *testing.B) {
        b.ReportAllocs()
        data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
        for i := 0; i < b.N; i++ {
                for _, s := range data {
                        _ = strings.SplitN(s, "=", 2)[0]
                }
        }
}

func BenchmarkCut(b *testing.B) {
        b.ReportAllocs()
        data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
        for i := 0; i < b.N; i++ {
                for _, s := range data {
                        _, _, _ = strings.Cut(s, "=")
                }
        }
}
BenchmarkSplit
BenchmarkSplit-10            8244206               128.0 ns/op           128 B/op          4 allocs/op
BenchmarkCut
BenchmarkCut-10             54411998                21.80 ns/op            0 B/op          0 allocs/op

While looking at occurrences of strings.Split(), I also updated some for alternatives,
or added some constraints;

  • for cases where an specific number of items is expected, I used strings.SplitN()
    with a suitable limit. This prevents (theoretical) unlimited splits.
  • in some cases it we were using strings.Split(), but actually were trying to match
    a prefix; for those I replaced the code to just match (and/or strip) the prefix.

@thaJeztah
Copy link
Member Author

Oh! Interesting failure; either I broke something, or we previously didn't validate properly

=== RUN   TestHTTPSink
2022/11/03 11:59:35 http: TLS handshake error from 127.0.0.1:49560: remote error: tls: bad certificate
    http_test.go:57: error parsing status: strconv.Atoi: parsing "": invalid syntax
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=200, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=200, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=200, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=200, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=307, fail=false, error=false
    http_test.go:172: testcase: http://127.0.0.1:45429/?status=400, fail=true, error=false
    http_test.go:184: write error: httpSink{http://127.0.0.1:45429/?status=400}: response status 400 Bad Request unaccepted
    http_test.go:172: testcase: [http://127.0.0.1:42205?status=0](http://127.0.0.1:42205/?status=0), fail=false, error=true
    http_test.go:184: write error: httpSink{[http://127.0.0.1:42205?status=0](http://127.0.0.1:42205/?status=0)}: error posting: Post "[http://127.0.0.1:42205?status=0](http://127.0.0.1:42205/?status=0)": EOF

Comment on lines 34 to 39
if prior := r.Header.Get("X-Forwarded-For"); prior != "" {
proxies := strings.Split(prior, ",")
if len(proxies) > 0 {
remoteAddr := strings.Trim(proxies[0], " ")
if parseIP(remoteAddr) != nil {
return remoteAddr
}
remoteAddr, _, _ := strings.Cut(prior, ",")
remoteAddr = strings.Trim(remoteAddr, " ")
if parseIP(remoteAddr) != nil {
return remoteAddr
}
Copy link
Member Author

@thaJeztah thaJeztah Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first suspect for the failure may be related to this change, and I was actually looking at that;

  • Old code would see if X-Forwarded-For header is set (and non-empty)
  • But if there's no "comma" in the header, it would be just ignored
  • If there was a comma, we would pick the first value, but ignore the following ones

This felt a bit odd (but maybe there was a specific reason for this); why ignore the Header if there's only 1 "X-Forwarded-For"?

edit: format is: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-For

X-Forwarded-For: <client>, <proxy1>, <proxy2>

If a request goes through multiple proxies, the IP addresses of each successive proxy is listed. This means that, given well-behaved client and proxies, the rightmost IP address is the IP address of the most recent proxy and the leftmost IP address is the IP address of the originating client.

And the examples on that page (obviously, MDN is not the canonical source of truth) shows examples with only the <client> IP, so this should be valid?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if there's no "comma" in the header, it would be just ignored

Are you sure about this? Maybe I'm missing something but https://pkg.go.dev/strings#Split

if s does not contain sep and sep is not empty, Split returns a slice of length 1 whose only element is s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, 🙈 you're right; guess I had a mental "off by one" moment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the length check was at least redundant ☺️

@thaJeztah
Copy link
Member Author

🤦 that's not the failure; those error messages are "expected", and the test is actually passing 😂

--- PASS: TestHTTPSink (0.02s)

Actual failure is this one;

2022-11-03T11:59:37.7237953Z === RUN   TestParseRepositoryInfo
2022-11-03T11:59:37.7238900Z     normalize_test.go:279: Invalid normalized reference for "docker.io/library/foo/bar". Expected "library/foo/bar", got "foo/bar"
2022-11-03T11:59:37.7239474Z --- FAIL: TestParseRepositoryInfo (0.00s)

@thaJeztah
Copy link
Member Author

So, the assumption there is that the official images namespace (docker.io/library/) only allows for a single level to be underneath (docker.io/library/ubuntu:latest) and no nested namespaces (docker.io/library/distros/ubuntu:latest).

While that's the case currently, (Docker Hub doesn't use nested namespaces), I wonder if that assumption is correct. I don't think there's plans (currently) to do nested namespaces under library/ but technically Docker Hub could do this, and it would still be part of the library/ namespace.

I can fix the code, but perhaps I should add a TODO to discuss this further.

@thaJeztah thaJeztah force-pushed the strings_cut branch 2 times, most recently from 1561720 to 4a4d112 Compare November 4, 2022 22:50
@thaJeztah
Copy link
Member Author

@milosgajdos @corhere ptal (I realized this one also touches the reference package)

- use strings.HasPrefix() to check for the prefix we're interested in instead
  of doing a strings.Split() without limits. This makes the code both easier
  to read, and prevents potential situations where we end up with a long slice.
- use consts for defaults; these should never be modified, so better to use
  consts for them to indicate they're fixed values.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Copy link
Member

@milosgajdos milosgajdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL: strings.Cut. Never used it before, but seems using cleans up some code real nice 👍

@milosgajdos
Copy link
Member

@thaJeztah can you squash the commits?

@thaJeztah
Copy link
Member Author

Yes, it's a new feature and I like it as well; cleaner (in many cases), and more performant, which is a good combination.

can you squash the commits?

I tried to split the commits into logical units. The "reference" one I split separate (with in mind that we may be splitting that package to a separate repo, so would give a cleaner commit history after splitting)

context/http.go Outdated
}

switch parts[2] {
switch strings.TrimPrefix(keyStr, "http.request.") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already know (from line 194) that keyStr has "http.request." as its prefix, unconditionally slicing the string by a constant would be more optimal.

Suggested change
switch strings.TrimPrefix(keyStr, "http.request.") {
switch keyStr[len("http.request."):] {

The compiler is smart enough to evaluate len(constant) at compile time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the thing about slicing the strings like this always makes me feel uneasy because Go slices strings by bytes not by runes, so I'm happy with Trimming in this case because it avoids "bad habits"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I must admit I went for "readability" here.

Copy link
Member Author

@thaJeztah thaJeztah Nov 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we can completely drop the strings.HasPrefix(). strings.TrimPrefix() already handles that, and if there's no match in the switch, we'll continue with the fallback.

Changing that also makes the code easier to read, and we don't have to use the ugly goto fallback

diff --git a/context/http.go b/context/http.go
index 7471e572..7fb75790 100644
--- a/context/http.go
+++ b/context/http.go
@@ -191,10 +191,6 @@ func (ctx *httpRequestContext) Value(key interface{}) interface{} {
 			return ctx.r
 		}
 
-		if !strings.HasPrefix(keyStr, "http.request.") {
-			goto fallback
-		}
-
 		switch strings.TrimPrefix(keyStr, "http.request.") {
 		case "uri":
 			return ctx.r.RequestURI
@@ -220,10 +216,11 @@ func (ctx *httpRequestContext) Value(key interface{}) interface{} {
 			if ct != "" {
 				return ct
 			}
+		default:
+			// no match; fall back to standard behavior below
 		}
 	}
 
-fallback:
 	return ctx.Context.Value(key)
 }
 
@@ -296,10 +293,6 @@ func (irw *instrumentedResponseWriter) Value(key interface{}) interface{} {
 			return irw
 		}
 
-		if !strings.HasPrefix(keyStr, "http.response.") {
-			goto fallback
-		}
-
 		irw.mu.Lock()
 		defer irw.mu.Unlock()
 
@@ -313,9 +306,10 @@ func (irw *instrumentedResponseWriter) Value(key interface{}) interface{} {
 			if contentType != "" {
 				return contentType
 			}
+		default:
+			// no match; fall back to standard behavior below
 		}
 	}
 
-fallback:
 	return irw.Context.Value(key)
 }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we can completely drop the strings.HasPrefix(). strings.TrimPrefix() already handles that, and if there's no match in the switch, we'll continue with the fallback.

Not really: strings.TrimPrefix returns the input string unchanged if there is no match, so the switch would match both keyStr == "uri" and "http.request.uri" if we drop the strings.HasPrefix() check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 yup

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use the full string in the switch (as we discussed) 👍

hosts := strings.SplitN(forwardedHost, ",", 2)
host = strings.TrimSpace(hosts[0])
host, _, _ = strings.Cut(forwardedHost, ",")
host = strings.TrimSpace(host)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: this should probably be changed to textproto.TrimString at some point, and the rest of the module audited for inappropriate uses of unicode functions in HTTP-related code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good one; TIL about that function (or at least, I don't recall I've used it before). Probably needs to be checked with punicode domains etc 🤔

minorPart := strings.Split(string(version), ".")[1]
_, minorPart, _ := strings.Cut(string(version), ".")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically a functional change: Version("").minor() will return strconv.ErrSyntax instead of panicking. I just want to make sure you are aware.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check on that on in context (checking out the branch locally)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, caught up; yes, indeed, previously it would panic. I'm assuming "panic" is not very desirable, so would consider it a bug fix? 😅

@thaJeztah
Copy link
Member Author

@milosgajdos @corhere updated; PTAL (but pay attention to #3769 (comment)

Go 1.18 and up now provides a strings.Cut() which is better suited for
splitting key/value pairs (and similar constructs), and performs better:

```go
func BenchmarkSplit(b *testing.B) {
	b.ReportAllocs()
	data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
	for i := 0; i < b.N; i++ {
		for _, s := range data {
			_ = strings.SplitN(s, "=", 2)[0]
		}
	}
}

func BenchmarkCut(b *testing.B) {
	b.ReportAllocs()
	data := []string{"12hello=world", "12hello=", "12=hello", "12hello"}
	for i := 0; i < b.N; i++ {
		for _, s := range data {
			_, _, _ = strings.Cut(s, "=")
		}
	}
}
```

    BenchmarkSplit
    BenchmarkSplit-10    	 8244206	       128.0 ns/op	     128 B/op	       4 allocs/op
    BenchmarkCut
    BenchmarkCut-10      	54411998	        21.80 ns/op	       0 B/op	       0 allocs/op

While looking at occurrences of `strings.Split()`, I also updated some for alternatives,
or added some constraints;

- for cases where an specific number of items is expected, I used `strings.SplitN()`
  with a suitable limit. This prevents (theoretical) unlimited splits.
- in some cases it we were using `strings.Split()`, but _actually_ were trying to match
  a prefix; for those I replaced the code to just match (and/or strip) the prefix.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Minor optimization :)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Comment on lines +234 to +236
// TODO(thaJeztah): this considers "vars.FOO" and "FOO" to be equal.
// We need to check if that's intentional (could be a bug).
if v, ok := ctx.vars[strings.TrimPrefix(keyStr, "vars.")]; ok {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I suspect this may be a bug as well here (noticed this after your last comment); i.e. vars.FOO and FOO would be considered equal. This was already in place, and I'd have to check history, so I left this as a TODO for now.

Comment on lines +296 to 298
irw.mu.Lock()
defer irw.mu.Unlock()
return irw.status
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked a bit at the code, and the Mutex appeared to be to protect the status and written fields, so I moved the locks to those specific cases. This allowed us to have a simple switch with all options, and prevents acquiring a lock where not needed (which could always be a risk).

Copy link
Collaborator

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@thaJeztah
Copy link
Member Author

@milosgajdos still LGTY?

@milosgajdos milosgajdos merged commit e67e40b into distribution:main Nov 11, 2022
@thaJeztah thaJeztah deleted the strings_cut branch November 11, 2022 15:23
dylanrhysscott pushed a commit to digitalocean/docker-distribution that referenced this pull request Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants