-
Notifications
You must be signed in to change notification settings - Fork 9.8k
Add support for utf8 names on /v1/label/:name/values endpoint #15399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
461b63f
to
609a49b
Compare
windows error seems to be spurious |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, LGTM!
Should we do it in release-3.0
branch first (if yes, rebase on that branch), so we do 3.0.1? cc @jan--f
We need to put cap on how much we put into 3.0.x here. We will likely have multiple UTF-8 gaps.. I don't think it must go to 3.0.1, but I think it could (:
@jan--f said to commit to main and then backport as necessary. I can set up that PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good code-wise, but shouldn't there be an update to the documentation in https://github.com/prometheus/prometheus/blob/main/docs/querying/api.md ?
90b9b4a
to
23517a2
Compare
23517a2
to
919cb80
Compare
FTR: I'm not sure if this should really be in v3.0.1. I would see it more as a missing feature than a real bug. (And I would like to get back to a regular release cadence ASAP, so that v3.1.0 is not far down the road.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for more comments. Documenting is hard…
In any case, before merging this, let's decide if this should go into v3.0.1 or v3.1.0, see other comment. (If need be, we need to rebase this on top of the release branch.)
docs/querying/api.md
Outdated
The `data` section of the JSON response is a list of string label values. | ||
|
||
This example queries for all label values for the `job` label: | ||
This example queries for all label values for the `job_name` label: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for nitpicking again, but let's not use a "weird" label name in the first example. job_name
is a weird because it's so common to have a job
label. Why would somebody have job_name
? If your intention is to have a label name with an underscore here, let' use one that is actually common like http_status_code
(which could then be http.status_code
in the example below to even allude to the different meaning of .
and _
in OTel).
docs/querying/api.md
Outdated
For label names that contain characters that are not valid in the legacy | ||
Prometheus character set (letters, numbers, underscores, and colons), they |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, you can use non-legacy characters here as long as they are valid as URL components, can't we?
So I guess http://localhost:9090/api/v1/label/job.name/values
would actually work?
If that's true, we only need the U__
encoding for characters that don't work as URL components (in particular /
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about that, but that might lead people to use HTML encoding: "job%20name" which is not supported. So while job.name would work, I'd prefer to make the reasoning simpler and just ask that people use encoding when the name is non-legacy-compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By HTML encoding, you mean percent encoding, also known as URL encoding, don't you? I.e. https://en.wikipedia.org/wiki/Percent-encoding
I would expect that URL encoding is in fact supported, because that should just be dealt with by the Go HTTP libraries and the Prometheus code wouldn't even notice.
It might very well possible that we only need the U__
encoding for a /
in the label name. (At least that was the case long ago for label values as they show up in URLs for pushing to the PGW. Of course, back then there was no U__
encoding, so we elected for base64.)
Sorry for beating this point for so long, but it feels confusing to refer to the legacy Prometheus character set here when there is no technical reason for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that my assumptions above are all correct, I would use the following flow of explanation:
- Don't mention URL encoding at all, because it is (IMHO) implied that that always works (it's part of the HTTP standard, and tools like
curl
and browsers will use it implicitly anyway). - Just state that label names starting with
U__
are interpreted as encoded according to the Values Escaping method, which can be used at will, but also is the only way to specify label names that contain a/
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have rewritten it this way.
docs/querying/api.md
Outdated
|
||
For label names that contain characters that are not valid in the legacy | ||
Prometheus character set (letters, numbers, underscores, and colons), they | ||
should be escaped using the Values Escaping method: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's handy to have a short description here, do we have a comprehensive explanation of this encoding method that we can link to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the prometheus proposal is the closest thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@beorn7 I can file an issue for the doc issue, is that enough to unblock this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linking to the Prometheus proposal is fine. I.e. link to https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping (as we did for PGW, just checked that out). No issue required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Like beorn7 mentioned let's target |
Previously, the api was evaluating this regex to determine if the label name was valid or not: https://github.com/prometheus/common/blob/14bac55a992f7b83ab9d147a041e274606bdb607/model/labels.go#L94 However, I believe that the `IsValid()` function is what ought to be used in the brave new utf8 era. **Before** ``` $ curl localhost:9090/api/v1/label/host.name/values {"status":"error","errorType":"bad_data","error":"invalid label name: \"host.name\""} ``` **After** ``` $ curl localhost:9090/api/v1/label/host.name/values {"status":"success","data":["localhost"]} ``` It's very likely that I'm missing something here or you were already planning to do this at some point but I just encountered this issue and figured I'd give it a go. Signed-off-by: Owen Williams <owen.williams@grafana.com>
919cb80
to
528ecf6
Compare
rebased onto release-3.0 |
25085ee
to
07b9c49
Compare
This should not block 3.0.1, but I believe it's ready to go in (pending getting the documentation right!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually good to have this in v3.0.1 because it is technically a breaking change (legacy label names could have the form U__...
). I don't think it's relevant enough to call it out in the migration guide, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use a follow up, but shouldn't block this fix.
Signed-off-by: Owen Williams <owen.williams@grafana.com>
07b9c49
to
12577e3
Compare
Follow-on from #14943
Previously, the api was evaluating this regex to determine if the label
name was valid or not:
https://github.com/prometheus/common/blob/14bac55a992f7b83ab9d147a041e274606bdb607/model/labels.go#L94
However, I believe that the
IsValid()
function is what ought to beused in the brave new utf8 era.
Before
After
It's very likely that I'm missing something here or you were already
planning to do this at some point but I just encountered this issue and
figured I'd give it a go.