-
Notifications
You must be signed in to change notification settings - Fork 491
Add UTF-8 support in metric and label names #689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Federico Torres <federico.torres@grafana.com>
Signed-off-by: Federico Torres <federico.torres@grafana.com>
Sorry, still fighting my backlog. I hope I get to review this tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for doing this.
A few ideas about how to precisely control whether to escape or not. I hope I got it right, but happy to learn where I'm wrong.
In different news, this needs documentation (which currently is just in the ever growing README.md – feel free to add it there, breaking down the documentation into "proper" documentation files would be a task for another PR).
handler/handler_test.go
Outdated
@@ -18,6 +18,7 @@ import ( | |||
"context" | |||
"errors" | |||
"fmt" | |||
"github.com/prometheus/common/model" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to go into the next block (line 29).
main.go
Outdated
@@ -17,6 +17,7 @@ import ( | |||
"compress/gzip" | |||
"context" | |||
"fmt" | |||
"github.com/prometheus/common/model" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, has to go to line 39 or so.
main.go
Outdated
@@ -73,6 +74,7 @@ func main() { | |||
persistenceFile = app.Flag("persistence.file", "File to persist metrics. If empty, metrics are only kept in memory.").Default("").String() | |||
persistenceInterval = app.Flag("persistence.interval", "The minimum interval at which to write out the persistence file.").Default("5m").Duration() | |||
pushUnchecked = app.Flag("push.disable-consistency-check", "Do not check consistency of pushed metrics. DANGEROUS.").Default("false").Bool() | |||
pushEscaped = app.Flag("push.enable-escaped-labels", "Allow escaped characters in label names in URLs.").Default("false").Bool() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this push.enable-utf8-names
to mirror the feature flag in Prometheus, and also to correctly describe what we are doing. (Without the flag, the PGW will reject non-legacy characters in names even in the payload of a push.)
Variable name and doc string has to be adjusted accordingly.
handler/push.go
Outdated
@@ -180,19 +180,21 @@ func splitLabels(labels string) (map[string]string, error) { | |||
for i := 0; i < len(components)-1; i += 2 { | |||
name, value := components[i], components[i+1] | |||
trimmedName := strings.TrimSuffix(name, Base64Suffix) | |||
unescapedName := model.UnescapeName(trimmedName, model.ValueEncodingEscaping) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't even want to try to unescape if the corresponding flag is not set. To not refer to the flag value directly here, maybe set a package-level variable of type model.EscapingScheme
to either model.ValueEncodingEscaping
or model.NoEscaping
and then use that variable here.
handler/push.go
Outdated
if !model.LabelNameRE.MatchString(trimmedName) || | ||
!model.LabelName(unescapedName).IsValid() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this validity test, wo don't need the one in the previous line anymore, do wo?
ab2490d
to
83c45eb
Compare
Signed-off-by: Federico Torres <federico.torres@grafana.com>
@beorn7 Thanks for your feedback, the requested changes were addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good (modulu doc comment on push.EscapingScheme
. But we also need documentation. (As suggested previously, I believe it is OK for now to add a section to the ever growing README.md, keeping in mind that eventually we would like to break it up into "proper" documentation.)
handler/push.go
Outdated
@@ -41,6 +41,8 @@ const ( | |||
Base64Suffix = "@base64" | |||
) | |||
|
|||
var EscapingScheme = model.NoEscaping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could benefit from a doc comment.
Signed-off-by: Federico Torres <federico.torres@grafana.com>
@beorn7 Sorry about the documentation, thanks for pointing it out, now it's added. Also added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I just couldn't resist and applied a lot of wordsmithing. Please see my suggestions.
README.md
Outdated
@@ -320,6 +320,28 @@ Examples: | |||
|
|||
/metrics/job/titan/name@base64/zqDPgc6_zrzOt864zrXPjc-C | |||
|
|||
### UTF-8 support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be a bit more specific here and call this "UTF-8 support for metric and label names".
(UTF-8 is supported for label values already.)
README.md
Outdated
UTF-8 characters in metric and label names are supported by enabling | ||
the `--push.enable-utf8-names` flag. The syntax is as follows: | ||
|
||
{"some.metric", "foo.bar"="baz"} | ||
|
||
For UTF-8 label names defined in the URL path, [an escaping syntax](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping) is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not explain the syntax of the text exposition format here. (It also doesn't cover protobuf.)
With some wordsmithing included, I would propose the following:
UTF-8 characters in metric and label names are supported by enabling | |
the `--push.enable-utf8-names` flag. The syntax is as follows: | |
{"some.metric", "foo.bar"="baz"} | |
For UTF-8 label names defined in the URL path, [an escaping syntax](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping) is used. | |
Newer versions of the Prometheus exposition formats (text and protobuf) | |
support the full UTF-8 character set in metric and label names. The | |
Pushgateway only accepts special characters in names if the command line | |
flag `--push.enable-utf8-names` is set. | |
To allow special characters in label names that are part of the URL path, the flag also enables a | |
[specific encoding mechanism](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping). This is similar to the base64 encoding for label _values_ described above, | |
but works differently in detail for technical and historical reasons. As before, client libraries | |
should usually take care of the encoding. It works as follows: |
README.md
Outdated
* Prefix the string with `U__`. | ||
* All non-valid characters (i.e. characters other than letters, numbers, and underscores) will be encoded as underscores surrounding the Unicode value, like `_1F60A_`. | ||
* All pre-existing underscores will become doubled: `__`. | ||
* If a string should start with "U__" already, it will need to be escaped: `U___55_____`. (That's `U__` + `_55_` (for `U`) + `__` + `__`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stay consistent with regard to the form. You start with an imperative form and then use future tense.
Here my suggestion with a few things added:
* Prefix the string with `U__`. | |
* All non-valid characters (i.e. characters other than letters, numbers, and underscores) will be encoded as underscores surrounding the Unicode value, like `_1F60A_`. | |
* All pre-existing underscores will become doubled: `__`. | |
* If a string should start with "U__" already, it will need to be escaped: `U___55_____`. (That's `U__` + `_55_` (for `U`) + `__` + `__`). | |
* A label name containing encoded characters starts with `U__`. | |
* All non-standard characters (i.e. characters other than letters, numbers, and underscores) are encoded as underscores surrounding their Unicode value, like `_1F60A_`. | |
* All pre-existing underscores are encoded as a double-underscore: `__`. | |
* If a label name starts with `U__` already, these characters have to be encoded as well, resulting in `U___55_____`. (That's `U__` + `_55_` (for `U`) + `__` + `__`). | |
* A label name starting with `U__` in it's encoded form, but containing invalid sequences (e.g. `U__in_xxx_valid`) is left unchanged. |
README.md
Outdated
* All pre-existing underscores will become doubled: `__`. | ||
* If a string should start with "U__" already, it will need to be escaped: `U___55_____`. (That's `U__` + `_55_` (for `U`) + `__` + `__`). | ||
|
||
For example, the label `"foo.bar"="baz"` would be escaped like: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's better use "encoded" rather than "escaped" to only use one word throughout this file.
README.md
Outdated
|
||
/metrics/job/example/U__foo_2e_bar/baz | ||
|
||
This escaping is compatible with the base64 encoding for label values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
escaping → encoding, see above.
This escaping is compatible with the base64 encoding for label values: | ||
|
||
/metrics/job/example/U__foo_2e_bar@base64/YmF6 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add something about the one edge case, e.g.
Note that this method has an unlikely edge case that is not handled properly: A pusher unaware of the encoding mechanism might use a label name that is also a valid encoded version of another label name. E.g. if a pusher intends to use the label name U__foo_2e_bar
, but doesn't encode it as U___55_____foo__2e__bar
, the Pushgateway will decode U__foo_2e_bar
to foo.bar
. This is the main reason why the decoding is opt-in via the --push.enable-utf8-names
flag.
Signed-off-by: Federico Torres <federico.torres@grafana.com>
@beorn7 Thanks for the docs suggestions, done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
Adds UTF-8 support for metric and label names. These changes are based on the work done on the Prometheus common libraries [here](prometheus/common#537) and [here](prometheus/common#570) - The `prometheus-metrics-exposition-formats` module will use the new quoting syntax `{"foo"}` iff the metric does not conform to the legacy name format (`foo{}`) - The `prometheus-metrics-model` module has a new flag (`nameValidationScheme`) that determines if validation is done using the legacy or the UTF-8 scheme. This flag can be set via a property in the properties file. - Scrapers can announce via content negotiation that they support UTF-8 names by adding `escaping=allow-utf-8` in the Accept header. In cases where UTF-8 is not available, metric providers can be configured to escape names in a few different ways: values (`U__` UTF value escaping for perfect round-tripping), underscores (all invalid chars become `_`), dots (dots become `_dot_`, `_` becomes `__`, all other values become `___`). Escaping has a global default (`PrometheusNaming.DEFAULT_ESCAPING_SCHEME`) or can also be specified in Accept header with the `escaping=` term, which can be `allow-utf-8` (for UTF-8-compatible), `underscores`, `dots`, or `values`. This should still be a noop for existing configurations because scrapers will not be passing the escaping key in the Accept header. Existing functionality is maintained. - The `prometheus-metrics-exporter-pushgateway` module will [escape](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping) UTF-8 grouping keys in the URL path used when pushing metrics (see prometheus/pushgateway#689) Work towards prometheus/prometheus#13095 --------- Signed-off-by: Federico Torres <federico.torres@grafana.com> Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com> Co-authored-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
As part of the initiative for adding UTF-8 support to Prometheus, this PR brings UTF-8 support for metric and label names as described in this proposal (for example:
{"metric.name", "label.name"="value"}
.For UTF-8 label names defined in the URL path, an escaping syntax is used. For example, the label
"label.name"="value"
would be escaped like:/metrics/job/example/U__label_2e_name/value
. This escaping is compatible with the base64 encoding for label values:/metrics/job/example/U__label_2e_name@base64/dmFsdWU=
UTF-8 support can be enabled by running the Pushgateway with the flag
--push.enable-utf8-names=true
Addresses #623