Skip to content

Conversation

ywwg
Copy link
Member

@ywwg ywwg commented Jan 18, 2024

This PR builds on #537 by adding content negotation during scraping.

Scrapers can announce that they support utf-8 names by adding "escaping=allow-utf-8" in the Content-Type header. In cases where utf8 is not available, metric providers can be configured to escape names in a few different ways: values (U__ utf value escaping for perfect round-tripping), underscores (all invalid chars become _), dots (dots become _dot_, _ becomes __, all other values become __). Escaping can either be a global default (model.NameEscapingScheme) or can also be specific in Content-Type with the "escaping=" term, which can be "allow-utf-8" (for UTF8-compatible), "underscores", "dots", or "values".

This is still a noop for existing configurations because scrapers will not be passing the escaping key in the Content-Type header. Existing functionality is maintained.

@ywwg ywwg force-pushed the owilliams/quoted-metric-name-02 branch from eef9af4 to f4d7f87 Compare January 23, 2024 15:59
@ywwg ywwg marked this pull request as ready for review January 23, 2024 16:34
@ywwg
Copy link
Member Author

ywwg commented Jan 23, 2024

This will be squashed :)

expfmt/encode.go Outdated
}
}
return FmtText
return FmtText_0_0_4 + Format(fmt.Sprintf("; escaping=%s", model.EscapeValues))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this specifies a default escaping scheme when none other is specified. not sure if this is the right thing to do

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General direction looks OK. I will only manage to do a code level review next week.

Some broad thoughts:

First the versioning. See the individual comment below. It feels weird to pretend to support a version that doesn't exist yet. (But as said, I'm not an expert and I don't know how this is usually done.)

Another gut feeling: validchars and escaping never show up together. Maybe they can be merged into one option? escaping=none could take the role of validchars=utf8, and escaping=values is the default. (Maybe that default should be part of the (upcoming) standard and hardwired in the code. It feels weird that the default is set via a package variable and that would change the behavior for the same accept header.)

expfmt/expfmt.go Outdated
OpenMetricsType = `application/openmetrics-text`
OpenMetricsVersion_0_0_1 = "0.0.1"
OpenMetricsVersion_2_0_0 = "2.0.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess for the classic text format, anything goes because it hasn't really been formally specified, but here I would definitively add a comment that this doesn't imply that we are really implementing OM v2 from the future.

On a more general note, I'm wondering if pretending we are supporting a version of the standard that doesn't exist yet is the right way to go. I'm not an expert in those content-type shenanigans. Maybe we should add some "alpha" or "pre" in here?

@beorn7
Copy link
Member

beorn7 commented Jan 25, 2024

Another thought: Maybe we don't need to change version numbers at all right now. Any escaping that is not none will produce valid classic text format or OMv1. So we could just document the escaping=none (currently validchars=utf8, see comment above) as an experimental feature that creates a slightly different format, that will be part of a future standard. We can later decide if we want to keep supporting the "illegal" combination of OMv1 and escaping=none, but that's still better than asking clients to negotiate OMv2 although OMv2 will certainly look much differently once it really exists.

@ywwg
Copy link
Member Author

ywwg commented Jan 26, 2024

I like the idea of using escaping to also select for utf8 mode. maybe instead of "none" we call it escaping=utf8-ok or something like that, to make it more explicit?

@ywwg
Copy link
Member Author

ywwg commented Jan 26, 2024

can't remember if dashes are allowed in content-type parameters though

@beorn7
Copy link
Member

beorn7 commented Feb 1, 2024

Yes, let's do that. I.e. only one parameter escaping that defines the behavior (missing = old behavior, any escaping schema = do the escaping, allow-utf-8 or something = use the new format).

I would also say, for now, that we don't change the version numbers but "tolerate" that the client can ask for allow-utf-8 and we pragmatically create an exposition format that is technically invalid. Just add a comment in the code that says that this is just for experimentation and that there will be new versions of the expositions formats eventually that properly describe the new syntax and should be selected whenever UTF-8 names are requested.

And yes, dashes are allowed in the content type, we use it for charset=utf-8 and encoding=compact-text.

The former reminds me that we should try to be consistent and spell UTF-8 as UTF-8 where possible.

@beorn7
Copy link
Member

beorn7 commented Feb 1, 2024

The Go test errors seem to have to do with Go modules. Probably need to run go mod tidy or something.

Could you fix that and work in the ideas above? I think the code otherwise is technically fine. I'll do a final review once that's done.

@ywwg
Copy link
Member Author

ywwg commented Feb 5, 2024

notes should all be addressed

@ywwg
Copy link
Member Author

ywwg commented Feb 5, 2024

oh except for adding comments

Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I assume you will squash the commits yourself and then take care of DCO signing.

Signed-off-by: Owen Williams <owen.williams@grafana.com>
@ywwg ywwg force-pushed the owilliams/quoted-metric-name-02 branch from 78b2c9a to 9b1ede5 Compare February 6, 2024 16:10
@ywwg
Copy link
Member Author

ywwg commented Feb 6, 2024

squashed

@ywwg ywwg force-pushed the owilliams/quoted-metric-name-02 branch from 9b1ede5 to 319c62c Compare February 6, 2024 16:11
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much.

@beorn7 beorn7 merged commit 773d566 into prometheus:main Feb 6, 2024
zeitlinger added a commit to prometheus/client_java that referenced this pull request Aug 18, 2025
Adds UTF-8 support for metric and label names.

These changes are based on the work done on the Prometheus common
libraries [here](prometheus/common#537) and
[here](prometheus/common#570)

- The `prometheus-metrics-exposition-formats` module will use the new
quoting syntax `{"foo"}` iff the metric does not conform to the legacy
name format (`foo{}`)
- The `prometheus-metrics-model` module has a new flag
(`nameValidationScheme`) that determines if validation is done using the
legacy or the UTF-8 scheme. This flag can be set via a property in the
properties file.
- Scrapers can announce via content negotiation that they support UTF-8
names by adding `escaping=allow-utf-8` in the Accept header. In cases
where UTF-8 is not available, metric providers can be configured to
escape names in a few different ways: values (`U__` UTF value escaping
for perfect round-tripping), underscores (all invalid chars become `_`),
dots (dots become `_dot_`, `_` becomes `__`, all other values become
`___`). Escaping has a global default
(`PrometheusNaming.DEFAULT_ESCAPING_SCHEME`) or can also be specified in
Accept header with the `escaping=` term, which can be `allow-utf-8` (for
UTF-8-compatible), `underscores`, `dots`, or `values`.
This should still be a noop for existing configurations because scrapers
will not be passing the escaping key in the Accept header. Existing
functionality is maintained.
- The `prometheus-metrics-exporter-pushgateway` module will
[escape](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping)
UTF-8 grouping keys in the URL path used when pushing metrics (see
prometheus/pushgateway#689)

Work towards prometheus/prometheus#13095

---------

Signed-off-by: Federico Torres <federico.torres@grafana.com>
Signed-off-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
Co-authored-by: Gregor Zeitlinger <gregor.zeitlinger@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants