scrape: make le and quantile normalization configurable #12984

@beorn7

Proposal

This is a sister proposal to #12934. It addresses the same problem in a different way. The two are not necessarily mutually exclusive, but if we implement one, the need for the other might diminish.

Problem

The le label of a classic histogram and the quantile label of a summary contain values that represent float numbers (although, like all label values, they are technically strings). While PromQL sometimes parses them back into floats (namely in the histogram_quantile function), it generally treats them as strings and matches them as strings. This creates problems because one and the same float number can be represented by many different strings (see #12934 for more details). Here are the practically relevant issues:

  • In Prom 1.x, all these label values were normalized (whether they came in via text format or protobuf format), using the usual Go formatting (%g). While I assume this can still yield unstable results in specific cases with many decimal places, this generally worked fine.
  • Prom 2.x stopped normalizing these label values, and also dropped protobuf scraping. All these label values were taken "as is", i.e. as exposed by the instrumented target, thereby effectively importing the formatting convention of the language the target was written in. This created confusion when moving from Prom1 to Prom2 because histograms and summaries coming from Java or Python suddenly showed up differently in Prometheus.
  • OpenMetrics tried to unify the formatting. While not ruling out all ambiguities for floats with many decimal places (see above), it mostly settled on a Python-like formatting. This created confusion for Go binaries when switching them to OM.
  • Prom 2.40 brought us native histograms and, as a byproduct, resurrected the protobuf scraping. Protobuf has to normalize because the le and quantile values are actual float numbers in the format, rather than strings. To stay consistent with OM, the new protobuf parser uses OM-style formatting. This created confusion for Go binaries that were not yet using the OM mode of prometheus/client_golang (of which there is a huge number, e.g. all of K8s, Prometheus itself and many other binaries from the Prometheus ecosystem, …) once they were switched to native histograms. The intent was to scrape classic histograms as before, but now all the le labels are OM formatted.
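To make the underlying problem concrete, here is a minimal Go sketch (the helper name sameFloat is made up for illustration and is not part of Prometheus). It shows that several spellings of the same bucket boundary are equal as floats but differ as strings, which is why a matcher like le="1" will not select a series exposed with le="1.0".

```go
package main

import (
	"fmt"
	"strconv"
)

// sameFloat reports whether two label values parse to the same float64.
// Hypothetical helper, for illustration only.
func sameFloat(a, b string) (bool, error) {
	fa, err := strconv.ParseFloat(a, 64)
	if err != nil {
		return false, err
	}
	fb, err := strconv.ParseFloat(b, 64)
	if err != nil {
		return false, err
	}
	return fa == fb, nil
}

func main() {
	// Three spellings of the same bucket boundary, as different client
	// libraries or exposition formats might produce them.
	values := []string{"1", "1.0", "1e+00"}
	for _, v := range values[1:] {
		numeric, err := sameFloat(values[0], v)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		// String equality is what label matching effectively relies on.
		fmt.Printf("%q vs %q: equal as floats=%t, equal as strings=%t\n",
			values[0], v, numeric, values[0] == v)
	}
}
```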

At the dev summit in September 2023 we decided to always normalize the values of le and quantile labels, presumably in OM style. While that will unify things for the future, we still have the problem at hand that a migration from classic to native histograms cannot be truly smooth because the le labels will change as soon as you set the native histogram feature flag.

Proposal

In short, make the normalization of the le and quantile label values configurable in the scrape config.

The setting needs a good name. normalize_histogram_and_quantile_labels is probably too long, but you know what I mean…

This new scrape config setting should apply to scrapes in all formats, thereby fulfilling the dev-summit feature demand. However, by keeping things configurable, users can migrate (or not migrate) from one format to another under their own control.

The possible values of the setting could be classic to normalize in the "classic" Prometheus style, i.e. vanilla Go %g formatting, openmetrics to normalize in OpenMetrics style, and none to take the values as they are from the text format.

The default value of the setting would be none for now, i.e. keep everything as is. In Prom 3.x, we could change the default to openmetrics.

There is a little wrinkle for protobuf because there is no none. Protobuf always normalizes for the reasons explained above. So the default behavior with protobuf format would effectively be openmetrics (if we keep things as they are now). However, the setting would still help, because you could set it to classic for Go targets to ease the migration. (Since we cannot know what the exposition would have looked like in text format if we scrape with protobuf, there is no easy way to automatically "do the right thing".)
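The three proposed modes could be sketched roughly as follows. This is only an illustration under stated assumptions: the function normalizeLabelValue and the mode constants are hypothetical names, the classic mode uses strconv's shortest 'g' formatting as a stand-in for %g, and the openmetrics mode is approximated by appending ".0" to values that would otherwise look like integers. The real OpenMetrics rules and the actual Prometheus parsers may differ in detail.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical setting values, mirroring the names suggested above.
const (
	modeNone        = "none"
	modeClassic     = "classic"
	modeOpenMetrics = "openmetrics"
)

// normalizeLabelValue rewrites an le or quantile label value according to
// the given mode. Hypothetical helper, not an existing Prometheus API.
func normalizeLabelValue(value, mode string) (string, error) {
	if mode == modeNone {
		// Take the value exactly as exposed by the target.
		return value, nil
	}
	f, err := strconv.ParseFloat(value, 64)
	if err != nil {
		return "", fmt.Errorf("label value %q is not a float: %w", value, err)
	}
	// Shortest representation that round-trips, in 'g' format.
	s := strconv.FormatFloat(f, 'g', -1, 64)
	switch mode {
	case modeClassic:
		return s, nil
	case modeOpenMetrics:
		// Assumption: keep values that already have a decimal point or an
		// exponent, or that are Inf/NaN; otherwise append ".0".
		if strings.ContainsAny(s, ".eIN") {
			return s, nil
		}
		return s + ".0", nil
	default:
		return "", fmt.Errorf("unknown normalization mode %q", mode)
	}
}

func main() {
	for _, le := range []string{"1", "1.0", "0.5", "+Inf"} {
		classic, _ := normalizeLabelValue(le, modeClassic)
		om, _ := normalizeLabelValue(le, modeOpenMetrics)
		fmt.Printf("le=%q classic=%q openmetrics=%q\n", le, classic, om)
	}
}
```

For a protobuf scrape there is no original string to pass through, so in this sketch the none mode would simply not be available and the caller would have to pick one of the other two.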

Alternatives

We could also provide a relabel action to normalize labels in one way or the other. It could also be applied to other labels that happen to be numbers. While the relabel action could also allow for a smooth migration, it would be very tedious to manually identify all classic histograms and summaries and relabel them. Arguably, a scrape config setting and a relabel action could both exist for different use cases.
