Skip to content

Conversation

ywwg
Copy link
Member

@ywwg ywwg commented Feb 21, 2025

The new metric_name_escaping_scheme config option works in parallel with metric_name_validation_scheme and controls which escaping scheme is requested when scraping. When not specified, the scheme will request underscores if the validation scheme is set to legacy, and will request allow-utf-8 when the validation scheme is set to utf8. This setting allows users to allow utf8 names if they like, but explicitly request an escaping scheme rather than UTF-8.

Fixes #16034

Built on #16080

@ywwg ywwg force-pushed the owilliams/escapeconfig branch from 2420a0b to 73ae88d Compare February 21, 2025 18:27
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some questions and thoughts to progress here. It makes sense to add override for escape, but not sure of the choice of defaults here. Again, for the record, I am not sure anyone would like their metrics to be automatically changed, any time, especially dynamically, in a hidden way on SDK level. Some expert eyes on this like @fstab would be amazing.

I noticed Prometheus now always have model.NameValidationScheme set to UTF-8 AFAIK; why we keep validating this?

As per other changes -- before this change (and in 3.x?) we used to have:

  • config MetricNameValidationScheme is empty by default
  • If config MetricNameValidationScheme == legacy:
  • Else
    • We do UTF-8 validation
    • We send escape=AllowUTF-8 on all scrape protocols

Now:

  • Since model.NameValidationScheme is always UTF-8, the default means UTF-8 for config MetricNameValidationScheme, same as now too.
  • if config MetricNameValidationScheme is set to legacy we do underscore escaping (breaking, we used to do no escaping header).
  • else no escaping so `escaping="Allow-UTF" as it was

Then users can override escaping..

Is that right?

On a quick glance, should we:

  • Kill all validation or mention,defaulting to different model.NameValidationScheme. It should be always UTF-8 now? We can set explicit global in main.go too?

Then, it's convoluted but:

  • Do we have to break API and do underscore escaping by default on legacy validation? Can we just fail? (it's what we do now?)
  • We add AllowUTF-8 escaping for UTF-8 validation (so always, by default). Why? What it changes for SDKs actually? (ignorant question likely, sorry, trying to understand).

@ywwg
Copy link
Member Author

ywwg commented Feb 24, 2025

  • if config MetricNameValidationScheme is set to legacy we do underscore escaping (breaking, we used to do no escaping header).

this is not technically breaking since sending no escaping header asks the common library to apply its default escaping method, which is hard-coded to underscores right now. Hypothetically someone could override that global in common, but I do not think anyone has ever done that.

@ywwg
Copy link
Member Author

ywwg commented Feb 24, 2025

  • Kill all validation or mention,defaulting to different model.NameValidationScheme. It should be always UTF-8 now? We can set explicit global in main.go too?

I think there's a good argument for this. Even with the default ValidationScheme set to UTF-8, it is possible to configure a datasource for legacy validation.

@ywwg
Copy link
Member Author

ywwg commented Feb 24, 2025

  • We add AllowUTF-8 escaping for UTF-8 validation (so always, by default). Why? What it changes for SDKs actually? (ignorant question likely, sorry, trying to understand).

I'm not quite sure I understand the question. "allow-utf-8" is a content negotiation header, which is part of Prometheus itself, not part of the SDK. The Common SDK only provides a way to instantiate metrics and a single bare function to validate that their names are correct. Because the SDK is very plain, it has no state to know whether a set of metrics are UTF-8 or Legacy. That's why we had to create the global variable, so that the existing ecosystem of clients would not have to change every validation callsite. But there is a new API in the SDK for asking if a metric is specifically Legacy-valid, and that's the API Prometheus uses when an individual datasource is set to legacy mode. So Prometheus itself does not need the common library global variable, because it has been updated to properly support UTF-8. Other code that uses the common SDK could be similarly updated, but until that happens, if we get rid of the global than other code may break because it is not UTF-8 aware.

@bwplotka
Copy link
Member

I think there's a good argument for this. Even with the default ValidationScheme set to UTF-8, it is possible to configure a datasource for legacy validation

It is possible, but that global is always set to utf-8 so we can kill all the code that depends on model.NameValidationScheme and simplify all logic, no?

If someone specifies scraping as legacy all will work correctly, we still don't change this global.

@bwplotka
Copy link
Member

bwplotka commented Feb 24, 2025

I'm not quite sure I understand the question. "allow-utf-8" is a content negotiation header, which is part of Prometheus itself, not part of the SDK. The Common SDK only provides a way to instantiate metrics and a single bare function to validate that their names are correct. Because the SDK is very plain, it has no state to know whether a set of metrics are UTF-8 or Legacy. That's why we had to create the global variable, so that the existing ecosystem of clients would not have to change every validation callsite. But there is a new API in the SDK for asking if a metric is specifically Legacy-valid, and that's the API Prometheus uses when an individual datasource is set to legacy mode. So Prometheus itself does not need the common library global variable, because it has been updated to properly support UTF-8. Other code that uses the common SDK could be similarly updated, but until that happens, if we get rid of the global than other code may break because it is not UTF-8 aware.

I think we misunderstood each other here, around the global. I meant to remove the use of global in Prometheus only.

As per my main question I found the answer in the new code I was not aware of in expfmt: https://github.com/prometheus/common/blob/main/expfmt%2Ftext_create.go#L454

And that expfmt dynamicity with names scares me, I think we should reject if we cannot accept utf vs dynamic encoding and it's sad we maintain client_golang and this change slipped through the common update, without our explicit approval. This shouldn't be added in the first place IMO. We are adding too much complexity for too niche use cases

To sum up, let's do this encoding option asap for people to be able to unblock themselves (just let's please simplify the validation and ifs) 🙈 - I think you are right, not breaking change then, but this situation is not ideal.

What would take to have NO encoding meaning "allow-utf-8" semantics (no name escaping)? I guess we wait for 2.0 to allow this?

@ywwg
Copy link
Member Author

ywwg commented Feb 25, 2025

I think we misunderstood each other here, around the global. I meant to remove the use of global in Prometheus only.

Ah, got it. Yes I think we can remove it in prom. It's not in too many places so that shouldn't be hard.

As per my main question I found the answer in the new code I was not aware of in expfmt: prometheus/common@main/expfmt%2Ftext_create.go#L454

And that expfmt dynamicity with names scares me, I think we should reject if we cannot accept utf vs dynamic encoding and it's sad we maintain client_golang and this change slipped through the common update, without our explicit approval. This shouldn't be added in the first place IMO. We are adding too much complexity for too niche use cases

To sum up, let's do this encoding option asap for people to be able to unblock themselves (just let's please simplify the validation and ifs) 🙈 - I think you are right, not breaking change then, but this situation is not ideal.

Mostly the if statements are to cover all of the possible permutations of settings... We could start by removing switching based on model.NameValidationScheme and simply panic if it has been set to Legacy. That would make the escaping request code simpler.

What would take to have NO encoding meaning "allow-utf-8" semantics (no name escaping)? I guess we wait for 2.0 to allow this?

It depends if we care about preserving backward compatibility. Do we want to suddenly start serving UTF-8 to old versions of prometheus running on some raspberry pi that never gets updated? Suddenly scrapes would be in the new exposition format and the old code would reject the scrape, and there would be no way to configure the old prometheus to request an escaped format. My goal was to make sure old code would get the old format. (I suppose the user could set up legacy mode on the other end in this instance)

@bwplotka
Copy link
Member

It depends if we care about preserving backward compatibility. Do we want to suddenly start serving UTF-8 to old versions of prometheus running on some raspberry pi that never gets updated? Suddenly scrapes would be in the new exposition format and the old code would reject the scrape, and there would be no way to configure the old prometheus to request an escaped format.

Got it, thanks. I have bunch of counter arguments, but there's no point, the decisions were made. Hopefully with OM 2.0 we can have none of this escaping magic. Perhaps it's expected that with OM 1.0 with UTF-8 restrictions you have to poke it with extra escaping header piece to trigger non escaping. Thanks, LGTM but I would suggest simplifying the logic with hardcoded support of UTF-8 in Prometheus (assumption that global is not changing).

@ywwg
Copy link
Member Author

ywwg commented Feb 25, 2025

none of this escaping magic

long term, I do think the correct approach is to have the option on the metrics-producer side. The scraper should be able to accept anything. We can try to push things in that direction.

@ywwg ywwg changed the base branch from main to owilliams/escapeconfig-00-cleanup March 12, 2025 18:32
@ywwg
Copy link
Member Author

ywwg commented Mar 12, 2025

rebased against the cleanup PR

@ywwg ywwg marked this pull request as ready for review March 12, 2025 18:34
@ywwg ywwg force-pushed the owilliams/escapeconfig branch from d45a32f to 7274a97 Compare March 12, 2025 18:43
@ywwg
Copy link
Member Author

ywwg commented Mar 12, 2025

not sure what is wrong with the windows tests

@ywwg ywwg force-pushed the owilliams/escapeconfig-00-cleanup branch from db82701 to 94b43c5 Compare March 13, 2025 14:48
Base automatically changed from owilliams/escapeconfig-00-cleanup to main March 13, 2025 18:21
@ywwg ywwg force-pushed the owilliams/escapeconfig branch from 7274a97 to 6956644 Compare March 13, 2025 18:33
Signed-off-by: Owen Williams <owen.williams@grafana.com>
@ywwg ywwg force-pushed the owilliams/escapeconfig branch from 6956644 to bb79f20 Compare March 13, 2025 18:34
@ywwg ywwg added area/utf8 Issues pertaining to UTF-8 support component/scraping labels Mar 13, 2025
@ywwg ywwg requested a review from bwplotka March 14, 2025 13:41
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Much better! Some suggestions inlined, but tl;dr:

  • We can simplify even more and move all validation and defaulting to a single place e.g. config layer.
  • We also need docs.
  • There might be validation gap/bug.

Otherwise looks solid!

ywwg added 5 commits March 18, 2025 11:03
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Signed-off-by: Owen Williams <owen.williams@grafana.com>
@ywwg
Copy link
Member Author

ywwg commented Mar 18, 2025

as mentioned in chat, I don't love turning the schemes into required arguments in the config object. But now that the config is used to create the loop/pool, we could move the default-handling back to scrape.go and put it in newscrapepool/reload.

@ywwg ywwg requested a review from bwplotka March 19, 2025 13:05
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Much better, only two nits, LGTM!

ywwg added 4 commits March 26, 2025 15:29
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Signed-off-by: Owen Williams <owen.williams@grafana.com>
@ywwg ywwg merged commit 6566c5a into main Mar 26, 2025
44 of 45 checks passed
@ywwg ywwg deleted the owilliams/escapeconfig branch March 26, 2025 22:27
zepellin added a commit to zepellin/prometheus that referenced this pull request May 7, 2025
This is a fix for documentation update done in prometheus#16066, setting correct name for a configuration value.

Signed-off-by: Martin Danko <46035688+zepellin@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/utf8 Issues pertaining to UTF-8 support component/scraping
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Create way to configure explicit escaping mode other than UTF-8 or "default"
2 participants