otlp: Create experimental no-translation mode #16014

ywwg · 2025-02-11T21:10:18Z

Add an option to not add suffixes. This creates the possibility of collisions when metrics have the same name but different type or unit, but this enables better testing of the OTLP flow

config/config.go

Add an option to not add suffixes. This creates the possibility of collisions when metrics have the same name but different type or unit, but this enables better testing of the OTLP flow Signed-off-by: Owen Williams <owen.williams@grafana.com>

jmichalek132

LGMT, ❤️ Finally this will allows us to just pass metrics from OTEL SDKs without any changes to them, not having to explain the translation logic to developers, because they will be able to find their metrics based on their name.

ArthurSens

LGTM, just have one minor comment

ArthurSens · 2025-02-13T16:55:29Z

storage/remote/write_test.go

@@ -391,21 +428,51 @@ func TestOTLPWriteHandler(t *testing.T) {
 	req.Header.Set("Content-Type", "application/x-protobuf")

 	appendable := &mockAppendable{}
+	conf := config.DefaultOTLPConfig
+	conf.TranslationStrategy = config.NoTranslation


Should this be inside handleOtlp? If we're using handleOtlp for several functions, I feel like the configuration should be passed as a parameter.

bwplotka

Thanks!

Personally I'm supportive, IMO we should do this long term. I think it makes sense to do it for testing purposes internally in the code too.

However, this "testing" is actually leaking to the user space e.g. any user will be able to send metrics without suffixes "legally". Looking on some comments I know non-zero amount of users who will recommend that, next day this is merged (: Should we reject this on our APIs first?

I'm happy to consider enabling this short term, but I know there is a valid perspective that "legally" allowing metrics without suffixes will cause real damage. This is because we don't yet have proper metadata handling on both ingestion (causing collision) and consumption (poor PromQL UX). We are working on the first problem, the latter (as you remember from OM discussions) we have mixed opinions on long term too.

So:

We are technically not ready for this, are we ok to enable this before we are ready (even opt-in?).
Do we want to stop enforcing this long term? I started a quick doc to discuss that on the next dev-summit.

Again, I am happy to consider 1 and 2 as "yes", as I see benefits of feeling the pain, which might motivate community to quicker fixes here, but I would love to hear the opposite opinions before we merge e.g. from @juliusv @RichiH or DevSummit WDYT?

bwplotka · 2025-02-13T18:41:32Z

Perhaps compile-time flag might be safer for testing (as discussed internally with @owen-d e.g. build tag)

ywwg · 2025-02-13T18:43:18Z

wrong owen :) (I'm @ywwg for extremely historical reasons)

ywwg · 2025-02-14T19:32:33Z

compile-time flag

Do you think that approach has a good chance of being accepted? Every other experimental feature uses the feature flag infrastructure so this would be a totally new infrastructure for enabling features and I would hate to do the work and then have it be rejected.

jmichalek132 · 2025-02-15T11:41:56Z

compile-time flag

Do you think that approach has a good chance of being accepted? Every other experimental feature uses the feature flag infrastructure so this would be a totally new infrastructure for enabling features and I would hate to do the work and then have it be rejected.

If I am not wrong this approach has been used before for stringlabels https://prometheus.io/blog/2023/03/21/stringlabel/.

bwplotka · 2025-02-17T09:59:58Z

Yes, we have precedence with stringlabels and deduplabels and a positive feedback on Slack discussion on this approach here (:

bwplotka

3w ago I wanted to make sure everyone is onboard on adding this opt-in no translation mode. Last week, we discussed this in depth, and after the last consensus I think this gets unblocked 🎉

bboreham · 2025-03-03T09:36:00Z

Opting in to this would break existing dashboards and rules, which is a problem for folks who generally want no-translation but are already using the existing translation.

I can see options:

Allow to configure for certain metrics (e.g. a matcher), so they can cut over gradually.
Generate both translated and not-translated versions (i.e. doubling cardinality).
Fix it query-side (which has been discussed before and generally found to be too hard).

bwplotka · 2025-03-03T09:44:37Z

For renaming, we will be speaking about some way in https://kccnceu2025.sched.com/event/1txHv/how-to-rename-metrics-without-impacting-somebodys-observability-bartlomiej-plotka-google-arianna-vespri-independent?iframe=no with @vesari ;p

ywwg · 2025-03-03T17:11:54Z

Here is the issue where we discussed doing the migration on the query side and decided it was too complex and "magical" -- #13765

I like the idea of gradual cutover, though we would need to think about how to manage the performance

ArthurSens · 2025-03-25T20:34:19Z

I can see options:

Allow to configure for certain metrics (e.g. a matcher), so they can cut over gradually.

Can we introduce that without making breaking changes to the current configuration file? We currently have the following structure

otlp:
  translation_strategy: NoTranslation | UnderscoreEscapingWithSuffixes | NoUTF8EscapingWithSuffixes

How could we introduce matchers to this?

Generate both translated and not-translated versions (i.e. doubling cardinality).

That sounds an even bigger problem because doubling cardinality could lead to Prometheus crashing. Your dashboard will also break if Prometheus isn't running 😅

Fix it query-side (which has been discussed before and generally found to be too hard).

Yep...

I agree that the switch from translation to noTranslation won't be 100% smooth, but isn't updating dashboards the easiest task here?

ywwg · 2025-03-26T17:25:15Z

@bboreham Otel's Prometheus working group had a meeting and we came up with the following proposed positions:

Metric name migration is a general prometheus problem that can be triggered by any number of infrastructure changes, of which changing this flag is just one. We believe accommodation of metric name migration is out of scope for this particular feature, and in general we'll add language and warnings to the effect of, changing this flag on an existing system will change metric names and cause breakage
Default dashboards and alerts will indeed need to be updated to support these new names, but we also believe that this feature should not be blocked on the non-existence of those dashboards. That seems to us to be a bit putting the cart before the horse. We do agree that companies would not want to opt-in new customers on no-translation mode without such updates.

So, tl;dr, we'd like to move ahead on implementing the feature and then fork off two new issues: consideration of metric name migration, and updating stock dashboards and alerts for the new names.

ywwg · 2025-04-02T18:53:21Z

closing in favor of @ArthurSens 's work

ArthurSens reviewed Feb 12, 2025

View reviewed changes

config/config.go Outdated Show resolved Hide resolved

config/config.go Outdated Show resolved Hide resolved

ywwg marked this pull request as ready for review February 12, 2025 16:01

ywwg requested review from cstyan, bwplotka and tomwilkie as code owners February 12, 2025 16:01

ywwg marked this pull request as draft February 12, 2025 16:17

ywwg force-pushed the owilliams/no-suffixes branch from 5e086d0 to 1fa3498 Compare February 12, 2025 19:37

ywwg marked this pull request as ready for review February 12, 2025 19:37

jmichalek132 approved these changes Feb 12, 2025

View reviewed changes

dashpole approved these changes Feb 13, 2025

View reviewed changes

ArthurSens approved these changes Feb 13, 2025

View reviewed changes

bwplotka requested changes Feb 13, 2025

View reviewed changes

ywwg marked this pull request as draft February 14, 2025 15:09

bwplotka approved these changes Mar 3, 2025

View reviewed changes

ywwg closed this Apr 2, 2025

ArthurSens mentioned this pull request Apr 16, 2025

feat: Support 'NoTranslation' mode in OTLP endpoint #16441

Merged

otlp: Create experimental no-translation mode #16014

otlp: Create experimental no-translation mode #16014

Uh oh!

Conversation

ywwg commented Feb 11, 2025

Uh oh!

Uh oh!

Uh oh!

jmichalek132 left a comment

Choose a reason for hiding this comment

Uh oh!

ArthurSens left a comment

Choose a reason for hiding this comment

Uh oh!

ArthurSens Feb 13, 2025

Choose a reason for hiding this comment

Uh oh!

bwplotka left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bwplotka commented Feb 13, 2025

Uh oh!

ywwg commented Feb 13, 2025

Uh oh!

ywwg commented Feb 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmichalek132 commented Feb 15, 2025

Uh oh!

bwplotka commented Feb 17, 2025

Uh oh!

bwplotka left a comment

Choose a reason for hiding this comment

Uh oh!

bboreham commented Mar 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bwplotka commented Mar 3, 2025

Uh oh!

ywwg commented Mar 3, 2025

Uh oh!

ArthurSens commented Mar 25, 2025

Uh oh!

ywwg commented Mar 26, 2025

Uh oh!

ywwg commented Apr 2, 2025

Uh oh!

Uh oh!

bwplotka left a comment •

edited

Loading

ywwg commented Feb 14, 2025 •

edited

Loading

bboreham commented Mar 3, 2025 •

edited

Loading