Skip to content

Conversation

ywwg
Copy link
Member

@ywwg ywwg commented Feb 11, 2025

Add an option to not add suffixes. This creates the possibility of collisions when metrics have the same name but different type or unit, but this enables better testing of the OTLP flow

@ywwg ywwg marked this pull request as ready for review February 12, 2025 16:01
@ywwg ywwg marked this pull request as draft February 12, 2025 16:17
Add an option to not add suffixes. This creates the possibility of collisions when metrics have the same name but different type or unit, but this enables better testing of the OTLP flow

Signed-off-by: Owen Williams <owen.williams@grafana.com>
@ywwg ywwg force-pushed the owilliams/no-suffixes branch from 5e086d0 to 1fa3498 Compare February 12, 2025 19:37
@ywwg ywwg marked this pull request as ready for review February 12, 2025 19:37
Copy link
Contributor

@jmichalek132 jmichalek132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT, ❤️ Finally this will allows us to just pass metrics from OTEL SDKs without any changes to them, not having to explain the translation logic to developers, because they will be able to find their metrics based on their name.

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just have one minor comment

@@ -391,21 +428,51 @@ func TestOTLPWriteHandler(t *testing.T) {
req.Header.Set("Content-Type", "application/x-protobuf")

appendable := &mockAppendable{}
conf := config.DefaultOTLPConfig
conf.TranslationStrategy = config.NoTranslation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be inside handleOtlp? If we're using handleOtlp for several functions, I feel like the configuration should be passed as a parameter.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Personally I'm supportive, IMO we should do this long term. I think it makes sense to do it for testing purposes internally in the code too.

However, this "testing" is actually leaking to the user space e.g. any user will be able to send metrics without suffixes "legally". Looking on some comments I know non-zero amount of users who will recommend that, next day this is merged (: Should we reject this on our APIs first?

I'm happy to consider enabling this short term, but I know there is a valid perspective that "legally" allowing metrics without suffixes will cause real damage. This is because we don't yet have proper metadata handling on both ingestion (causing collision) and consumption (poor PromQL UX). We are working on the first problem, the latter (as you remember from OM discussions) we have mixed opinions on long term too.

So:

  1. We are technically not ready for this, are we ok to enable this before we are ready (even opt-in?).
  2. Do we want to stop enforcing this long term? I started a quick doc to discuss that on the next dev-summit.

Again, I am happy to consider 1 and 2 as "yes", as I see benefits of feeling the pain, which might motivate community to quicker fixes here, but I would love to hear the opposite opinions before we merge e.g. from @juliusv @RichiH or DevSummit WDYT?

@bwplotka
Copy link
Member

Perhaps compile-time flag might be safer for testing (as discussed internally with @owen-d e.g. build tag)

@ywwg
Copy link
Member Author

ywwg commented Feb 13, 2025

wrong owen :) (I'm @ywwg for extremely historical reasons)

@ywwg ywwg marked this pull request as draft February 14, 2025 15:09
@ywwg
Copy link
Member Author

ywwg commented Feb 14, 2025

compile-time flag

Do you think that approach has a good chance of being accepted? Every other experimental feature uses the feature flag infrastructure so this would be a totally new infrastructure for enabling features and I would hate to do the work and then have it be rejected.

@jmichalek132
Copy link
Contributor

compile-time flag

Do you think that approach has a good chance of being accepted? Every other experimental feature uses the feature flag infrastructure so this would be a totally new infrastructure for enabling features and I would hate to do the work and then have it be rejected.

If I am not wrong this approach has been used before for stringlabels https://prometheus.io/blog/2023/03/21/stringlabel/.

@bwplotka
Copy link
Member

Yes, we have precedence with stringlabels and deduplabels and a positive feedback on Slack discussion on this approach here (:

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3w ago I wanted to make sure everyone is onboard on adding this opt-in no translation mode. Last week, we discussed this in depth, and after the last consensus I think this gets unblocked 🎉

@bboreham
Copy link
Member

bboreham commented Mar 3, 2025

Opting in to this would break existing dashboards and rules, which is a problem for folks who generally want no-translation but are already using the existing translation.

I can see options:

  • Allow to configure for certain metrics (e.g. a matcher), so they can cut over gradually.
  • Generate both translated and not-translated versions (i.e. doubling cardinality).
  • Fix it query-side (which has been discussed before and generally found to be too hard).

@bwplotka
Copy link
Member

bwplotka commented Mar 3, 2025

@ywwg
Copy link
Member Author

ywwg commented Mar 3, 2025

Here is the issue where we discussed doing the migration on the query side and decided it was too complex and "magical" -- #13765

I like the idea of gradual cutover, though we would need to think about how to manage the performance

@ArthurSens
Copy link
Member

I can see options:

  • Allow to configure for certain metrics (e.g. a matcher), so they can cut over gradually.

Can we introduce that without making breaking changes to the current configuration file? We currently have the following structure

otlp:
  translation_strategy: NoTranslation | UnderscoreEscapingWithSuffixes | NoUTF8EscapingWithSuffixes

How could we introduce matchers to this?

  • Generate both translated and not-translated versions (i.e. doubling cardinality).

That sounds an even bigger problem because doubling cardinality could lead to Prometheus crashing. Your dashboard will also break if Prometheus isn't running 😅

  • Fix it query-side (which has been discussed before and generally found to be too hard).

Yep...


I agree that the switch from translation to noTranslation won't be 100% smooth, but isn't updating dashboards the easiest task here?

@ywwg
Copy link
Member Author

ywwg commented Mar 26, 2025

@bboreham Otel's Prometheus working group had a meeting and we came up with the following proposed positions:

  • Metric name migration is a general prometheus problem that can be triggered by any number of infrastructure changes, of which changing this flag is just one. We believe accommodation of metric name migration is out of scope for this particular feature, and in general we'll add language and warnings to the effect of, changing this flag on an existing system will change metric names and cause breakage
  • Default dashboards and alerts will indeed need to be updated to support these new names, but we also believe that this feature should not be blocked on the non-existence of those dashboards. That seems to us to be a bit putting the cart before the horse. We do agree that companies would not want to opt-in new customers on no-translation mode without such updates.

So, tl;dr, we'd like to move ahead on implementing the feature and then fork off two new issues: consideration of metric name migration, and updating stock dashboards and alerts for the new names.

@ywwg
Copy link
Member Author

ywwg commented Apr 2, 2025

closing in favor of @ArthurSens 's work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants