-
Notifications
You must be signed in to change notification settings - Fork 9.8k
otlp: Create experimental no-translation mode #16014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add an option to not add suffixes. This creates the possibility of collisions when metrics have the same name but different type or unit, but this enables better testing of the OTLP flow Signed-off-by: Owen Williams <owen.williams@grafana.com>
5e086d0
to
1fa3498
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT, ❤️ Finally this will allows us to just pass metrics from OTEL SDKs without any changes to them, not having to explain the translation logic to developers, because they will be able to find their metrics based on their name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just have one minor comment
@@ -391,21 +428,51 @@ func TestOTLPWriteHandler(t *testing.T) { | |||
req.Header.Set("Content-Type", "application/x-protobuf") | |||
|
|||
appendable := &mockAppendable{} | |||
conf := config.DefaultOTLPConfig | |||
conf.TranslationStrategy = config.NoTranslation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be inside handleOtlp? If we're using handleOtlp for several functions, I feel like the configuration should be passed as a parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Personally I'm supportive, IMO we should do this long term. I think it makes sense to do it for testing purposes internally in the code too.
However, this "testing" is actually leaking to the user space e.g. any user will be able to send metrics without suffixes "legally". Looking on some comments I know non-zero amount of users who will recommend that, next day this is merged (: Should we reject this on our APIs first?
I'm happy to consider enabling this short term, but I know there is a valid perspective that "legally" allowing metrics without suffixes will cause real damage. This is because we don't yet have proper metadata handling on both ingestion (causing collision) and consumption (poor PromQL UX). We are working on the first problem, the latter (as you remember from OM discussions) we have mixed opinions on long term too.
So:
- We are technically not ready for this, are we ok to enable this before we are ready (even opt-in?).
- Do we want to stop enforcing this long term? I started a quick doc to discuss that on the next dev-summit.
Again, I am happy to consider 1 and 2 as "yes", as I see benefits of feeling the pain, which might motivate community to quicker fixes here, but I would love to hear the opposite opinions before we merge e.g. from @juliusv @RichiH or DevSummit WDYT?
Perhaps compile-time flag might be safer for testing (as discussed internally with @owen-d e.g. build tag) |
wrong owen :) (I'm @ywwg for extremely historical reasons) |
Do you think that approach has a good chance of being accepted? Every other experimental feature uses the feature flag infrastructure so this would be a totally new infrastructure for enabling features and I would hate to do the work and then have it be rejected. |
If I am not wrong this approach has been used before for stringlabels https://prometheus.io/blog/2023/03/21/stringlabel/. |
Yes, we have precedence with stringlabels and deduplabels and a positive feedback on Slack discussion on this approach here (: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3w ago I wanted to make sure everyone is onboard on adding this opt-in no translation mode. Last week, we discussed this in depth, and after the last consensus I think this gets unblocked 🎉
Opting in to this would break existing dashboards and rules, which is a problem for folks who generally want no-translation but are already using the existing translation. I can see options:
|
For renaming, we will be speaking about some way in https://kccnceu2025.sched.com/event/1txHv/how-to-rename-metrics-without-impacting-somebodys-observability-bartlomiej-plotka-google-arianna-vespri-independent?iframe=no with @vesari ;p |
Here is the issue where we discussed doing the migration on the query side and decided it was too complex and "magical" -- #13765 I like the idea of gradual cutover, though we would need to think about how to manage the performance |
Can we introduce that without making breaking changes to the current configuration file? We currently have the following structure otlp:
translation_strategy: NoTranslation | UnderscoreEscapingWithSuffixes | NoUTF8EscapingWithSuffixes How could we introduce matchers to this?
That sounds an even bigger problem because doubling cardinality could lead to Prometheus crashing. Your dashboard will also break if Prometheus isn't running 😅
Yep... I agree that the switch from translation to noTranslation won't be 100% smooth, but isn't updating dashboards the easiest task here? |
@bboreham Otel's Prometheus working group had a meeting and we came up with the following proposed positions:
So, tl;dr, we'd like to move ahead on implementing the feature and then fork off two new issues: consideration of metric name migration, and updating stock dashboards and alerts for the new names. |
closing in favor of @ArthurSens 's work |
Add an option to not add suffixes. This creates the possibility of collisions when metrics have the same name but different type or unit, but this enables better testing of the OTLP flow