Skip to content

Conversation

fionaliao
Copy link
Contributor

@fionaliao fionaliao commented Mar 31, 2025

Part of #12763

This PR adds primitive support for otlp delta metrics. This allows otlp metrics with delta temporality to be ingested and stored as-is, with metric type unknown. To get "increase" or "rate", sum_over_time(metric[<interval>]) (/ <interval>) can be used.

This is the first step towards implementing prometheus/proposals#48. That proposal has additional suggestions around type-aware functions and making the rate() and increase() functions work for deltas too. However, there are some questions around the best way to do querying over deltas, so having this simple implementation without changing any PromQL functions allow us to get some form of delta ingestion out there gather some feedback to decide the best way to go further.

The feature can be enabled with --enable-feature=otlp-native-delta-ingestion.

Testing

Unit tests have been written for delta sums/histograms/exp histograms.

Manual end to end testing has been done - can run with https://github.com/fionaliao/prom-otel-delta-support-testing to get some demo delta and cumualtive counters and histograms, example screenshots below.

image image image image image

@fionaliao
Copy link
Contributor Author

@beorn7 @ArthurSens - this is a PR for simple ingestion of raw delta samples as discussed in the dev summit today (in draft - no proper testing yet!)

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start!! Let's finish tests and also add a bit of documentation on the feature flag saying that this is a very early stage implementation of your proposal. People can enable and start experimenting using deltas with existing PromQL functions, but we will collect feedback and likely build more things that improve the experience around deltas

@fionaliao fionaliao force-pushed the fionaliao/basic-delta-support branch from 71d4da7 to 0b0bee8 Compare April 3, 2025 19:30
@fionaliao
Copy link
Contributor Author

fionaliao commented Apr 3, 2025

To consider: should the flag be called otlp-native-delta-support or native-delta-support? Currently it only modifies the behaviour of the otlp endpoint, but if we start checking for the delta_counter type in functions behind the feature flag, there's nothing stopping someone from remote writing "delta" metrics, so the "otlp" part wouldn't be accurate anymore.

My preference would be to call it otlp-native-delta-support for now, and then rename to just native-delta-support if we start making type-aware functions. I'm not sure if it's okay for us to change the name of a feature flag once it's available, however.

Edit: actually I think I want two flags - one for ingesting deltas via the otlp endpoint, and one for temporality-aware functions See: #16360 (comment). I've also changed the flag name to otlp-native-delta-ingestion.

@fionaliao fionaliao changed the title WIP: very primitive native otel delta ingestion Add primitive support for ingesting OTLP delta metrics as-is Apr 4, 2025
@fionaliao fionaliao marked this pull request as ready for review April 4, 2025 17:10
@fionaliao
Copy link
Contributor Author

This is ready for review - I've added tests and did some manual e2e testing, and added a bit of documentation about the feature.

@beorn7
Copy link
Member

beorn7 commented Apr 10, 2025

Currently swamped with other reviews (that have been waiting for far too long and where I am pretty much the only available reviewer). I'm happy to look at this in detail ASAP (but that might no be very soon). I know the @ArthurSens is on vacation right now. Let's see who gets to this first. Even better if we could find another reviewer for this to speed things up.

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, thank you for continuing the work!

Almost everything looks good, but histograms are looking a bit weird to me... could you clarify?

@fionaliao fionaliao force-pushed the fionaliao/basic-delta-support branch from 912188b to 68aa01b Compare April 17, 2025 17:04
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
fionaliao and others added 12 commits April 17, 2025 19:08
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
@fionaliao fionaliao force-pushed the fionaliao/basic-delta-support branch from 18e34fb to a48bee8 Compare April 17, 2025 18:08
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
@fionaliao
Copy link
Contributor Author

fionaliao commented Apr 22, 2025

One interesting thing I just noticed - when attempting to use rate() on a delta native histogram, a warning annotationPromQL warning: this native histogram metric is not a counter: "test_histogram" (1:6) is added:

image

This is because the counter reset hint is GaugeType to avoid unnecessary chunk cutting, which is checked for in the rate() function here:

if isCounter && (prev.CounterResetHint == histogram.GaugeType || last.CounterResetHint == histogram.GaugeType) {
annos.Add(annotations.NewNativeHistogramNotCounterWarning(metricName, pos))
}

There's not a similar error for the delta counters (where the type is set to unknown for now) since there's no real type checks for counters at the moment.

I think the current PR implementation is fine as it is, this comment is just to note this behaviour.

@fionaliao
Copy link
Contributor Author

@ArthurSens This is ready for re-review

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ArthurSens
Copy link
Member

Let's give @beorn7 a week to review the PR, if he doesn't fine a chance to do it then let's merge and continue the proposal :)

@beorn7
Copy link
Member

beorn7 commented Apr 22, 2025

Don't wait for me. I trust you to do the right thing. And if it's not the right thing after all, it isn't the end of the world because it is behind a feature flag.

@ArthurSens
Copy link
Member

Alright, thank you! Let's merge it then :)

@ArthurSens ArthurSens merged commit 7ec63b1 into main Apr 23, 2025
45 checks passed
@ArthurSens ArthurSens deleted the fionaliao/basic-delta-support branch April 23, 2025 12:58
fionaliao added a commit to grafana/mimir that referenced this pull request Jul 7, 2025
Introduces the `-distributor.otel-native-delta-ingestion` flag
(and corresponding per-tenant setting), which enables primitive OTEL
delta metrics ingestion via the OTLP endpoint. This feature was
implemented in Prometheus in
prometheus/prometheus#16360. This PR allows
Mimir users to enable this feature too.

As per the Prometheus PR:

> This allows otlp metrics with delta temporality to be ingested and
stored as-is, with metric type unknown. To get "increase" or "rate",
`sum_over_time(metric[<interval>])` (`/ <interval>`) can be used.

> This is the first step towards implementing
prometheus/proposals#48. That proposal has
additional suggestions around type-aware functions and making the rate()
and increase() functions work for deltas too. However, there are some
questions around the best way to do querying over deltas, so having this
simple implementation without changing any PromQL functions allow us to
get some form of delta ingestion out there gather some feedback to
decide the best way to go further.

---------

Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants