Skip to content

Conversation

jesusvazquez
Copy link
Contributor

@jesusvazquez jesusvazquez commented Sep 5, 2024

Relates to https://opentelemetry.io/docs/specs/otel/metrics/data-model/#resets-and-gaps and open-telemetry/opentelemetry-specification#4184

While testing our previously introduced feature to introduce 0's at ts-1 when an OTLP sample had StartTimeUnixNano == sample.timestamp we noticed that it was not working for some usages of the OTel SDK and connectors like the spanmetrics connector.

Later we found out the reason is that it's unclear to what value StartTImeUnixNano should be set to when there is a reset in a cumulative stream. The links at the beginning give you more context on what the spec says and then a reasonable ask by Arthur asking to have a StartTimeUnixNano initialized per series and to the first observation.

Unfortunately it will take time to get there.

So what this PR does is to slightly modify our existing workaround of adding those zeros. Instead of writing the zero if the sample had StartTimeUnixNano == sample.timestamp, now it will only insert the zero if the sample's StartTimeUnixNano is within 2 minutes of sample.Timestamp

The downside of this approach is that we can't know the scrape interval of samples so in the case of more than 1 sample sent per minute, this workaround could lead to duplicate zero's for the same timestamp until that minute has passed. This incurs an increase of duplicate data in memory and an increase of DPM per series.

  • This duplicate data in memory should be removed during compaction.

@jesusvazquez jesusvazquez force-pushed the jvp/otlp-insert-created-timestamp-within-interval branch 3 times, most recently from 20352ee to c8bc00e Compare September 5, 2024 10:43
@jesusvazquez jesusvazquez marked this pull request as ready for review September 5, 2024 10:45
@jesusvazquez jesusvazquez requested a review from aknuds1 September 5, 2024 10:45
@jesusvazquez jesusvazquez self-assigned this Sep 5, 2024
@jesusvazquez jesusvazquez force-pushed the jvp/otlp-insert-created-timestamp-within-interval branch 3 times, most recently from bbe3b38 to 95c22e2 Compare September 5, 2024 14:03
After a lot of investigation we understood a bit better OTel's spec
handling of StartTimeUnixNano and TimeUnixNano for cumulative streams.

This change inserts an extra zero to any incoming sample that has
StartTimeUnixNano set within 2 minutes of TimeUnixNano.

Also unit tests are included.

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
@jesusvazquez jesusvazquez force-pushed the jvp/otlp-insert-created-timestamp-within-interval branch 4 times, most recently from f8d7e96 to 9789b21 Compare September 5, 2024 14:11
Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable.

Comment on lines 180 to 181
// TODO(jesus.vazquez) Temporary debug log to understand the behavior of start ts and ts. We're only tracking counters
for _, v := range c.startTsInfoPerSeries {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why store the info? Why not log it out straight away when trackStartTimestampForSeries is called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, dropped this map and just logged right away.

everyN everyNTimes
unique map[uint64]*prompb.TimeSeries
conflicts map[uint64][]*prompb.TimeSeries
startTsInfoPerSeries map[uint64]StartTsAndTs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if this is for debugging only, I would comment that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this map.

@jesusvazquez jesusvazquez force-pushed the jvp/otlp-insert-created-timestamp-within-interval branch 2 times, most recently from d39e1b5 to 3741aae Compare September 5, 2024 14:36
Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good, except I think maybe a startTimestampNs variable should be removed for consistency (see comment).

As mentioned before, I am a bit concerned though about potentially adding zero samples for many samples within the interval (validIntervalForStartTimestamps). I.e., potentially many consecutive samples could have a start time delta <= validIntervalForStartTimestamps.

@jesusvazquez jesusvazquez force-pushed the jvp/otlp-insert-created-timestamp-within-interval branch from 3741aae to b6ff474 Compare September 5, 2024 14:49
This commit passes logger down to FromMetrics and extends the converter
to printout some information about the incoming series timestamps and
start timestamps.

Only for sums.

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

format series

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

dont store, log right away

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
@jesusvazquez jesusvazquez force-pushed the jvp/otlp-insert-created-timestamp-within-interval branch from b6ff474 to 0c21f74 Compare September 5, 2024 14:50
@jesusvazquez
Copy link
Contributor Author

Note: I'm not updating the changelog because the existing entry is enough.

  • [ENHANCEMENT] OTLP receiver: If the feature flag --created-timestamp-zero-ingestion is true, convert OTel start timestamps to Prometheus zero samples. #14759

@jesusvazquez jesusvazquez merged commit 8c23578 into main Sep 5, 2024
9 checks passed
@jesusvazquez jesusvazquez deleted the jvp/otlp-insert-created-timestamp-within-interval branch September 5, 2024 15:24
krajorama added a commit to grafana/mimir that referenced this pull request Sep 6, 2024
Followup to
grafana/mimir-prometheus#689

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
krajorama added a commit to grafana/mimir that referenced this pull request Sep 6, 2024
* Vendor update mimir-prometheus at 53cf3fb8e7e3

* Add Close method to activity trackers
Followup to prometheus/prometheus#14064

* Pass logger to otlp FromMetrics
Followup to
grafana/mimir-prometheus#689

* Pin google GRPC to 1.65 for the time being
Possible performance regression:
Ref: grafana/dskit#581

Also has breaking API changes for the experimental buffers.

* Remove workaround in test
Follow up to:
prometheus/prometheus#14772

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants