Skip to content

Conversation

krajorama
Copy link
Contributor

@krajorama krajorama commented Jul 4, 2025

What this PR does

Modify the ingester and block builder to accept and pass the Created timestamp field from incoming write requests to TSDB.

Note1: if samples aren't ordered by time in a remote write requests, then the result is undefined.

Note2: as with regular samples, Mimir will have non-deterministic behavior if there's a sample and a created timestamp with the same timestamp ingested on two different ingesters for whatever reason. During query and block building the resolution of such conflicting samples has always been non-deterministic, as there is no way to know at that point which sample is the correct one.

Which issue(s) this PR fixes or relates to

Fixes #11976

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

@krajorama krajorama force-pushed the krajo/created-timestamp branch 4 times, most recently from 24c0591 to b9e1a00 Compare July 4, 2025 11:33
@krajorama krajorama changed the title ingester: handle created timestamp write path: handle created timestamp Jul 4, 2025
@krajorama krajorama force-pushed the krajo/created-timestamp branch from b9e1a00 to 1efc74a Compare July 4, 2025 11:39
Copy link
Contributor

github-actions bot commented Jul 4, 2025

@krajorama krajorama force-pushed the krajo/created-timestamp branch 4 times, most recently from 36c90aa to 7774cee Compare July 5, 2025 10:39
@krajorama krajorama marked this pull request as ready for review July 5, 2025 10:39
@krajorama krajorama requested a review from a team as a code owner July 5, 2025 10:39
@krajorama krajorama marked this pull request as draft July 5, 2025 11:20
@krajorama krajorama force-pushed the krajo/created-timestamp branch 3 times, most recently from 44740eb to c7ebe12 Compare July 5, 2025 11:58
@krajorama krajorama marked this pull request as ready for review July 5, 2025 11:58
krajorama added 3 commits July 8, 2025 21:00
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama force-pushed the krajo/created-timestamp branch from c7ebe12 to eb329d8 Compare July 8, 2025 19:10
krajorama and others added 2 commits July 8, 2025 21:16
Co-authored-by: Taylor C <41653732+tacole02@users.noreply.github.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama requested a review from tacole02 July 8, 2025 19:18
@krajorama
Copy link
Contributor Author

@tacole02 Thank you for the suggested changes, I've made a small tweak on top. PTAL if you have time, thanks.

Copy link
Contributor

@jesusvazquez jesusvazquez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM the test coverage is great so I think we're good.

I was wondering if it would be possible to have less code duplication between how the ingester does the CT appends and how the blockbuilder does it but i dont think we can do it without making this PR much larger and even then I don't know if it would be worth it.

Great work Krajo

Comment on lines 148 to 150
if err != nil && !errors.Is(err, storage.ErrOutOfOrderCT) && !errors.Is(err, storage.ErrOutOfOrderSample) {
level.Warn(b.logger).Log("msg", "failed to store zero float sample for created timestamp", "tenant", userID, "err", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed that the code in ingester, handles this error as "soft-error", ie. errProcessor.ProcessErr(err, ...). I believe the result of that is that the sample will be counted towards discarded samples metric.

This is a nit, but do we want to count a discarded created sample in the block-builder as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I've noticed that the logic about skipping the sample when its created timestamp fails is different between the ingester and the block-builder (ref the comment below).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nit, but do we want to count a discarded created sample in the block-builder as well?

Just add discardedSamples++ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just discardedSamples will do that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added test as well

stats.succeededSamplesCount++
} else if !errors.Is(err, storage.ErrOutOfOrderCT) && !errors.Is(err, storage.ErrOutOfOrderSample) {
errProcessor.ProcessErr(err, ts.CreatedTimestamp, ts.Labels)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To double-check, the continue here will cause the sample s to be skipped, if there was an error in the append of its created sample. I think, the piece of code in the block-builder doesn't skip the sample. Is this on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, I think that's a fair point, shouldn't skip

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've copy pasted the error handling, I'll try to add a unit test to cover this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the code and added tests, but the native histogram one fails as there's a bug in upstream Prometheus. So I need to fix the bug there, and update mimir-prometheus.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the bug is not new prometheus/prometheus#16332

@narqo
Copy link
Contributor

narqo commented Jul 9, 2025

I was wondering if it would be possible to have less code duplication between how the ingester does the CT appends and how the blockbuilder does it but i dont think we can do it without making this PR much larger and even then I don't know if it would be worth it.

I did a couple of attempts trying to unify these pieces of the ingester and the block-builder, but that, indeed, resulted in a fairly large refactoring of ingester's internals, which was a bit scary. So every time, I gave up. I still want to do that. Although, tbf, an extra doubt I have currently, comes from the parquet exploration, that folks do in parallel. If parquet is the way, I'm not sure the block-builder will keep building tsdb blocks at all.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Copy link
Contributor

@narqo narqo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides that we need the fix in the upstream prometheus (ref #11977 (comment)), the changes here look good to me 🔥

Copy link
Contributor

@tacole02 tacole02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks, @krajorama !

@zenador
Copy link
Contributor

zenador commented Jul 10, 2025

🤖 Automated comment

The CHANGELOG has just been cut to prepare for the next release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG.md document. Thanks!

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama
Copy link
Contributor Author

krajorama commented Jul 10, 2025

Looks great! Thanks, @krajorama !

Thanks @tacole02

Unfortunately I have to drop the changes to the existing CHANGELOG entry as the release has been cut in the meantime :( So I'm moving the text to the unreleased section as enhancement.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
#11953

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Copy link
Contributor

@tacole02 tacole02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@krajorama krajorama merged commit 6178f37 into main Jul 10, 2025
31 checks passed
@krajorama krajorama deleted the krajo/created-timestamp branch July 10, 2025 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Idea: handle created timestamp field in Remote-Write (PRW) 2.0
5 participants