MQE: don't reuse `FloatHistogram` instances for successive points in instant vector selectors #9588

charleskorn · 2024-10-11T02:15:34Z

What this PR does

This PR is an iteration on #8677. It changes the behaviour of instant vector selectors to not return the same FloatHistogram instance for successive points if they all select the same histogram.

This preserves the main optimisation of #8677 (avoiding creating copies of histograms in sum) without the complexity and associated bugs of needing to ensure that shared FloatHistogram instances are modified safely.

For example, I just found yet another issue related to this that was introduced in #9507, where shared FloatHistogram instances could be left in the unused portion of the histogram slice here and then corrupted if the HPoint slice was later used for a range vector selector.

I believe similar corruption could occur if a HPoint slice was used for an instant vector selector that returned the same FloatHistogram for successive points, and then that slice was reused for a range vector selector: the range vector selector would reuse the FloatHistogram instances when filling the buffer here and so all points would have the same value as the last histogram, rather than their expected values.

The performance impact of this change is minimal in our benchmarks: less than 1% degradation or less than 5% reduction in latency in all cases. In some cases things get slightly slower due to the extra copies created, and in some cases things get slightly faster due to the simpler logic. I think the simpler logic and far reduced likelihood of bugs make this change worthwhile.

In the real world, where lookback is more common, we'll only need to make copies (and therefore pay the price of those copies) if the same point is used for successive output samples, which would usually only happen if the query step is smaller than the scrape interval. Given most users run with 15s or 60s scrape intervals, and most queries tend to be either instant queries, I don't think this is likely to become a huge issue.

Which issue(s) this PR fixes or relates to

(none)

Checklist

Tests updated.
[n/a] Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
[n/a] about-versioning.md updated with experimental features.

…amples in a series

…tHistograms that have been used multiple times

jhesketh

I agree with the rationale of this. I do wonder though if we can measure the real-world impact. Should we have a metric on number of copied samples? We can see if that rises too high if we should reconsider reimplementing such an optimisation down the road (although it may be a bigger risk to do later)

charleskorn · 2024-10-14T00:45:56Z

Should we have a metric on number of copied samples?

We should be able to see this on profiles, including the continuous profiling we run on Mimir instances in Grafana Cloud.

jhesketh

Makes sense to me. Are we adequately testing the series merging to catch the latest clear?

charleskorn · 2024-10-14T02:11:35Z

Makes sense to me. Are we adequately testing the series merging to catch the latest clear?

I've added a test to ensure clear is called when we expect, and the existing tests should be sufficient to catch any other regressions.

* Fix another possible source of `FloatHistogram` corruption * Add test

charleskorn added 4 commits October 11, 2024 11:56

Don't allow returning the same FloatHistogram instance for multiple s…

2f05a8f

…amples in a series

Simplify logic where we can, now that we don't need to check for Floa…

a7a4e85

…tHistograms that have been used multiple times

Add reminder to comment

7942ee0

Add changelog entry

dee66b9

charleskorn marked this pull request as ready for review October 11, 2024 02:28

charleskorn requested a review from a team as a code owner October 11, 2024 02:28

jhesketh approved these changes Oct 14, 2024

View reviewed changes

Fix another possible source of FloatHistogram corruption

9da817e

charleskorn requested a review from jhesketh October 14, 2024 01:07

jhesketh approved these changes Oct 14, 2024

View reviewed changes

Add test

255b27f

charleskorn merged commit dd18bcb into main Oct 14, 2024
29 checks passed

charleskorn deleted the charleskorn/mqe-nh-simplification branch October 14, 2024 02:20

This was referenced Oct 14, 2024

Backport part of #9588 to r312 #9603

Merged

Backport part of #9588 to r311 #9604

Merged

Backport part of #9588 to r310 #9605

Merged

Backport part of #9588 to r309 #9606

Merged

Backport part of #9588 to r308 #9607

Merged

charleskorn added a commit that referenced this pull request Oct 14, 2024

Backport part of #9588 to r311 (#9604)

914f3e6

* Fix another possible source of `FloatHistogram` corruption * Add test

charleskorn added a commit that referenced this pull request Oct 14, 2024

Backport part of #9588 to r309 (#9606)

97cfa93

* Fix another possible source of `FloatHistogram` corruption * Add test

charleskorn added a commit that referenced this pull request Oct 14, 2024

Backport part of #9588 to r308 (#9607)

563e6f8

* Fix another possible source of `FloatHistogram` corruption * Add test

charleskorn added a commit that referenced this pull request Oct 14, 2024

Backport part of #9588 to r312 (#9603)

44653ea

* Fix another possible source of `FloatHistogram` corruption * Add test

charleskorn added a commit that referenced this pull request Oct 14, 2024

Backport part of #9588 to r310 (#9605)

a492668

* Fix another possible source of `FloatHistogram` corruption * Add test

jhesketh mentioned this pull request Dec 2, 2024

Mimir Query Engine #10067

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MQE: don't reuse `FloatHistogram` instances for successive points in instant vector selectors #9588

MQE: don't reuse `FloatHistogram` instances for successive points in instant vector selectors #9588

Uh oh!

charleskorn commented Oct 11, 2024 •

edited

Loading

Uh oh!

jhesketh left a comment

Uh oh!

charleskorn commented Oct 14, 2024

Uh oh!

jhesketh left a comment

Uh oh!

charleskorn commented Oct 14, 2024

Uh oh!

Uh oh!

Uh oh!

MQE: don't reuse FloatHistogram instances for successive points in instant vector selectors #9588

MQE: don't reuse FloatHistogram instances for successive points in instant vector selectors #9588

Uh oh!

Conversation

charleskorn commented Oct 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Which issue(s) this PR fixes or relates to

Checklist

Uh oh!

jhesketh left a comment

Choose a reason for hiding this comment

Uh oh!

charleskorn commented Oct 14, 2024

Uh oh!

jhesketh left a comment

Choose a reason for hiding this comment

Uh oh!

charleskorn commented Oct 14, 2024

Uh oh!

Uh oh!

Uh oh!

MQE: don't reuse `FloatHistogram` instances for successive points in instant vector selectors #9588

MQE: don't reuse `FloatHistogram` instances for successive points in instant vector selectors #9588

charleskorn commented Oct 11, 2024 •

edited

Loading