-
Notifications
You must be signed in to change notification settings - Fork 634
MQE: don't reuse FloatHistogram instances for successive points in instant vector selectors
#9588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…amples in a series
…tHistograms that have been used multiple times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the rationale of this. I do wonder though if we can measure the real-world impact. Should we have a metric on number of copied samples? We can see if that rises too high if we should reconsider reimplementing such an optimisation down the road (although it may be a bigger risk to do later)
We should be able to see this on profiles, including the continuous profiling we run on Mimir instances in Grafana Cloud. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. Are we adequately testing the series merging to catch the latest clear?
I've added a test to ensure |
What this PR does
This PR is an iteration on #8677. It changes the behaviour of instant vector selectors to not return the same
FloatHistograminstance for successive points if they all select the same histogram.This preserves the main optimisation of #8677 (avoiding creating copies of histograms in
sum) without the complexity and associated bugs of needing to ensure that sharedFloatHistograminstances are modified safely.For example, I just found yet another issue related to this that was introduced in #9507, where shared
FloatHistograminstances could be left in the unused portion of the histogram slice here and then corrupted if theHPointslice was later used for a range vector selector.I believe similar corruption could occur if a
HPointslice was used for an instant vector selector that returned the sameFloatHistogramfor successive points, and then that slice was reused for a range vector selector: the range vector selector would reuse theFloatHistograminstances when filling the buffer here and so all points would have the same value as the last histogram, rather than their expected values.The performance impact of this change is minimal in our benchmarks: less than 1% degradation or less than 5% reduction in latency in all cases. In some cases things get slightly slower due to the extra copies created, and in some cases things get slightly faster due to the simpler logic. I think the simpler logic and far reduced likelihood of bugs make this change worthwhile.
In the real world, where lookback is more common, we'll only need to make copies (and therefore pay the price of those copies) if the same point is used for successive output samples, which would usually only happen if the query step is smaller than the scrape interval. Given most users run with 15s or 60s scrape intervals, and most queries tend to be either instant queries, I don't think this is likely to become a huge issue.
Which issue(s) this PR fixes or relates to
(none)
Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX].about-versioning.mdupdated with experimental features.