-
Notifications
You must be signed in to change notification settings - Fork 9.7k
promql: reuse histogramStatsIterator
where possible, and expose it for other implementations to use
#16686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
promql: reuse histogramStatsIterator
where possible, and expose it for other implementations to use
#16686
Conversation
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charles.korn@grafana.com>
@fpetkovski might be interested in this… (and the rest of us might value his verdict ;). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, but you need to pull in the current main with #16682 merged, resolve the conflicts, and then check if things still work. (As said in the comments, I have a few suspicions.)
# Conflicts: # promql/histogram_stats_iterator.go
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Signed-off-by: Charles Korn <charles.korn@grafana.com>
#### What this PR does This PR implements an optimisation inspired by prometheus/prometheus#14097. I've implemented this as a MQE optimisation pass (rather than reusing the logic from that Prometheus PR) for two reasons: * It helps prove out the structure of optimisation passes, and demonstrate they're applicable to more than just common subexpression elimination. * We need to consider how this optimisation interacts with common subexpression elimination. Specifically, we need to consider all the different paths to a selector after common subexpression elimination has been performed to know if we can skip decoding histogram buckets. Alternatively, if we were to reuse the logic from Prometheus, the common subexpression elimination pass would need to know how to merge selectors that are identical save for the `SkipHistogramBuckets` hint. Peak memory consumption is reduced by ~10% in our benchmarks. Latency improves by up to 25% in some cases, but regresses by a small amount (in both absolute and relative terms) for some instant queries: ``` goos: darwin goarch: arm64 pkg: github.com/grafana/mimir/pkg/streamingpromql/benchmarks cpu: Apple M1 Pro │ original.txt │ final.txt │ │ sec/op │ sec/op vs base │ Query/histogram_count(sum(rate(nh_1[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 244.8µ ± 4% 236.4µ ± 1% -3.40% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 291.7µ ± 4% 261.4µ ± 2% -10.40% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 791.2µ ± 1% 627.4µ ± 2% -20.70% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 7.719m ± 1% 7.948m ± 0% +2.97% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 11.85m ± 1% 10.64m ± 1% -10.25% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 54.10m ± 3% 41.61m ± 2% -23.08% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 146.2m ± 5% 150.3m ± 0% ~ (p=0.065 n=6) Query/histogram_count(sum(rate(nh_2000[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 223.9m ± 3% 202.3m ± 0% -9.63% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 1047.5m ± 3% 789.9m ± 1% -24.60% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 264.6µ ± 4% 243.7µ ± 1% -7.92% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 359.7µ ± 4% 320.3µ ± 2% -10.96% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 1.295m ± 3% 1.089m ± 0% -15.92% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 8.048m ± 1% 8.472m ± 1% +5.26% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 16.53m ± 2% 15.48m ± 4% -6.34% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 100.81m ± 1% 84.79m ± 0% -15.89% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 151.2m ± 0% 160.0m ± 0% +5.83% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 314.8m ± 0% 295.9m ± 1% -6.00% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 1.968 ± 1% 1.660 ± 0% -15.64% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 244.1µ ± 2% 230.4µ ± 2% -5.60% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 289.1µ ± 2% 261.0µ ± 1% -9.70% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 783.2µ ± 1% 623.1µ ± 0% -20.44% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 7.820m ± 2% 7.937m ± 1% +1.49% (p=0.041 n=6) Query/histogram_sum(sum(rate(nh_100[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 11.95m ± 1% 10.57m ± 1% -11.52% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 54.67m ± 1% 41.65m ± 0% -23.83% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 146.8m ± 1% 150.1m ± 0% +2.27% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 224.9m ± 0% 200.4m ± 1% -10.90% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 1050.1m ± 1% 790.5m ± 1% -24.72% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 258.7µ ± 2% 243.4µ ± 2% -5.91% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 353.7µ ± 3% 317.4µ ± 2% -10.27% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 1.290m ± 1% 1.084m ± 1% -15.93% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 8.085m ± 0% 8.471m ± 1% +4.78% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 16.58m ± 6% 15.49m ± 1% -6.59% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 101.78m ± 3% 84.98m ± 2% -16.50% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 152.6m ± 2% 160.2m ± 2% +4.99% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 317.7m ± 1% 296.4m ± 1% -6.70% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 1.980 ± 1% 1.654 ± 0% -16.49% (p=0.002 n=6) geomean 15.01m 13.55m -9.76% │ original.txt │ final.txt │ │ B │ B vs base │ Query/histogram_count(sum(rate(nh_1[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 76.69Mi ± 5% 68.97Mi ± 5% -10.07% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 74.16Mi ± 2% 66.49Mi ± 1% -10.34% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 73.48Mi ± 1% 64.20Mi ± 2% -12.63% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 71.55Mi ± 1% 62.59Mi ± 3% -12.52% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 69.62Mi ± 1% 62.04Mi ± 2% -10.89% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 71.49Mi ± 1% 63.12Mi ± 1% -11.71% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 75.71Mi ± 2% 67.80Mi ± 2% -10.44% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 75.67Mi ± 2% 67.84Mi ± 2% -10.36% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 79.64Mi ± 1% 72.62Mi ± 1% -8.81% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 75.88Mi ± 2% 67.17Mi ± 2% -11.48% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 73.75Mi ± 2% 66.02Mi ± 4% -10.49% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_1[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 73.00Mi ± 1% 64.12Mi ± 2% -12.16% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 70.35Mi ± 2% 63.04Mi ± 2% -10.39% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 69.81Mi ± 1% 61.73Mi ± 1% -11.57% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_100[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 71.08Mi ± 1% 62.70Mi ± 1% -11.78% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 74.77Mi ± 3% 67.80Mi ± 3% -9.32% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 75.47Mi ± 1% 66.90Mi ± 2% -11.36% (p=0.002 n=6) Query/histogram_count(sum(rate(nh_2000[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 79.35Mi ± 1% 70.52Mi ± 2% -11.14% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 76.41Mi ± 2% 69.45Mi ± 4% -9.10% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 75.35Mi ± 3% 66.16Mi ± 2% -12.19% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 72.92Mi ± 1% 63.72Mi ± 1% -12.62% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 71.88Mi ± 2% 62.25Mi ± 1% -13.40% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 70.02Mi ± 1% 61.74Mi ± 1% -11.83% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 71.70Mi ± 1% 63.26Mi ± 1% -11.78% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[2m]))),_instant_query/engine=MimirWithQueryPlanner-10 74.36Mi ± 2% 66.23Mi ± 2% -10.94% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[2m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 75.64Mi ± 1% 67.31Mi ± 1% -11.01% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[2m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 79.44Mi ± 1% 72.38Mi ± 1% -8.89% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 75.46Mi ± 4% 67.14Mi ± 2% -11.03% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 73.41Mi ± 3% 65.93Mi ± 3% -10.19% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_1[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 72.95Mi ± 1% 63.78Mi ± 1% -12.57% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 71.73Mi ± 2% 63.17Mi ± 2% -11.93% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 69.86Mi ± 1% 61.66Mi ± 1% -11.73% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_100[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 71.34Mi ± 1% 62.35Mi ± 2% -12.60% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[20m]))),_instant_query/engine=MimirWithQueryPlanner-10 75.13Mi ± 1% 67.05Mi ± 3% -10.75% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[20m]))),_range_query_with_100_steps/engine=MimirWithQueryPlanner-10 75.03Mi ± 1% 67.27Mi ± 2% -10.35% (p=0.002 n=6) Query/histogram_sum(sum(rate(nh_2000[20m]))),_range_query_with_1000_steps/engine=MimirWithQueryPlanner-10 78.70Mi ± 2% 70.84Mi ± 1% -9.98% (p=0.002 n=6) geomean 73.92Mi 65.69Mi -11.13% ```⚠️ This PR is pending prometheus/prometheus#16686 being merged and vendored into Mimir. #### Which issue(s) this PR fixes or relates to #10067 #### Checklist - [x] Tests updated. - [n/a] Documentation added. - [covered by #10067] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. - [n/a] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features.
This PR builds upon #14097.
It makes two changes:
histogramStatsIterator
type so that other query engine implementations can use it.Reset
method that allows the iterator to be reused, reducing allocations and improving performance.The existing benchmarks show a 2-4% reduction in allocations:
The existing benchmarks don't show a noticeable improvement in latency as the benchmarks run a range query with 240 steps over 3000 series, and so the benchmark time is dominated by other things.
Modifying the existing benchmarks to run an instant query shows more dramatic results: