Skip to content

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented Jul 30, 2025

What this PR does

In #11189, we added common subexpression elimination (CSE) to MQE's query planner. However, this only supported expressions that produced instant vectors - so this would apply to expressions like sum(foo) in sum(foo) + sum(foo) or rate(foo[5m]) in rate(foo[5m]) + rate(foo[5m]).

In this PR, I'm adding support for expressions that produce range vectors, provided the parent query is an instant query. For example, CSE will now apply to foo[5m] in sum_over_time(foo[5m]) / count_over_time(foo[5m]).

I've limited this to instant queries for the time being as supporting range queries adds a lot of complexity. The vast majority of queries we see in production Mimir clusters are instant queries from rules, so we get most of the benefit without the extra complexity.

Benchmarks show a 40-50% latency improvement for affected queries, albeit at the cost of higher peak memory consumption due to the buffering required, similar to what we see for instant vector expressions:

goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/streamingpromql/benchmarks
cpu: Apple M1 Pro
                                                                                                        │  before.txt  │             after.txt              │
                                                                                                        │    sec/op    │   sec/op     vs base               │
Query/sum(sum_over_time(a_1[1m]))_/_sum(count_over_time(a_1[1m])),_instant_query/engine=Mimir-10           274.1µ ± 1%   152.4µ ± 1%  -44.41% (p=0.002 n=6)
Query/sum(sum_over_time(a_100[1m]))_/_sum(count_over_time(a_100[1m])),_instant_query/engine=Mimir-10       838.4µ ± 2%   495.7µ ± 0%  -40.87% (p=0.002 n=6)
Query/sum(sum_over_time(a_2000[1m]))_/_sum(count_over_time(a_2000[1m])),_instant_query/engine=Mimir-10     8.555m ± 1%   5.356m ± 0%  -37.40% (p=0.002 n=6)
Query/sum(sum_over_time(a_1[1d]))_/_sum(count_over_time(a_1[1d])),_instant_query/engine=Mimir-10          1136.0µ ± 1%   606.9µ ± 2%  -46.58% (p=0.002 n=6)
Query/sum(sum_over_time(a_100[1d]))_/_sum(count_over_time(a_100[1d])),_instant_query/engine=Mimir-10       68.34m ± 1%   36.16m ± 2%  -47.09% (p=0.002 n=6)
Query/sum(sum_over_time(a_2000[1d]))_/_sum(count_over_time(a_2000[1d])),_instant_query/engine=Mimir-10    1306.9m ± 0%   666.7m ± 1%  -48.98% (p=0.002 n=6)

                                                                                                        │  before.txt  │               after.txt               │
                                                                                                        │      B       │       B        vs base                │
Query/sum(sum_over_time(a_1[1m]))_/_sum(count_over_time(a_1[1m])),_instant_query/engine=Mimir-10          56.76Mi ± 1%    56.53Mi ± 0%         ~ (p=0.223 n=6)
Query/sum(sum_over_time(a_100[1m]))_/_sum(count_over_time(a_100[1m])),_instant_query/engine=Mimir-10      56.73Mi ± 1%    56.52Mi ± 1%         ~ (p=0.509 n=6)
Query/sum(sum_over_time(a_2000[1m]))_/_sum(count_over_time(a_2000[1m])),_instant_query/engine=Mimir-10    58.97Mi ± 1%    59.06Mi ± 1%         ~ (p=0.420 n=6)
Query/sum(sum_over_time(a_1[1d]))_/_sum(count_over_time(a_1[1d])),_instant_query/engine=Mimir-10          62.27Mi ± 1%    62.34Mi ± 1%         ~ (p=0.554 n=6)
Query/sum(sum_over_time(a_100[1d]))_/_sum(count_over_time(a_100[1d])),_instant_query/engine=Mimir-10      69.50Mi ± 3%   116.27Mi ± 3%   +67.30% (p=0.002 n=6)
Query/sum(sum_over_time(a_2000[1d]))_/_sum(count_over_time(a_2000[1d])),_instant_query/engine=Mimir-10    75.62Mi ± 4%   430.17Mi ± 0%  +468.88% (p=0.002 n=6)

Benchmarks show no noticable impact for the existing instant vector implementation due to making SeriesDataRingBuffer generic.

Which issue(s) this PR fixes or relates to

#11189

Checklist

  • Tests updated.
  • [n/a] Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • [n/a] about-versioning.md updated with experimental features.

Copy link
Contributor

github-actions bot commented Jul 30, 2025

💻 Deploy preview deleted.

@charleskorn charleskorn force-pushed the charleskorn/mqe-cse-range-vector-selectors branch from ceb5ea8 to 20c0253 Compare July 30, 2025 07:06
Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable approach to me. The only feedback I have (which might be too vague to be actionable) is that the code for RangeVectorDuplicationBuffer involves a lot of operations indexing into slices and checking for specific sentinel values. Maybe it's possible to move currentSeriesIndex and hasReadCurrentSeriesSamples (and maybe other members) into a separate structure that doesn't require callers to keep track of slice indexes?


b.seriesMetadataCount++

if b.seriesMetadataCount == b.consumerCount {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're nil-ing out the series metadata here, does that imply that we can't add anymore consumers after doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's right - we expect that all consumers have been created before any calls SeriesMetadata. At least at the moment, this is always true - all consumers will be created when the plan is materialised into operators before evaluation begins.

@charleskorn
Copy link
Contributor Author

The only feedback I have (which might be too vague to be actionable) is that the code for RangeVectorDuplicationBuffer involves a lot of operations indexing into slices and checking for specific sentinel values. Maybe it's possible to move currentSeriesIndex and hasReadCurrentSeriesSamples (and maybe other members) into a separate structure that doesn't require callers to keep track of slice indexes?

Good point, how's what I've got in 75af8c3?

@charleskorn charleskorn marked this pull request as ready for review July 31, 2025 01:20
@charleskorn charleskorn requested review from tacole02 and a team as code owners July 31, 2025 01:20
Copy link
Contributor

@tacole02 tacole02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs look good! A few very minor nits.

@56quarters
Copy link
Contributor

The only feedback I have (which might be too vague to be actionable) is that the code for RangeVectorDuplicationBuffer involves a lot of operations indexing into slices and checking for specific sentinel values. Maybe it's possible to move currentSeriesIndex and hasReadCurrentSeriesSamples (and maybe other members) into a separate structure that doesn't require callers to keep track of slice indexes?

Good point, how's what I've got in 75af8c3?

Looks good. Thanks!

Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with only minor comments. Really appreciate the thorough tests for the RangeVectorDuplicationBuffer.

@charleskorn charleskorn enabled auto-merge (squash) August 1, 2025 00:54
@charleskorn charleskorn force-pushed the charleskorn/mqe-cse-range-vector-selectors branch from e12777c to 394b086 Compare August 1, 2025 00:54
@charleskorn charleskorn force-pushed the charleskorn/mqe-cse-range-vector-selectors branch from 394b086 to c99f73a Compare August 1, 2025 00:55
@charleskorn charleskorn merged commit 1aafdf7 into main Aug 1, 2025
33 checks passed
@charleskorn charleskorn deleted the charleskorn/mqe-cse-range-vector-selectors branch August 1, 2025 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants