-
Notifications
You must be signed in to change notification settings - Fork 642
MQE: add CSE support for range vector expressions in instant queries #12236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…oduce range vectors in instant queries
💻 Deploy preview deleted. |
ceb5ea8
to
20c0253
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable approach to me. The only feedback I have (which might be too vague to be actionable) is that the code for RangeVectorDuplicationBuffer
involves a lot of operations indexing into slices and checking for specific sentinel values. Maybe it's possible to move currentSeriesIndex
and hasReadCurrentSeriesSamples
(and maybe other members) into a separate structure that doesn't require callers to keep track of slice indexes?
|
||
b.seriesMetadataCount++ | ||
|
||
if b.seriesMetadataCount == b.consumerCount { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're nil-ing out the series metadata here, does that imply that we can't add anymore consumers after doing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's right - we expect that all consumers have been created before any calls SeriesMetadata
. At least at the moment, this is always true - all consumers will be created when the plan is materialised into operators before evaluation begins.
…er than multiple slices
Good point, how's what I've got in 75af8c3? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs look good! A few very minor nits.
Looks good. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with only minor comments. Really appreciate the thorough tests for the RangeVectorDuplicationBuffer
.
e12777c
to
394b086
Compare
# Conflicts: # CHANGELOG.md
394b086
to
c99f73a
Compare
What this PR does
In #11189, we added common subexpression elimination (CSE) to MQE's query planner. However, this only supported expressions that produced instant vectors - so this would apply to expressions like
sum(foo)
insum(foo) + sum(foo)
orrate(foo[5m])
inrate(foo[5m]) + rate(foo[5m])
.In this PR, I'm adding support for expressions that produce range vectors, provided the parent query is an instant query. For example, CSE will now apply to
foo[5m]
insum_over_time(foo[5m]) / count_over_time(foo[5m])
.I've limited this to instant queries for the time being as supporting range queries adds a lot of complexity. The vast majority of queries we see in production Mimir clusters are instant queries from rules, so we get most of the benefit without the extra complexity.
Benchmarks show a 40-50% latency improvement for affected queries, albeit at the cost of higher peak memory consumption due to the buffering required, similar to what we see for instant vector expressions:
Benchmarks show no noticable impact for the existing instant vector implementation due to making
SeriesDataRingBuffer
generic.Which issue(s) this PR fixes or relates to
#11189
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
. If changelog entry is not needed, please add thechangelog-not-needed
label to the PR.about-versioning.md
updated with experimental features.