Skip to content

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented Oct 3, 2024

What this PR does

This PR adds support for the and binary operation.

Compared to Prometheus' engine, latency is up to 90% lower and peak memory consumption is up to 48% lower in our benchmarks:

                                                                       │  Prometheus   │                Mimir                │
                                                                       │    sec/op     │    sec/op     vs base               │
Query/a_1_and_b_1{l=~'.*[0-4]$'},_instant_query-10                        723.5µ ±  0%   717.8µ ±  1%        ~ (p=0.310 n=6)
Query/a_1_and_b_1{l=~'.*[0-4]$'},_range_query_with_100_steps-10           744.2µ ±  2%   734.0µ ±  4%        ~ (p=0.818 n=6)
Query/a_1_and_b_1{l=~'.*[0-4]$'},_range_query_with_1000_steps-10          1.269m ±  1%   1.130m ±  1%  -10.93% (p=0.002 n=6)
Query/a_100_and_b_100{l=~'.*[0-4]$'},_instant_query-10                    1.613m ± 11%   1.609m ±  2%        ~ (p=0.485 n=6)
Query/a_100_and_b_100{l=~'.*[0-4]$'},_range_query_with_100_steps-10       3.270m ±  2%   2.367m ± 10%  -27.62% (p=0.002 n=6)
Query/a_100_and_b_100{l=~'.*[0-4]$'},_range_query_with_1000_steps-10     17.321m ±  1%   8.414m ±  0%  -51.42% (p=0.002 n=6)
Query/a_2000_and_b_2000{l=~'.*[0-4]$'},_instant_query-10                  17.08m ±  1%   16.97m ±  2%        ~ (p=0.589 n=6)
Query/a_2000_and_b_2000{l=~'.*[0-4]$'},_range_query_with_100_steps-10     51.59m ±  4%   30.85m ±  1%  -40.20% (p=0.002 n=6)
Query/a_2000_and_b_2000{l=~'.*[0-4]$'},_range_query_with_1000_steps-10    392.2m ±  1%   146.9m ±  9%  -62.55% (p=0.002 n=6)
Query/a_1_and_b_1{l='notfound'},_instant_query-10                         232.3µ ±  2%   217.7µ ± 12%        ~ (p=0.065 n=6)
Query/a_1_and_b_1{l='notfound'},_range_query_with_100_steps-10            251.5µ ±  7%   218.5µ ±  6%  -13.12% (p=0.002 n=6)
Query/a_1_and_b_1{l='notfound'},_range_query_with_1000_steps-10           382.2µ ±  3%   235.5µ ±  1%  -38.38% (p=0.002 n=6)
Query/a_100_and_b_100{l='notfound'},_instant_query-10                     878.4µ ±  4%   524.2µ ±  5%  -40.32% (p=0.002 n=6)
Query/a_100_and_b_100{l='notfound'},_range_query_with_100_steps-10       1429.2µ ±  0%   570.8µ ±  1%  -60.06% (p=0.002 n=6)
Query/a_100_and_b_100{l='notfound'},_range_query_with_1000_steps-10       6.188m ±  3%   1.039m ±  2%  -83.22% (p=0.002 n=6)
Query/a_2000_and_b_2000{l='notfound'},_instant_query-10                  10.842m ±  1%   5.211m ±  1%  -51.93% (p=0.002 n=6)
Query/a_2000_and_b_2000{l='notfound'},_range_query_with_100_steps-10     21.741m ±  1%   5.582m ±  4%  -74.33% (p=0.002 n=6)
Query/a_2000_and_b_2000{l='notfound'},_range_query_with_1000_steps-10    144.98m ±  2%   12.39m ±  2%  -91.45% (p=0.002 n=6)
geomean                                                                   4.140m         2.229m        -46.17%

                                                                       │  Prometheus   │                Mimir                │
                                                                       │       B       │      B        vs base               │
Query/a_1_and_b_1{l=~'.*[0-4]$'},_instant_query-10                        70.23Mi ± 2%   69.00Mi ± 3%        ~ (p=0.143 n=6)
Query/a_1_and_b_1{l=~'.*[0-4]$'},_range_query_with_100_steps-10           69.90Mi ± 1%   70.33Mi ± 3%        ~ (p=0.589 n=6)
Query/a_1_and_b_1{l=~'.*[0-4]$'},_range_query_with_1000_steps-10          67.97Mi ± 1%   66.34Mi ± 4%        ~ (p=0.087 n=6)
Query/a_100_and_b_100{l=~'.*[0-4]$'},_instant_query-10                    66.05Mi ± 2%   66.66Mi ± 1%        ~ (p=0.420 n=6)
Query/a_100_and_b_100{l=~'.*[0-4]$'},_range_query_with_100_steps-10       66.83Mi ± 1%   66.73Mi ± 1%        ~ (p=0.394 n=6)
Query/a_100_and_b_100{l=~'.*[0-4]$'},_range_query_with_1000_steps-10      71.63Mi ± 1%   69.52Mi ± 2%   -2.96% (p=0.002 n=6)
Query/a_2000_and_b_2000{l=~'.*[0-4]$'},_instant_query-10                  68.12Mi ± 2%   69.08Mi ± 2%        ~ (p=0.310 n=6)
Query/a_2000_and_b_2000{l=~'.*[0-4]$'},_range_query_with_100_steps-10     83.23Mi ± 1%   72.56Mi ± 1%  -12.82% (p=0.002 n=6)
Query/a_2000_and_b_2000{l=~'.*[0-4]$'},_range_query_with_1000_steps-10    198.1Mi ± 1%   105.3Mi ± 4%  -46.83% (p=0.002 n=6)
Query/a_1_and_b_1{l='notfound'},_instant_query-10                         73.99Mi ± 2%   74.73Mi ± 1%        ~ (p=0.180 n=6)
Query/a_1_and_b_1{l='notfound'},_range_query_with_100_steps-10            73.16Mi ± 1%   74.46Mi ± 1%   +1.78% (p=0.004 n=6)
Query/a_1_and_b_1{l='notfound'},_range_query_with_1000_steps-10           70.31Mi ± 1%   73.52Mi ± 1%   +4.57% (p=0.002 n=6)
Query/a_100_and_b_100{l='notfound'},_instant_query-10                     67.06Mi ± 1%   67.49Mi ± 1%        ~ (p=0.093 n=6)
Query/a_100_and_b_100{l='notfound'},_range_query_with_100_steps-10        67.25Mi ± 3%   67.76Mi ± 1%        ~ (p=0.240 n=6)
Query/a_100_and_b_100{l='notfound'},_range_query_with_1000_steps-10       69.25Mi ± 1%   67.46Mi ± 1%   -2.58% (p=0.002 n=6)
Query/a_2000_and_b_2000{l='notfound'},_instant_query-10                   67.02Mi ± 2%   67.88Mi ± 2%        ~ (p=0.180 n=6)
Query/a_2000_and_b_2000{l='notfound'},_range_query_with_100_steps-10      75.27Mi ± 1%   68.97Mi ± 1%   -8.37% (p=0.002 n=6)
Query/a_2000_and_b_2000{l='notfound'},_range_query_with_1000_steps-10    129.24Mi ± 3%   67.18Mi ± 6%  -48.02% (p=0.002 n=6)
geomean                                                                   77.06Mi        70.97Mi        -7.90%

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • [n/a] Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [n/a] about-versioning.md updated with experimental features.

# Conflicts:
#	cmd/mimir/help-all.txt.tmpl
#	pkg/streamingpromql/config.go
#	pkg/streamingpromql/engine_test.go
@charleskorn charleskorn marked this pull request as ready for review October 3, 2024 06:52
@charleskorn charleskorn requested review from tacole02 and a team as code owners October 3, 2024 06:52
requireQueryIsUnsupported(t, featureToggles, "metric{} or other_metric{}", "binary expression with 'or'")
requireQueryIsUnsupported(t, featureToggles, "metric{} unless other_metric{}", "binary expression with 'unless'")

// Other operations should still be supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit for potential followup) I feel like we could avoid some repetition in these feature flag tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, although I expect there shouldn't be any more cases added here in the future, and these feature flags are temporary, so I'm not sure it's worth the effort.

a.leftSeriesGroups = a.leftSeriesGroups[1:]

if thisSeriesGroup == nil {
// The next series from the left side has no matching series on the right side.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing, because a.Left.NextSeries(ctx) will be the series at the same index as thisSeriesGroup. Really it's "this series", not "the next series".

I suggest to move reading the next series to outside of this check (also then not needing it on line 157). The if condition can still put the series back and continue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed the first part in 26fb606.

I'd prefer not to do the second part (moving the reading of the next left series to one place), as that will mean we're holding more series in memory for longer while we read the right series. It may also mean that downstream operators on the left side are holding more intermediate state while the right side is evaluated, which would again increase peak memory consumption.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, instead of reading it to just discard it, what do you think about extending InstantVectorOperator to have a SkipSeries method that moves the selector forward so we don't have to get slices for all the points etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same idea, I've already added something like this to our list of performance improvement ideas 🙂 I think this makes sense, but this is a far bigger change outside the scope of this PR (and I suspect outside the scope of our initial focus of getting MQE up and running).

Copy link
Contributor

@tacole02 tacole02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@charleskorn charleskorn requested a review from jhesketh October 4, 2024 02:00
Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks! just one last q

Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I wonder if those re-uses have improved the benchmarks even more :-)

@charleskorn
Copy link
Contributor Author

Thanks. I wonder if those re-uses have improved the benchmarks even more :-)

Not dramatically - reusing the left series slice for data improved things by 1-2%, and reusing the metadata slice made no difference (within the margin of error of the benchmarks).

@charleskorn charleskorn merged commit 665bb0f into main Oct 4, 2024
29 checks passed
@charleskorn charleskorn deleted the charleskorn/mqe-and branch October 4, 2024 03:57
@jhesketh jhesketh mentioned this pull request Dec 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants