Skip to content

Conversation

charleskorn
Copy link
Contributor

@charleskorn charleskorn commented Feb 10, 2025

What this PR does

This PR adds support for topk and bottomk to MQE.

Note that I have not added topk and bottomk to the mixed metrics gauntlet as there are many false-positive failures: topk and bottomk do not guarantee which series will be returned when multiple series in a group have the same value, and due to differences in Prometheus' engine and MQE, they each select different series in these cases.

Compared to Prometheus' engine, MQE is up to 30% faster and uses up to 70% less memory at peak for non-trivial benchmarks:

                                                               │  Prometheus  │               Mimir                │
                                                               │    sec/op    │   sec/op     vs base               │
Query/topk(1,_a_1),_instant_query-10                              147.3µ ± 5%   144.6µ ± 1%   -1.83% (p=0.026 n=6)
Query/topk(1,_a_1),_range_query_with_100_steps-10                 175.5µ ± 1%   157.6µ ± 2%  -10.20% (p=0.002 n=6)
Query/topk(1,_a_1),_range_query_with_1000_steps-10                462.8µ ± 2%   280.9µ ± 0%  -39.31% (p=0.002 n=6)
Query/topk(1,_a_100),_instant_query-10                            807.0µ ± 2%   776.8µ ± 1%   -3.75% (p=0.002 n=6)
Query/topk(1,_a_100),_range_query_with_100_steps-10               1.388m ± 0%   1.378m ± 1%   -0.70% (p=0.015 n=6)
Query/topk(1,_a_100),_range_query_with_1000_steps-10              6.246m ± 1%   6.294m ± 0%        ~ (p=0.065 n=6)
Query/topk(1,_a_2000),_instant_query-10                           10.89m ± 2%   10.40m ± 1%   -4.53% (p=0.002 n=6)
Query/topk(1,_a_2000),_range_query_with_100_steps-10              21.03m ± 1%   20.83m ± 1%        ~ (p=0.065 n=6)
Query/topk(1,_a_2000),_range_query_with_1000_steps-10             134.6m ± 1%   110.3m ± 0%  -18.05% (p=0.002 n=6)
Query/topk(5,_a_1),_instant_query-10                              146.2µ ± 1%   140.8µ ± 2%   -3.69% (p=0.002 n=6)
Query/topk(5,_a_1),_range_query_with_100_steps-10                 175.7µ ± 1%   157.4µ ± 1%  -10.44% (p=0.002 n=6)
Query/topk(5,_a_1),_range_query_with_1000_steps-10                462.4µ ± 3%   279.4µ ± 1%  -39.57% (p=0.002 n=6)
Query/topk(5,_a_100),_instant_query-10                            811.7µ ± 1%   783.2µ ± 1%   -3.51% (p=0.002 n=6)
Query/topk(5,_a_100),_range_query_with_100_steps-10               1.626m ± 3%   1.576m ± 1%   -3.03% (p=0.002 n=6)
Query/topk(5,_a_100),_range_query_with_1000_steps-10              8.576m ± 1%   8.296m ± 6%        ~ (p=0.065 n=6)
Query/topk(5,_a_2000),_instant_query-10                           10.89m ± 1%   10.47m ± 1%   -3.85% (p=0.002 n=6)
Query/topk(5,_a_2000),_range_query_with_100_steps-10              23.21m ± 1%   23.14m ± 2%        ~ (p=0.240 n=6)
Query/topk(5,_a_2000),_range_query_with_1000_steps-10             179.0m ± 2%   132.7m ± 0%  -25.87% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_1),_instant_query-10                     193.4µ ± 2%   184.8µ ± 2%   -4.41% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_1),_range_query_with_100_steps-10        396.9µ ± 3%   261.8µ ± 1%  -34.04% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_1),_range_query_with_1000_steps-10      2016.5µ ± 3%   898.6µ ± 1%  -55.44% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_100),_instant_query-10                   3.844m ± 2%   3.676m ± 1%   -4.37% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_100),_range_query_with_100_steps-10      8.550m ± 1%   8.336m ± 2%   -2.50% (p=0.015 n=6)
Query/topk_by_(le)_(5,_h_100),_range_query_with_1000_steps-10     49.95m ± 2%   47.92m ± 0%   -4.08% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_2000),_instant_query-10                  68.97m ± 2%   66.73m ± 2%   -3.26% (p=0.004 n=6)
Query/topk_by_(le)_(5,_h_2000),_range_query_with_100_steps-10     157.3m ± 1%   145.7m ± 1%   -7.38% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_2000),_range_query_with_1000_steps-10   1169.4m ± 0%   795.4m ± 0%  -31.98% (p=0.002 n=6)
geomean                                                           4.156m        3.599m       -13.41%


                                                               │  Prometheus   │                Mimir                │
                                                               │       B       │      B        vs base               │
Query/topk(1,_a_1),_instant_query-10                              66.27Mi ± 1%   65.15Mi ± 1%   -1.70% (p=0.009 n=6)
Query/topk(1,_a_1),_range_query_with_100_steps-10                 64.23Mi ± 1%   64.72Mi ± 2%   +0.77% (p=0.041 n=6)
Query/topk(1,_a_1),_range_query_with_1000_steps-10                63.00Mi ± 1%   64.52Mi ± 1%   +2.42% (p=0.002 n=6)
Query/topk(1,_a_100),_instant_query-10                            61.68Mi ± 1%   60.53Mi ± 2%   -1.86% (p=0.009 n=6)
Query/topk(1,_a_100),_range_query_with_100_steps-10               62.03Mi ± 1%   60.78Mi ± 1%   -2.02% (p=0.004 n=6)
Query/topk(1,_a_100),_range_query_with_1000_steps-10              64.65Mi ± 1%   61.77Mi ± 1%   -4.46% (p=0.002 n=6)
Query/topk(1,_a_2000),_instant_query-10                           63.16Mi ± 2%   62.32Mi ± 1%   -1.34% (p=0.017 n=6)
Query/topk(1,_a_2000),_range_query_with_100_steps-10              70.54Mi ± 1%   63.90Mi ± 1%   -9.41% (p=0.002 n=6)
Query/topk(1,_a_2000),_range_query_with_1000_steps-10            128.05Mi ± 2%   68.60Mi ± 1%  -46.43% (p=0.002 n=6)
Query/topk(5,_a_1),_instant_query-10                              66.30Mi ± 1%   65.70Mi ± 1%        ~ (p=0.195 n=6)
Query/topk(5,_a_1),_range_query_with_100_steps-10                 64.67Mi ± 1%   65.26Mi ± 1%   +0.91% (p=0.026 n=6)
Query/topk(5,_a_1),_range_query_with_1000_steps-10                62.91Mi ± 1%   64.60Mi ± 1%   +2.69% (p=0.002 n=6)
Query/topk(5,_a_100),_instant_query-10                            61.24Mi ± 1%   61.05Mi ± 1%   -0.31% (p=0.006 n=6)
Query/topk(5,_a_100),_range_query_with_100_steps-10               62.09Mi ± 0%   60.99Mi ± 2%   -1.76% (p=0.002 n=6)
Query/topk(5,_a_100),_range_query_with_1000_steps-10              64.75Mi ± 1%   62.00Mi ± 1%   -4.25% (p=0.002 n=6)
Query/topk(5,_a_2000),_instant_query-10                           63.61Mi ± 1%   62.17Mi ± 1%   -2.26% (p=0.009 n=6)
Query/topk(5,_a_2000),_range_query_with_100_steps-10              71.10Mi ± 2%   63.48Mi ± 1%  -10.71% (p=0.002 n=6)
Query/topk(5,_a_2000),_range_query_with_1000_steps-10            126.81Mi ± 1%   68.02Mi ± 2%  -46.37% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_1),_instant_query-10                     64.33Mi ± 1%   63.59Mi ± 1%   -1.15% (p=0.045 n=6)
Query/topk_by_(le)_(5,_h_1),_range_query_with_100_steps-10        62.95Mi ± 1%   63.40Mi ± 2%        ~ (p=0.240 n=6)
Query/topk_by_(le)_(5,_h_1),_range_query_with_1000_steps-10       64.27Mi ± 1%   62.56Mi ± 2%   -2.66% (p=0.004 n=6)
Query/topk_by_(le)_(5,_h_100),_instant_query-10                   61.88Mi ± 1%   61.54Mi ± 1%        ~ (p=0.180 n=6)
Query/topk_by_(le)_(5,_h_100),_range_query_with_100_steps-10      65.20Mi ± 1%   63.95Mi ± 1%   -1.92% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_100),_range_query_with_1000_steps-10     83.88Mi ± 1%   67.52Mi ± 1%  -19.50% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_2000),_instant_query-10                  70.71Mi ± 1%   66.52Mi ± 1%   -5.93% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_2000),_range_query_with_100_steps-10    112.00Mi ± 1%   71.28Mi ± 1%  -36.36% (p=0.002 n=6)
Query/topk_by_(le)_(5,_h_2000),_range_query_with_1000_steps-10   265.25Mi ± 1%   75.28Mi ± 3%  -71.62% (p=0.002 n=6)
geomean                                                           73.73Mi        64.41Mi       -12.64%

Which issue(s) this PR fixes or relates to

#10067

Checklist

  • Tests updated.
  • [n/a] Documentation added.
  • [covered by Mimir Query Engine #10067] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • [n/a] about-versioning.md updated with experimental features.

@charleskorn charleskorn force-pushed the charleskorn/mqe-topk-bottomk branch from 858b84a to a6aa656 Compare February 10, 2025 05:43
@charleskorn charleskorn marked this pull request as ready for review February 12, 2025 03:14
@charleskorn charleskorn requested a review from a team as a code owner February 12, 2025 03:14
Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should put this through the gauntlet, at least for instant queries to confirm the order.

We should also have a feature flag for this operator since it doesn't reuse most of the aggregation operator.

(I need to review range_query a bit closer still, but at first glance looks good)

// whereas this method does.
// This saves us allocating a new buffer and string for every single input series, which has a noticeable performance impact.
type seriesToGroupLabelsBytesFunc func(labels.Labels) []byte
func (a *Aggregation) groupLabelsBytesFunc() SeriesToGroupLabelsBytesFunc {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be a method any more. We should be able to just use GroupLabelsBytesFunc on line 138 (same with the new operators too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed this in the new operators, but TestAggregation_GroupLabelling uses this method, so I'd prefer to keep it as-is in Aggregation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not blocking) Why not update those tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it's more valuable to test the Aggregation operator itself, as it does some work to make these tests pass (eg. adding the __name__ label to the list of labels when without is used) - the fact it uses a shared function is more of an implementation detail.

@charleskorn
Copy link
Contributor Author

We should put this through the gauntlet, at least for instant queries to confirm the order.

I did try this, but it would not be straightforward:

Note that I have not added topk and bottomk to the mixed metrics gauntlet as there are many false-positive failures: topk and bottomk do not guarantee which series will be returned when multiple series in a group have the same value, and due to differences in Prometheus' engine and MQE, they each select different series in these cases.

@charleskorn
Copy link
Contributor Author

We should also have a feature flag for this operator since it doesn't reuse most of the aggregation operator.

Isn't this covered by -querier.mimir-query-engine.disabled-aggregations here?

@charleskorn charleskorn force-pushed the charleskorn/mqe-topk-bottomk branch from 62f545c to f49c2b8 Compare February 13, 2025 02:47
@charleskorn charleskorn requested a review from jhesketh February 13, 2025 03:42
@jhesketh
Copy link
Contributor

We should put this through the gauntlet, at least for instant queries to confirm the order.

I did try this, but it would not be straightforward:

Note that I have not added topk and bottomk to the mixed metrics gauntlet as there are many false-positive failures: topk and bottomk do not guarantee which series will be returned when multiple series in a group have the same value, and due to differences in Prometheus' engine and MQE, they each select different series in these cases.

Sorry I missed that.

I wonder if we can create a similar gauntlet for these aggregations that only has deterministic combinations? It'd still be good to test more combinations such as stale markers and infinities etc.

@jhesketh
Copy link
Contributor

We should also have a feature flag for this operator since it doesn't reuse most of the aggregation operator.

Isn't this covered by -querier.mimir-query-engine.disabled-aggregations here?

Heh, yep, that works.

@charleskorn
Copy link
Contributor Author

I wonder if we can create a similar gauntlet for these aggregations that only has deterministic combinations? It'd still be good to test more combinations such as stale markers and infinities etc.

topk and bottomk should never see stale markers, as they're filtered out by instant vector selectors. They also ignore histograms.

We already have some tests that cover the behaviour of topk and bottomk with infinity and NaN. I don't think the value of adding topk and bottomk to the existing gauntlet (or creating another similar test) outweighs the cost of building, maintaining and running them - the gauntlet tests are becoming slower and slower, and that has an impact on engineer productivity and CI run times.

Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, thanks! Just a last few things

return nil
}

if len(g.seriesForTimestamps[timestampIndex]) == int(limit) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) point selection/interacting with the heap could be split out into own method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the other methods out in 54bba6e, which should help make this a little shorter - how's this now?

Copy link
Contributor

@jhesketh jhesketh Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel this if/else block could be extracted so we have accumulateIntoGroup dealing with getting the series for the group, and a new method such as updateHeapForGroup or similar for dealing with the heap selection etc. (again, jsut a nit)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #10679

Copy link
Contributor

@jhesketh jhesketh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for dealing with my feedback :-)

@charleskorn
Copy link
Contributor Author

I'm going to merge this as-is so we can start testing it next week, but I will come back and address some of the refactoring as a follow-up PR.

@charleskorn charleskorn merged commit 33b6dc5 into main Feb 14, 2025
28 checks passed
@charleskorn charleskorn deleted the charleskorn/mqe-topk-bottomk branch February 14, 2025 06:02
ying-jeanne pushed a commit that referenced this pull request Feb 19, 2025
* Enable upstream test cases

* Enable benchmark

* Add our own test cases

* Initial implementation

* Fix issue where queries that use `count_values` fail rather than reporting that they are unsupported

* Add benchmark with grouping

* Add tests for grouping and sorting

* Reuse same heap instance to avoid allocating on every input sample.

* Add shortcut for case where k=1

* Pool slices of `topKBottomKSeries`

* Pool `[][]int` slices

* Move to separate package

* Initial implementation of sorting for instant queries

* Fix sorting of NaN values for instant queries

* Expand test cases to cover instant queries as well

* Add license headers

* Address PR feedback: don't bother allocating a new slice when used with `without`

* Address PR feedback related to `GroupLabelsBytesFunc`

* Address PR feedback: refactor `GroupLabelsBytesFunc`

* Address PR feedback: add and clarify comments

* Address PR feedback: rename `limit` to `k`

* Rename `topKBottomKSeries` to `rangeQuerySeries`

* Add pooling for `[]instantQuerySeries`

* Address PR feedback: add further test cases with NaN and infinite values

* Add test for instant query sorting and grouping

* Address PR feedback: clarify comments

* Address PR feedback: clarify error message

* Address PR feedback: extract functions
ying-jeanne pushed a commit that referenced this pull request Feb 20, 2025
* Enable upstream test cases

* Enable benchmark

* Add our own test cases

* Initial implementation

* Fix issue where queries that use `count_values` fail rather than reporting that they are unsupported

* Add benchmark with grouping

* Add tests for grouping and sorting

* Reuse same heap instance to avoid allocating on every input sample.

* Add shortcut for case where k=1

* Pool slices of `topKBottomKSeries`

* Pool `[][]int` slices

* Move to separate package

* Initial implementation of sorting for instant queries

* Fix sorting of NaN values for instant queries

* Expand test cases to cover instant queries as well

* Add license headers

* Address PR feedback: don't bother allocating a new slice when used with `without`

* Address PR feedback related to `GroupLabelsBytesFunc`

* Address PR feedback: refactor `GroupLabelsBytesFunc`

* Address PR feedback: add and clarify comments

* Address PR feedback: rename `limit` to `k`

* Rename `topKBottomKSeries` to `rangeQuerySeries`

* Add pooling for `[]instantQuerySeries`

* Address PR feedback: add further test cases with NaN and infinite values

* Add test for instant query sorting and grouping

* Address PR feedback: clarify comments

* Address PR feedback: clarify error message

* Address PR feedback: extract functions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants