MQE: add support for `count_values` #10744

charleskorn · 2025-02-26T00:41:16Z

What this PR does

This PR adds support for count_values in MQE.

Benchmark results are mixed. MQE's implementation always consumes less memory than Prometheus' engine in our benchmarks, but in some cases runs slower, and in others runs faster. MQE is up to 36% faster for the common case where many series have only a handful of values, but up to 20% slower for the uncommon case where every sample has a different value.

The count_values('value', h_X) cases exercise the scenario where every sample has a different value, resulting in X × 100 output series. The count_values('value', h_X * 0) cases exercise the scenario where every sample has the same value, resulting in 1 output series.

goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/streamingpromql/benchmarks
cpu: Apple M1 Pro
                                                                       │ Prometheus  │               Mimir                │
                                                                       │   sec/op    │   sec/op     vs base               │
Query/count_values('value',_h_1),_range_query_with_100_steps-10          833.8µ ± 1%   733.8µ ± 3%  -11.99% (p=0.002 n=6)
Query/count_values('value',_h_100),_range_query_with_100_steps-10        70.70m ± 1%   84.77m ± 4%  +19.91% (p=0.002 n=6)
Query/count_values('value',_h_2000),_range_query_with_100_steps-10        2.300 ± 2%    2.574 ± 2%  +11.92% (p=0.002 n=6)
Query/count_values('value',_h_1_*_0),_range_query_with_100_steps-10      384.5µ ± 2%   266.9µ ± 1%  -30.59% (p=0.002 n=6)
Query/count_values('value',_h_100_*_0),_range_query_with_100_steps-10    19.00m ± 5%   13.54m ± 3%  -28.71% (p=0.002 n=6)
Query/count_values('value',_h_2000_*_0),_range_query_with_100_steps-10   413.2m ± 1%   264.3m ± 1%  -36.04% (p=0.002 n=6)
geomean                                                                  27.25m        23.13m       -15.12%

                                                                       │  Prometheus   │                Mimir                 │
                                                                       │       B       │       B        vs base               │
Query/count_values('value',_h_1),_range_query_with_100_steps-10           64.55Mi ± 1%   65.10Mi ±  1%        ~ (p=0.074 n=6)
Query/count_values('value',_h_100),_range_query_with_100_steps-10         298.2Mi ± 2%   271.9Mi ± 11%   -8.81% (p=0.002 n=6)
Query/count_values('value',_h_2000),_range_query_with_100_steps-10        2.579Gi ± 1%   1.834Gi ±  1%  -28.88% (p=0.002 n=6)
Query/count_values('value',_h_1_*_0),_range_query_with_100_steps-10       63.30Mi ± 1%   63.24Mi ±  1%        ~ (p=0.729 n=6)
Query/count_values('value',_h_100_*_0),_range_query_with_100_steps-10     67.12Mi ± 1%   61.92Mi ±  2%   -7.74% (p=0.002 n=6)
Query/count_values('value',_h_2000_*_0),_range_query_with_100_steps-10   149.60Mi ± 3%   66.44Mi ±  1%  -55.59% (p=0.002 n=6)
geomean                                                                   178.5Mi        143.3Mi        -19.72%

The difference in performance for the uncommon case is because Prometheus' engine uses a hash of the series labels to identify output series, which runs the risk of hash collisions, while MQE uses a string representation of the series labels to identify output series. In the case where there are many output series, this means MQE must allocate many strings, which causes the performance impact we see here.

Given count_values is a relatively uncommonly used PromQL feature, the performance of the common case is significantly better than Prometheus' engine, and MQE's implementation avoids possible hash collisions, I'm OK with this performance regression.

Which issue(s) this PR fixes or relates to

#10067

Checklist

Tests updated.
[n/a] Documentation added.
[covered by Mimir Query Engine #10067] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
[n/a] about-versioning.md updated with experimental features.

pkg/streamingpromql/operators/aggregations/count_values.go

lamida · 2025-02-27T16:30:36Z

pkg/streamingpromql/operators/aggregations/count_values.go

+		}
+
+		for _, p := range data.Floats {
+			if err := c.incrementCount(s.Labels, p.T, c.formatFloatValue(p.F), accumulator); err != nil {


Would caching c.formatFloatValue(p.F) going to be beneficial? I will imagine if we have many same values across the result, this will be formatted multiple times for that same values.

The benchmarks show this implementation is already significantly faster than Prometheus' engine for the case where there are many series with the same value, and caching like this would make the worst case (many series with different values) slower, so I'm tempted to leave things as-is.

lamida

LGTM

jhesketh

lgtm 🚀

charleskorn added 8 commits February 26, 2025 11:32

Enable upstream tests

5d7166f

Add to gauntlet

ebe05ee

Remove from unsupported expressions test

a04c680

Enable benchmark

b3a09ad

Initial implementation

e9852d4

Avoid allocations when formatting value label where possible

7667cbf

Pool countValuesSeries instances

8783330

Add additional benchmark

2509f73

charleskorn mentioned this pull request Feb 26, 2025

Mimir Query Engine #10067

Open

charleskorn marked this pull request as ready for review February 26, 2025 00:47

charleskorn requested a review from a team as a code owner February 26, 2025 00:47

lamida reviewed Feb 26, 2025

View reviewed changes

pkg/streamingpromql/operators/aggregations/count_values.go Outdated Show resolved Hide resolved

lamida reviewed Feb 27, 2025

View reviewed changes

Address PR feedback: don't export ToPoints

13fbbe8

lamida approved these changes Feb 28, 2025

View reviewed changes

Update comment and move group manipulation to NewCountValues

26a820c

jhesketh approved these changes Mar 3, 2025

View reviewed changes

charleskorn merged commit 0394438 into main Mar 3, 2025
28 checks passed

charleskorn deleted the charleskorn/mqe-count-values branch March 3, 2025 01:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

MQE: add support for `count_values` #10744

MQE: add support for `count_values` #10744

Uh oh!

charleskorn commented Feb 26, 2025

Uh oh!

Uh oh!

lamida Feb 27, 2025

Uh oh!

charleskorn Feb 28, 2025

Uh oh!

lamida left a comment

Uh oh!

jhesketh left a comment

Uh oh!

Uh oh!

Uh oh!

MQE: add support for count_values #10744

MQE: add support for count_values #10744

Uh oh!

Conversation

charleskorn commented Feb 26, 2025

What this PR does

Which issue(s) this PR fixes or relates to

Checklist

Uh oh!

Uh oh!

lamida Feb 27, 2025

Choose a reason for hiding this comment

Uh oh!

charleskorn Feb 28, 2025

Choose a reason for hiding this comment

Uh oh!

lamida left a comment

Choose a reason for hiding this comment

Uh oh!

jhesketh left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

MQE: add support for `count_values` #10744

MQE: add support for `count_values` #10744