-
Notifications
You must be signed in to change notification settings - Fork 634
MQE: add support for count_values
#10744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
} | ||
|
||
for _, p := range data.Floats { | ||
if err := c.incrementCount(s.Labels, p.T, c.formatFloatValue(p.F), accumulator); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would caching c.formatFloatValue(p.F)
going to be beneficial? I will imagine if we have many same values across the result, this will be formatted multiple times for that same values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmarks show this implementation is already significantly faster than Prometheus' engine for the case where there are many series with the same value, and caching like this would make the worst case (many series with different values) slower, so I'm tempted to leave things as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm 🚀
What this PR does
This PR adds support for
count_values
in MQE.Benchmark results are mixed. MQE's implementation always consumes less memory than Prometheus' engine in our benchmarks, but in some cases runs slower, and in others runs faster. MQE is up to 36% faster for the common case where many series have only a handful of values, but up to 20% slower for the uncommon case where every sample has a different value.
The
count_values('value', h_X)
cases exercise the scenario where every sample has a different value, resulting inX × 100
output series. Thecount_values('value', h_X * 0)
cases exercise the scenario where every sample has the same value, resulting in 1 output series.The difference in performance for the uncommon case is because Prometheus' engine uses a hash of the series labels to identify output series, which runs the risk of hash collisions, while MQE uses a string representation of the series labels to identify output series. In the case where there are many output series, this means MQE must allocate many strings, which causes the performance impact we see here.
Given
count_values
is a relatively uncommonly used PromQL feature, the performance of the common case is significantly better than Prometheus' engine, and MQE's implementation avoids possible hash collisions, I'm OK with this performance regression.Which issue(s) this PR fixes or relates to
#10067
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.