promql: Simplify avg aggregation and avg_over_time #16569

beorn7 · 2025-05-07T14:01:52Z

As it turns out, if we combine Kahan summation and incremental mean calculation properly, it works quite well and we do not need to switch between simple mean calculation and incremental calculation based on overflow.

This simplifies the code quite a bit.

beorn7 · 2025-05-07T14:02:26Z

Note: This is on top of the beorn7/promql branch to simplify review. I'll rebase once #16566 is merged.

beorn7 · 2025-05-07T14:02:47Z

@crush-on-anechka please have a look.

machine424 · 2025-05-19T21:36:00Z

Thanks for this, I'll look at this.

bboreham · 2025-05-20T10:38:59Z

Having discussed this when the previous code was added, can you comment whether this argument still holds?

machine424 · 2025-05-21T07:26:00Z

Apparently @crush-on-anechka (#15687 (comment)) has some extra test cases that we may want to agree on and add before this change.

beorn7 · 2025-05-29T20:46:47Z

Apparently @crush-on-anechka (#15687 (comment)) has some extra test cases that we may want to agree on and add before this change.

I still have to analyze those extra cases. Hopefully I get to it next week.

beorn7 · 2025-05-29T21:04:47Z

Having discussed this when the previous code was added, can you comment whether this argument still holds?

Back then, I assumed incremental mean calculation is just less accurate than non-incremental. Now it has turned out that we just applied Kahan summation in a non-optimal (or arguably wrong) way. When I created #14413, I did not question the existing implementation. To better understand the line of thinking, I'll give a summary of the history:

The initial implementation used simple mean calculation, but no Kahan summation yet.
One day, somebody ran into an overflow of float64 and had the idea that we should use incremental mean calculation. That was clearly better, so we implemented it.
Some time later, somebody ran into float accuracy problems and thought we should better use Kahan summation. Somebody added Kahan summation to both sum and avg (and their _over_time variants). Nothing broke, everyone was happy. However, at this time, the Kahan summation wasn't done in the best possible way (or arguably in a wrong way), which wasn't caught.
Again some time later, somebody noted a scenario where sum worked just fine, but avg didn't. Which was weird because avg is just doing sum first and then divide by a relatively small integer. When I debugged the problem, I blamed the incremental mean calculation as being less accurate, back then assuming that Kahan summation and incremental mean calculation were combined in an appropriate way, and it would be just inherent to the problem that the incremental mean calculation is less accurate, so that the optimal way is to do simple mean calculation if there is no overflow and switch to incremental mean calculation once we run into an overflow. This resulted in promql: more Kahan summation (avg) and less incremental mean calculation (avg, avg_over_time) #14413, which made everyone happy because now the "obvious" case where sum worked but avg not was fixed, while the cases where sum would overflow but avg not (thanks to incremental mean calculation) continued to work as expected.
Again some time later, @crush-on-anechka started to work on Kahan summation for histograms. They raised interesting points, and in the course of the discussion, I started some more fundamental research, which lead me to the revelation that our existing combination of Kahan summation and incremental mean calculation could be improved. Which I did.
With the improved combined Kahan and incremental mean calculation, I could not see any practical differences in accuracy anymore between simple and incremental mean calculation. I don't know the details of the numerical accuracy here well enough to tell if there is maybe still an accuracy advantage of one over the other (it could also be the other way around, of course), but my current thinking is that if there is any, it is probably not very relevant in practice. So I decided to better simplify the code by a good deal (accepting a theoretical increase in compute cost, which again isn't really relevant in practice – remember that we always did incremental calculation historically).

bboreham

Whilst I am happy with the code, I think it would be improved by a little explanation in comments. Perhaps linking to a reference.

machine424 · 2025-06-04T16:30:01Z

unassigned myself so it doesn't look like I'm blocking this, I can always take a look post-merge.

beorn7 · 2025-06-05T21:10:57Z

I have now incorporated @crush-on-anechka's test cases from #15687 (in this comment).

With my proposed fixed, the two most interesting ones are metric10 and metric11:

For metric10, the incremental mean calculation (as introduced in this PR) return a pretty precise result, while the simple mean calculation (prior to this PR) is quite a bit off.
metric11 throws off Kahan summation completely. (Broadly simplified, if you handle large numbers of significantly different orders of magnitudes, they will wash out the small values from the Kahan summation term.) Both simple and incremental mean calculation are far off. Naively, one might say the incremental calculation is way worse (-1.881783551706252e+203 instead of -44.848083237000004), but the plain zero calculated by simple mean calculation would be for worse on a logarithmic scale.

The metric11 case is handled just fine with proper statistics tools (e.g. the Python3 statistics package). However, it would require processing the values in a certain order, which is hard to implement, given how the PromQL engine works. I would propose to not invest that effort until we see practical relevance of those cases. (Even the cases that we used to justify Kahan summation are already fairly contrived.)

As it turns out, if we combine Kahan summation and incremental mean calculation properly, it works quite well and we do not need to switch between simple mean calculation and incremental calculation based on overflow. This simplifies the code quite a bit. Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 · 2025-06-05T21:27:36Z

I have now also added a note as suggested by @bboreham .

beorn7 · 2025-06-05T21:29:49Z

Another insight (reflected in the note added): For avg_over_time, the promql engine gives us all the relevant values in a slice, so we could even play the tricks that statistics libraries do for a more accurate mean calculation (and also sum_over_time of course). But it would be more expensive, and it would break consistency with the behavior of the aggregation operators, where it would be much harder to control the order of processing samples.

bboreham

Great, thank you.

promql/promqltest/testdata/functions.test

@crush-on-anechka

These tests were initially created by @crush-on-anechka. I modified them slightly. Signed-off-by: beorn7 <beorn@grafana.com>

bboreham · 2025-06-09T09:56:50Z

Using go version go1.24.2 darwin/arm64 I get test failures from the merged repo at commit 8fc1750:

--- FAIL: TestEvaluations (1.48s)
    --- FAIL: TestEvaluations/testdata/aggregators.test (0.14s)
        --- FAIL: TestEvaluations/testdata/aggregators.test/line_596/avg(data{test="bigzero"}) (0.00s)
            test.go:1321: 
                        Error Trace:    /Users/bryan/src/github.com/prometheus/prometheus/promql/promqltest/test.go:1321
                        Error:          Received unexpected error:
                                        error in eval avg(data{test="bigzero"}) (line 596): expected 0 for {} but got 2.4948003869184e+291
                        Test:           TestEvaluations/testdata/aggregators.test/line_596/avg(data{test="bigzero"})
    --- FAIL: TestEvaluations/testdata/functions.test (0.47s)
        --- FAIL: TestEvaluations/testdata/functions.test/line_1013/avg_over_time(metric[2m]) (0.00s)
            test.go:1321: 
                        Error Trace:    /Users/bryan/src/github.com/prometheus/prometheus/promql/promqltest/test.go:1321
                        Error:          Received unexpected error:
                                        error in eval avg_over_time(metric[2m]) (line 1013): expected 0.5 for {} but got -1.3877787807814457e+83
                        Test:           TestEvaluations/testdata/functions.test/line_1013/avg_over_time(metric[2m])
        --- FAIL: TestEvaluations/testdata/functions.test/line_1085/avg_over_time(metric11[1m]) (0.00s)
            test.go:1321: 
                        Error Trace:    /Users/bryan/src/github.com/prometheus/prometheus/promql/promqltest/test.go:1321
                        Error:          Received unexpected error:
                                        error in eval avg_over_time(metric11[1m]) (line 1085): expected -1.881783551706252e+203 for {} but got 2.303079268822384e+202
                        Test:           TestEvaluations/testdata/functions.test/line_1085/avg_over_time(metric11[1m])
FAIL
FAIL    github.com/prometheus/prometheus/promql 5.077s

I don't get these at 6d10fce.

beorn7 · 2025-06-10T13:23:29Z

Thanks, I'll look into it.

beorn7 · 2025-06-10T15:57:35Z

Note for my investigation: Test pass with go1.24.1 linux/amd64. Will update to go1.24.2 and try again. (But I guess this could totally be arch dependent.)

beorn7 · 2025-06-10T16:32:49Z

Tests pass with go1.24.4 linux/amd64, too.

beorn7 · 2025-06-10T19:24:29Z

So the last failure is not very surprising because that's the test that goes wrong, and I just put in the wrong result to satisfy the test, but it's not surprising that different architectures get different wrong results.

The other two failures are tough, though. The 1st one is overflowing float64 if you just sum, so you have to use incremental mean calculation, and it is weird that the previous "wronger" version of the incremental mean calculation got the right result on all platforms. The 2nd one is relatively easy and should "just work". I'm very curious where things go wrong here, but I don't have access to a darwin/arm64 system to debug in detail.

I'll file a separate bug so that we can track debugging over there.

beorn7 · 2025-06-10T19:32:18Z

#16714 is the bug.

…omql2" This reverts commit 8fc1750, reversing changes made to 6d10fce. # Conflicts: # promql/functions.go # promql/promqltest/testdata/functions.test

The test in question actually worked fine even before #16569. The finding reported in the comment has turned out to be caused by something else. Signed-off-by: beorn7 <beorn@grafana.com>

This commit brings back direct mean calculation (for `avg` and `avg_over_time`) but isn't an outright revert of #16569. It keeps the improved incremental mean calculation and features generally a bit cleaner code than before. Also, this commit... - ...updates the lengthy comment explaining the whole situation and trade-offs. - ...divides the running sum and the Kahan compensation term separately (in direct mean calculation) to avoid the (unlikely) possibility that sum and Kahan compensation together ovorflow float64. - ...uncomments the tests that should now work again on darwin/arm64. - ...uncomments the test that should now reliably yield the (inaccurate) value 0 on all hardware platforms. Also, the test description has been updated accordingly. Signed-off-by: beorn7 <beorn@grafana.com>

This commit brings back direct mean calculation (for `avg` and `avg_over_time`) but isn't an outright revert of #16569. It keeps the improved incremental mean calculation and features generally a bit cleaner code than before. Also, this commit... - ...updates the lengthy comment explaining the whole situation and trade-offs. - ...divides the running sum and the Kahan compensation term separately (in direct mean calculation) to avoid the (unlikely) possibility that sum and Kahan compensation together ovorflow float64. - ...uncomments the tests that should now work again on darwin/arm64. - ...uncomments the test that should now reliably yield the (inaccurate) value 0 on all hardware platforms. Also, the test description has been updated accordingly. - ...adds avg_over_time tests for zero and one sample in the range. Signed-off-by: beorn7 <beorn@grafana.com>

The test in question actually worked fine even before #16569. The finding reported in the comment has turned out to be caused by something else. Signed-off-by: beorn7 <beorn@grafana.com>

This commit brings back direct mean calculation (for `avg` and `avg_over_time`) but isn't an outright revert of #16569. It keeps the improved incremental mean calculation and features generally a bit cleaner code than before. Also, this commit... - ...updates the lengthy comment explaining the whole situation and trade-offs. - ...divides the running sum and the Kahan compensation term separately (in direct mean calculation) to avoid the (unlikely) possibility that sum and Kahan compensation together ovorflow float64. - ...uncomments the tests that should now work again on darwin/arm64. - ...uncomments the test that should now reliably yield the (inaccurate) value 0 on all hardware platforms. Also, the test description has been updated accordingly. - ...adds avg_over_time tests for zero and one sample in the range. Signed-off-by: beorn7 <beorn@grafana.com>

The test in question actually worked fine even before #16569. The finding reported in the comment has turned out to be caused by something else. Signed-off-by: beorn7 <beorn@grafana.com>

This commit brings back direct mean calculation (for `avg` and `avg_over_time`) but isn't an outright revert of #16569. It keeps the improved incremental mean calculation and features generally a bit cleaner code than before. Also, this commit... - ...updates the lengthy comment explaining the whole situation and trade-offs. - ...divides the running sum and the Kahan compensation term separately (in direct mean calculation) to avoid the (unlikely) possibility that sum and Kahan compensation together ovorflow float64. - ...uncomments the tests that should now work again on darwin/arm64. - ...uncomments the test that should now reliably yield the (inaccurate) value 0 on all hardware platforms. Also, the test description has been updated accordingly. - ...adds avg_over_time tests for zero and one sample in the range. Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 requested a review from roidelapluie as a code owner May 7, 2025 14:01

beorn7 mentioned this pull request May 7, 2025

Float histograms: implement methods for Add/Sub operations using Kahan summation #15687

Open

beorn7 requested review from bboreham and machine424 May 12, 2025 08:39

machine424 self-assigned this May 19, 2025

bboreham approved these changes May 31, 2025

View reviewed changes

machine424 removed their assignment Jun 4, 2025

beorn7 force-pushed the beorn7/promql branch from d630eaf to a4d57fc Compare June 5, 2025 12:07

Base automatically changed from beorn7/promql to main June 5, 2025 14:45

beorn7 force-pushed the beorn7/promql2 branch from 63083e0 to 69ed1cb Compare June 5, 2025 14:48

beorn7 force-pushed the beorn7/promql2 branch from 0c339e6 to eb9d384 Compare June 5, 2025 21:27

bboreham approved these changes Jun 6, 2025

View reviewed changes

promql/promqltest/testdata/functions.test Outdated Show resolved Hide resolved

Add more avg_over_time test cases with extreme values

6443183

These tests were initially created by @crush-on-anechka. I modified them slightly. Signed-off-by: beorn7 <beorn@grafana.com>

beorn7 force-pushed the beorn7/promql2 branch from eb9d384 to 6443183 Compare June 6, 2025 21:50

beorn7 merged commit 8fc1750 into main Jun 6, 2025
45 checks passed

beorn7 deleted the beorn7/promql2 branch June 6, 2025 22:16

beorn7 mentioned this pull request Jun 10, 2025

promql: Numerical accuracy issues with mean calculation #16714

Closed

beorn7 mentioned this pull request Jun 10, 2025

promql: Deactivate three failing tests for the time being #16715

Merged

Konstantinov-Innokentii mentioned this pull request Jun 11, 2025

Sync prometheus at 19848bb grafana/mimir-prometheus#888

Merged

charleskorn mentioned this pull request Jun 24, 2025

Revert #16569 grafana/mimir-prometheus#914

Merged

beorn7 mentioned this pull request Jun 24, 2025

promql: Re-introduce direct mean calculation #16773

Merged

promql: Simplify avg aggregation and avg_over_time #16569

promql: Simplify avg aggregation and avg_over_time #16569

Uh oh!

Conversation

beorn7 commented May 7, 2025

Uh oh!

beorn7 commented May 7, 2025

Uh oh!

beorn7 commented May 7, 2025

Uh oh!

machine424 commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bboreham commented May 20, 2025

Uh oh!

machine424 commented May 21, 2025

Uh oh!

beorn7 commented May 29, 2025

Uh oh!

beorn7 commented May 29, 2025

Uh oh!

bboreham left a comment

Choose a reason for hiding this comment

Uh oh!

machine424 commented Jun 4, 2025

Uh oh!

beorn7 commented Jun 5, 2025

Uh oh!

beorn7 commented Jun 5, 2025

Uh oh!

beorn7 commented Jun 5, 2025

Uh oh!

bboreham left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

bboreham commented Jun 9, 2025

Uh oh!

beorn7 commented Jun 10, 2025

Uh oh!

beorn7 commented Jun 10, 2025

Uh oh!

beorn7 commented Jun 10, 2025

Uh oh!

beorn7 commented Jun 10, 2025

Uh oh!

beorn7 commented Jun 10, 2025

Uh oh!

Uh oh!

machine424 commented May 19, 2025 •

edited

Loading