-
Notifications
You must be signed in to change notification settings - Fork 9.8k
Float histograms: implement methods for Add/Sub operations using Kahan summation #15687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Float histograms: implement methods for Add/Sub operations using Kahan summation #15687
Conversation
@KofClubs is also working on this (or maybe the work has stalled?). Just FYI. |
Yes, I know. (See the TODO here). I'm not sure |
7790b58
to
7ada325
Compare
I know, I guess the work has stalled as there hasn’t been activity for a long time. If that's not the case, then I'm ok to step aside and not interfere. Otherwise I'm ready to keep on |
786179b
to
8cfa0d5
Compare
Thank you very much, I'll have a detailed look ASAP (which is probably not very soon, as I'm about to disappear into my holiday break). If anyone wants to review this in the meantime, be my guest. I haven't heard anything from @KofClubs , so let's hope we don't have redundant work happening here. |
Back from my holiday break, but swamped by the backlog. I hope I'll get to this next week. |
I just received an email about @crush-on-anechka 's good work. I'm sorry that I put this work on hold due to the long office hours every day:( I shall unassign the corresponding issue. |
Sorry, I caught a cold and got less done than planned. I need to ask for more patience. (And of course, if somebody else beats me to the review, be my guest.) |
That's because I still haven't made up my mind how to treat counter resets for subtraction. I'll add that once I have an idea.
Let's not do it for now. I don't expect the counts in the merged buckets to be hugely different, and usually we'll only add a few buckets anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much. Generally looks good.
Special thanks for all the careful thought you put into this.
It's a bit sad that we need to duplicate so much logic, but I guess it cannot be prevented most of the time. (For the few occasions where we can, I have left a comment.)
It would be good to already wire the new methods into PromQL (for addition, subtraction, and the sum aggregation etc, see more in later comment). This will give us a lot more test coverage (via the PromQL tests), and we could even easily try out a few things in the PromQL tests to prove that the Kahan summation actually helps.
A few more thoughts: We currently don't do Kahan summation for the float binary Kahan summation is currently only used in certain aggregations, the corresponding "over time" functions and some specific functions. Many of those only act on float samples. The only aggregations and functions we will need Kahan summation for histograms for will be:
As far as I can see, we do not need |
@beorn7 thank you for the comments! I will be able to proceed in a week, if that's fine |
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
The idea is to only switch to incremental calculation once we would everflow float64. An imprecise average is still better than the Inf or NaN we get when overflowing. Check how it is done for averaging the simple float samples. |
After a series of experiments, I found that both incremental and non-incremental average calculation involving large-magnitude values do not behave as expected for simple floats also. Summing numbers of drastically different magnitudes leads to precision loss. Kahan helps reduce errors, but it still relies on basic For example we try to add While reviewing the current summing algorithm, I noticed some potential issues and experimented with an updated version, nevertheless both of them don't work as expected in certain cases:
As you can see, non-incremental calculation also fails when the input values span three different orders of magnitude. If we only had values around ~100 orders of magnitude and others below ~14 orders of magnitude, the calculation would work: the main sum would handle the large numbers, while the compensation would account for the small ones. However, introducing values in the middle range (around ~50 orders of magnitude) corrupts the calculation. TL;DR: |
Thanks for your thoughts. There aren't really any assumptions about the input values. Users can do whatever they want. However, there is no expectation that we can solve all floating point precision issues. Even Kahan summation can take us only so far. I'm not an expert in floating point arithmetics, but your line of argument that providing values from many different orders of magnitude will eventually break even Kahan summation, sounds plausible. I usually try to imagine Kahan summation as "getting something like float128 precision even on a float64 FPU" – at somewhat higher cost, which doesn't really make a dent in the PromQL case because fetching all those numbers from the TSDB is much more expensive that the additional summation we have to do for Kahan. But even float128 will eventually run into precision issues. About your examples:
Where did you get your numbers from? If you add other values with an order of magnitude in between, like the following, I can also see the problem:
But as said, I think that is expected. However, I expect the outcome to be close to zero here, not any gigantic number. Again, I don't know how you got the numbers you quoted above. |
Apologies for the lack of clarity in my previous message. I’ve been experimenting with the
To force the incremental calculation path, I explicitly set When I referred to the “updated version” of the algorithm, I meant this one:
This is in contrast to the current version:
|
(Note for the reader to not lose track: We are currently discussion the average calculation for simple floats, not for histograms. We just stumbled upon problems with floats while trying to figure out how to do histograms.) The current non-incremental version works:
The problem is if the test is performed Even the incremental versions should give "correct" results within floating point accuracy, of course. I'm not sure if our current "hybrid incremental/Kahav avg calculation" is ideal (or maybe even correct). I found a bit of discussion on Stackoverflow, but it fell short to provide the actual algorithm that combines Kahan and incremental avg calculation. I think your "updated version" is not quite correct. I have a slightly different idea. Maybe with that idea in place, we can get rid of the whole "switching to incremental calculation if needed" thing. I'm working on a PR to present to you for further discussion. |
I have created #16569 for the (hopefully) correct combination of Kahan and incremental mean calulation. |
#16569 looks tidy indeed! My thoughts:
This yields I’ve written additional tests for the approach in #16569 and grouped them by behavior (see comments).
|
Thank you. I still have to process your latest response with the depth it deserves. However, I'm still caught between several events at work and in my usual life, but I will hopefully get to it next week. |
Thanks for the additional test cases. I have tried them out, and for all that fail with #16569, they also fail before #16569, just with more or less differing values. Thus, I'm still not convinced that the incremental approach with "correct" Kahan summation as in #16569 is really performing worse than direct Kahan summation with the one division at the end (what I call "simple mean calculation"). I will incorporate your test cases in #16569 and discuss them over there. (Hopefully I'll get to it tomorrow.) We can continue the detailed discussion over there. About your other comment:
Not quite sure what you are referring to here. We have two dimensions here:
What do you mean by "direct summation"? There are plenty of test cases already in the test data that would fail without Kahan summation. |
One more question about |
For |
You're right — I suspect my earlier results were off due to inaccuracies in the averaging method I used. Using Python's statistics is a good idea, I will stick with it going forward. |
Regarding my comment on "Kahan summation outperforming direct summation" — what I actually meant was "Kahan summation vs. regular summation." Since I was working on the I then tried to come up with scenarios where Kahan compensation would make a noticeable difference, but I couldn't find any clear-cut cases. This was just a side experiment driven by curiosity, and if there are already tests confirming the effectiveness of the Kahan implementation, I'm happy to disregard my earlier comment. |
You are right. I must have left something out of Oven in #16714, we also got some more evidence that the direct mean calculation is doing better than the incremental mean calculation. My plan is now to combine what we had before #16569 with the improved incremental mean calculation. |
See #16773 . Assuming that PR is now really what we want, we have the blueprint for how to do Kahan summation in native histograms. (Sadly, with the adaptive approach still being better than always just doing the incremental mean calculation, things are not as simple as I was hoping for.) |
Sounds great! I’ll admit I got a bit lost while tweaking the Kahan algorithm, so I mostly ended up taking a backseat and observing. I’ll jump back into working on the histograms now |
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
0f910dc
to
0b26663
Compare
I've implemented #16773 avg calculation behavior for histograms. Here are a few notes from my side:
|
Sounds great. I'll have a look ASAP. (My review queue is quite manageable these days, but I will have limited availability over the next two weeks.) |
And now I'm on vacation… sorry for all the delays. I'll be back in action for good starting August 12. |
import "math" | ||
|
||
// Inc performs addition of two floating-point numbers using the Kahan summation algorithm. | ||
func Inc(inc, sum, c float64) (newSum, newC float64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In lieu if a detailed review just a quick but important note: Have a look at #16895 . It seems plausible to me that inlining and subsequent optimization by the compiler could optimize away the whole Kahan trick, so we should do that here, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen that PR. Sure, I'll do this
That's totally fine. Do you think it's a good time to rebase onto the latest main locally to resolve conflicts? |
This PR addresses issue #14105 and PR #14325 maintained by @beorn7. While the issue has an assignee, I decided to contribute out of personal interest as there hasn’t been recent activity.
Overview:
New methods, kahanAdd and kahanSub, to the FloatHistogram replicate the logic of the existing Add and Sub methods but incorporate the Kahan summation algorithm. The methods require an initialized compensation histogram as input and return both the modified receiving histogram and the updated compensation histogram. The resulting histograms must then be combined using the original Add/Sub methods. Usage examples can be seen in the modified TestFloatHistogramAdd and TestFloatHistogramSub test cases.
Details:
Food for thought: