-
Notifications
You must be signed in to change notification settings - Fork 9.7k
fix(promql): histogram_count and histogram_sum inconsistent #16682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
35765d1
to
696fbbb
Compare
Note the test does not fail if I disable the count/sum optimization:
|
Python script used to convert user's data acquired by range vector selector on query API into test data:
|
I think the problem is in the counter reset detection. The code that loads the samples matrixIterSlice uses the typed Buffer iterator, which will preload the integer histogram samples, however the last sample is always(!) loaded as a float histogram sample and the optimized iterator fails to detect counter resets in that case. Proof: I can hack |
Also the bug explains why I couldn't reproduce with the unit test: the unit test uses float histograms for everything, this transition from integer to float does not happen. |
696fbbb
to
867e4a8
Compare
Ref: #16681 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
4d29962
to
acb6f9d
Compare
The problem is in the counter reset detection. The code that loads the samples is matrixIterSlice which uses the typed Buffer iterator, which will preload the integer histogram samples, however the last sample is always(!) loaded as a float histogram sample in matrixIterSlice and the optimized iterator fails to detect counter resets in that case. Also the iterator does not reset lastH, lastFH properly. Ref: #16681 Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
acb6f9d
to
df94ece
Compare
cc @fpetkovski |
cc @charleskorn |
…tect Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Epic bug hunt.
Thank you very much.
Thank you @krajorama! |
The problem is in the counter reset detection. The code that loads the
samples is matrixIterSlice which uses the typed Buffer iterator, which
will preload the integer histogram samples, however the last sample is
always(!) loaded as a float histogram sample in matrixIterSlice and the
optimized iterator fails to detect counter resets in that case.
This means that the error happens fairly rarely:
Also the iterator does not reset lastH, lastFH properly.
In light of this bug, I wonder if we should rewrite the iterator to always return float histograms. Or maybe add a version that does that. On the other hand maybe it's better to rewrite the engine to use
if
for distinguishing between float samples and other samples and use generics for the "other" samples.Fixes: #16681