Skip to content

Conversation

krajorama
Copy link
Member

@krajorama krajorama commented Jun 13, 2025

Fixes: #16578

See the issue for detailed explanation.
When a histogram had only NaN observations and no normal observations,
we returned 0 from the quantile, which is completely wrong. If there were
normal observations but we went over them, we returned the upper bound of
the existing buckets, however that contradicts expectations on
histogram_fraction. Now we return NaN if the quantile is calculated to be
over all normal observations, falling into NaNs (in a virtual +Inf bucket).

We also return info level annotations if we see any NaN observations.
The annotation calls out if we returned NaN or even if we took the
virtual +Inf bucket into account.

Fixes: #16580

According to the specification we should not take NaN observations
into account when calculating the fraction. This commit fixes that
and adds an info level annotation to let the user know about this.

@krajorama krajorama force-pushed the krajo/nh-quantile-nan branch 4 times, most recently from e276fd7 to abef671 Compare June 13, 2025 07:48
krajorama added a commit to prometheus/docs that referenced this pull request Jun 13, 2025
Fix TODO.
Related to prometheus/prometheus#16578
Related to prometheus/prometheus#16724

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama force-pushed the krajo/nh-quantile-nan branch from abef671 to 0553341 Compare June 13, 2025 09:31
@krajorama krajorama changed the title fix(promql): histogram_quantile NaN observed in native histogram fix(promql): histogram_quantile and histogram_fraction NaN observed in native histogram Jun 13, 2025
@krajorama krajorama force-pushed the krajo/nh-quantile-nan branch 2 times, most recently from 4641db0 to 3ff6dce Compare June 13, 2025 09:35
krajorama added a commit to prometheus/docs that referenced this pull request Jun 13, 2025
Fix TODO.
Related to prometheus/prometheus#16578
Related to prometheus/prometheus#16724
Related to prometheus/prometheus#16580

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
krajorama added a commit to prometheus/docs that referenced this pull request Jun 13, 2025
Fix TODOs. And align with new implementation.
Related to prometheus/prometheus#16578
Related to prometheus/prometheus#16724
Related to prometheus/prometheus#16580

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Fixes: #16578

See the issue for detailed explanation.
When a histogram had only NaN observations and no normal observations,
we returned 0 from the quantile, which is completely wrong. If there were
normal observations but we went over them, we returned the upper bound of
the existing buckets, however that contradicts expectations on
histogram_fraction. Now we return NaN if the quantile is calculated to be
over all normal observations, falling into NaNs (in a virtual +Inf bucket).

We also return info level annotations if we see any NaN observations.
The annotation calls out if we returned NaN or even if we took the
virtual +Inf bucket into account.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Fixes: #16580

According to the specification we should not take NaN observations
into account when calculating the fraction. This commit fixes that
and adds an info level annotation to let the user know about this.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama force-pushed the krajo/nh-quantile-nan branch from 3ff6dce to 61bb15a Compare June 13, 2025 09:41
krajorama added a commit to prometheus/docs that referenced this pull request Jun 13, 2025
Fix TODOs. And align with new implementation.
Related to prometheus/prometheus#16578
Related to prometheus/prometheus#16724
Related to prometheus/prometheus#16580

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama marked this pull request as ready for review June 13, 2025 09:44
@krajorama krajorama requested a review from roidelapluie as a code owner June 13, 2025 09:44
@krajorama krajorama requested review from beorn7 June 13, 2025 10:39
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, just minor things.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Copy link
Member

@beorn7 beorn7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's keep the old behavior for histograms where the sum is not NaN and only do the NaN return for histograms where the sum is NaN.

The histogram could just be in breach of the spec, of course, but then it is still a defined behavior to return bucket.Upper as we have don so far anyway.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama merged commit 5b7ff92 into main Jun 25, 2025
45 checks passed
@krajorama krajorama deleted the krajo/nh-quantile-nan branch June 25, 2025 11:37
krajorama added a commit to prometheus/docs that referenced this pull request Jun 25, 2025
…2676)

* fix(nh-spec): update histogram_quantile and histogram_fraction docs

Fix TODOs. And align with new implementation.
Related to prometheus/prometheus#16578
Related to prometheus/prometheus#16724
Related to prometheus/prometheus#16580

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Native histograms: address TODO about histogram_fraction in spec Native histograms: address TODO about histogram_quantile on NaN
3 participants