Skip to content

Conversation

KofClubs
Copy link
Contributor

Fixes #16247

@KofClubs
Copy link
Contributor Author

In short, floating-point multiplication and floor rounding together caused this bug. So I fixed the conversion from numLit.Val float64 to rangeNl time.Duration.

Below is a simple example explaining everything.

package main

import (
	"fmt"
	"math"
	"time"
)

func OutdatedImpl(inSecondDuration float64) time.Duration {
	return time.Duration(inSecondDuration*1000) * time.Millisecond
}

func UpdatedImpl(inSecondDuration float64) time.Duration {
	return time.Duration(math.Round(inSecondDuration * float64(time.Second)))
}

func main() {
	inSecondDurations := []float64{1.000, 1.001, 1.002}
	var duration time.Duration
	for _, inSecondDuration := range inSecondDurations {
		duration = OutdatedImpl(inSecondDuration)
		fmt.Printf("[outdated] duration in s: %v, in ms: %v, in ns: %v\n", inSecondDuration, duration.Milliseconds(), duration.Nanoseconds())
		duration = UpdatedImpl(inSecondDuration)
		fmt.Printf("[updated] duration in s: %v, in ms: %v, in ns: %v\n\n", inSecondDuration, duration.Milliseconds(), duration.Nanoseconds())
	}
}
[outdated] duration in s: 1, in ms: 1000, in ns: 1000000000
[updated] duration in s: 1, in ms: 1000, in ns: 1000000000

[outdated] duration in s: 1.001, in ms: 1000, in ns: 1000000000
[updated] duration in s: 1.001, in ms: 1001, in ns: 1001000000

[outdated] duration in s: 1.002, in ms: 1002, in ns: 1002000000
[updated] duration in s: 1.002, in ms: 1002, in ns: 1002000000

@KofClubs KofClubs changed the title [WIP] promql: function selector sometimes misses a sample on dense samples promql: function selector sometimes misses a sample on dense samples Apr 24, 2025
Signed-off-by: Zhang Zhanpeng <zhangzhanpeng.zzp@alibaba-inc.com>
@KofClubs KofClubs force-pushed the range-vector-1001ms branch from bad33ac to 0658923 Compare April 24, 2025 08:29
@KofClubs KofClubs marked this pull request as ready for review April 24, 2025 08:29
@KofClubs KofClubs requested a review from roidelapluie as a code owner April 24, 2025 08:29
@machine424
Copy link
Member

Nice catch.
I don't have the full context, but from skimming the parser, I’m not
sure why these values are initially parsed as seconds

$$ = &NumberLiteral{
Val: dur.Seconds(),
PosRange: $1.PositionRange(),
Duration: true,
}
}
;
number : NUMBER
{
$$ = yylex.(*parser).number($1.Val)
}
| DURATION
{
var err error
var dur time.Duration
dur, err = parseDuration($1.Val)
if err != nil {
yylex.(*parser).addParseErr($1.PositionRange(), err)
}
$$ = dur.Seconds()

and only converted to ms afterward. Wouldn’t it make more sense to
parse them as ms from the start, given we want to have ms precision?

@KofClubs
Copy link
Contributor Author

machine424

+1, the loss of precision is introduced by dur.Seconds().

It seems like I should ask the author of this code block about the considerations for converting to seconds instead of milliseconds?

@aknuds1 aknuds1 requested a review from Copilot April 25, 2025 07:28
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes an issue where the function selector may miss a sample on dense samples. The key changes include:

  • Adding new test cases in parse_test.go to cover various millisecond input cases.
  • Modifying duration computations in generated_parser.y.go by replacing multiplication logic with math.Round.

Reviewed Changes

Copilot reviewed 2 out of 4 changed files in this pull request and generated no comments.

File Description
promql/parser/parse_test.go Adds test cases for MatrixSelector with varied millisecond inputs.
promql/parser/generated_parser.y.go Updates duration conversion logic to use math.Round.
Files not reviewed (2)
  • promql/parser/generated_parser.y: Language not supported
  • promql/promqltest/testdata/functions.test: Language not supported
Comments suppressed due to low confidence (4)

promql/parser/generated_parser.y.go:1375

  • Using math.Round with a time.Second multiplier may round the offset to a whole second, potentially losing millisecond precision. Verify that this behavior correctly handles dense sample scenarios as intended.
yylex.(*parser).addOffset(yyDollar[1].node, time.Duration(math.Round(numLit.Val*float64(time.Second))))

promql/parser/generated_parser.y.go:1426

  • Replacing the previous millisecond conversion with math.Round may result in rounding duration values to the nearest whole second instead of maintaining millisecond accuracy. Ensure that this change gives the intended behavior under all expected inputs.
rangeNl = time.Duration(math.Round(numLit.Val * float64(time.Second)))

promql/parser/generated_parser.y.go:1446

  • The introduction of math.Round for step duration may round to whole seconds, which could affect subquery behavior if sub-second precision is needed. Confirm that the loss of millisecond granularity is acceptable.
stepNl = time.Duration(math.Round(numLit.Val * float64(time.Second)))

promql/parser/generated_parser.y.go:1463

  • Rounding the range using math.Round shifts the precision from milliseconds to seconds; ensure this modification aligns with the intended fix for missing samples on dense metrics.
rangeNl = time.Duration(math.Round(numLit.Val * float64(time.Second)))

@machine424
Copy link
Member

I think the changes were made in #9138
it seems that a regression was introduced there. The tests you added in promql/parser/parse_test.go pass on release-2.53 (I didn't try the other tests)

@roidelapluie
Copy link
Member

It is converted to seconds because when using a plain number in Prometheus, it is considered to be seconds.

Copy link
Member

@roidelapluie roidelapluie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move that to a parseNumLit function ?

@machine424
Copy link
Member

It is converted to seconds because when using a plain number in Prometheus, it is considered to be seconds.

But 1001ms isn't a plain number, as I mentionedin #16478 (comment), this wasn't the case before #9138

I'm not a big fan of "let's parse all numbers the same way, but let's have some special logic for durations", it even goes against #9138.
I think parsing should be aware of this and take care of it.

@roidelapluie
Copy link
Member

#9138 makes them all similar. What would you propose as a fix?

I believe the fix could also be to introduce a .Duration() time.Duration to NumberLiteral and use it when we need it.

@machine424
Copy link
Member

I haven't looked closely at the code, so I can't provide a complete/working solution. That said, I do think it's a pity that we lose such info/precision during parsing. I was wondering in #16478 (comment) we could just return durations in ms instead of s in some cases. Or even for all cases given we want ms precision, and then make sure the ms->s is done correctly when needed (I don't see why we'd need that, if we only work with ms)

@roidelapluie
Copy link
Member

For now this is a correct fix for the issue. I think we could change to Milliseconds but at the expense of potentially breaking downstream users, do I am happy with this PR as it.

@roidelapluie roidelapluie merged commit 59874fd into prometheus:main May 8, 2025
27 checks passed
@roidelapluie
Copy link
Member

Thanks!

@bboreham bboreham changed the title promql: function selector sometimes misses a sample on dense samples [BUGFIX] PromQL: fix incorrect rounding of [1001ms] to [1s] and similar (selector sometimes misses a sample on dense samples) Jun 24, 2025
@bboreham
Copy link
Member

Hi, I was rewording descriptions for the changelog to make them more useful for end-users, so I expanded the title of this PR.

I did not understand what "function selector" was supposed to mean, so dropped the word "function".

Something like foo[1m] is named a "range vector selector" in the docs.

@KofClubs
Copy link
Contributor Author

Hi, I was rewording descriptions for the changelog to make them more useful for end-users, so I expanded the title of this PR.

I did not understand what "function selector" was supposed to mean, so dropped the word "function".

Something like foo[1m] is named a "range vector selector" in the docs.

@bboreham Thanks! Your revision is spot on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistency in millisecond selectors in query explanations
4 participants