[bugfix] Correct instant query calculation #5252

ruslan-mikhailov · 2025-06-10T13:03:49Z

What this PR does:

Instant query should have only one interval, while it has 2 or more because of alignment.
For example, if we have start value
from req:

1749037020 - Wed Jun  4 13:37:00 CEST 2025

it will be aligned:

1749034800 - Wed Jun  4 13:00:00 CEST 2025

End param from req:

1749040620 - Wed Jun  4 14:37:00 CEST 2025

and aligned:

1749042000 - Wed Jun  4 15:00:00 CEST 2025

In total, it gives 2 intervals: 13 - 14, 14-15 with step=1h. Then it gets only the first value here

More details here: #5221 (comment)

The fix is not ideal and uses the same approach of identifying instant queries as before:

func IsInstant(req tempopb.QueryRangeRequest) bool {
	return req.End-req.Start == req.Step
}

Possibly, the better solution would be to pass "instant bool" into QueryRangeRequest, but that requires changes in a bunch of function signatures. Let me know what you think

Which issue(s) this PR fixes:
Fixes #5164 and #5221
Please note that for #5164 there is at least one more bug: querying in range that forces to query both generator and backend gives inaccurate results. It will be fixed in a separate PR.

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

ruslan-mikhailov · 2025-06-10T15:09:10Z

pkg/traceql/engine_metrics.go

 	if step == 0 {
 		return 0
 	}
+	if isInstant(start, end, step) {
+		return start


I made a copy-paste mistake here, it should return end. Now I wonder, how it works well with that

It worked because with end == start, the number of intervals is 1 due to https://github.com/ruslan-mikhailov/tempo/blob/main/pkg/traceql/engine_metrics.go#L59
If return end as it should be, then the number of samples will be 2. See https://github.com/ruslan-mikhailov/tempo/blob/main/tempodb/tempodb_metrics_test.go#L62

To fix properly, I force instant queries to always use only one interval

mdisibio · 2025-06-11T12:47:19Z

pkg/traceql/engine_metrics.go

 	return start - start%step
 }

 // End time is rounded up to next step
-func alignEnd(end, step uint64) uint64 {
+func alignEnd(start, end, step uint64) uint64 {


We chatted about this offline - this is the root cause of too many intervals both for instant and query_range. Because we round start down, and end up, it always creates 1 more step. This is modeling behavior of prometheus where start and end timestamps are inclusive.

At some point we want to fix this for query_range too. I.e. 1h with 1m steps should return 60 data points not 61. Is it worth fixing in this PR, or are you thinking to address this in a separate PR? I think we could revisit and simplify some of these changes after doing that.

I feel like we should decide on the a behaviour that makes sense of us, even if it's slightly different from prometheus.

we don't promise to be prometheus format compliant, we are prometheus inspired and similar so it's okay to diverge where it makes sense for us to diverge.

My suggestion is to fix it with this workaround so far to not be blocked by discussions as it fixes at least 3 issues. Then, when decision is made, we can refactor the code.

pkg/traceql/engine_metrics_test.go

ruslan-mikhailov · 2025-06-16T12:36:15Z

+ additional fix

mdisibio

I'm ok with the changes as-is because the fixes are good, but I want to mention another possible direction:

The original vision I had for instant queries was to use a different range aggregator instead of StepAggregator here: https://github.com/grafana/tempo/blob/main/pkg/traceql/ast_metrics.go#L156 StepAggregator does the work to sort values into intervals in its Observe function. If we made a new InstantAggregator it would always have and return 1 interval, and it wouldn't need to do time calculations at all. I think this is the ideal fix, because it is simpler but also more efficient. Similar to how we swap between Grouping or Ungrouped aggregators based on presence of any by() clause.

Not sure why this wasn't done originally when introducing instant queries. I think I just didn't realize the number of edge cases being introduced without it.

If you are interested in exploring this direction a bit, I think it would be worthwhile. It should be less churn overall.

ruslan-mikhailov · 2025-06-17T09:29:55Z

+ linter fixes

when query both generator and backend

ruslan-mikhailov · 2025-06-17T09:31:50Z

+ rebase from latest main in order to resolve conflicts

electron0zero

lgtm

* [bugfix] TraceQL Metrics: correct interval number calculation * [bugfix] Force to return one interval for instant queries * linter fix: apply De Morgan's law * e2e: refactoring * e2e: refactoring * e2e: function to call instant query * e2e: instant query * e2e: remove redundant param * e2e: add traces with high cardinality attributes * e2e: test instant bottomk/topk * Add tests for topk and bottomk Instant queries * Changelog * test refactoring * Basic test to check corner cases for instant query * [bugfix] Correct instant query calculation when query both generator and backend --------- Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com>

* Bump github.com/go-viper/mapstructure/v2 from 2.2.1 to 2.3.0 (#5333) Bumps [github.com/go-viper/mapstructure/v2](https://github.com/go-viper/mapstructure) from 2.2.1 to 2.3.0. - [Release notes](https://github.com/go-viper/mapstructure/releases) - [Changelog](https://github.com/go-viper/mapstructure/blob/main/CHANGELOG.md) - [Commits](go-viper/mapstructure@v2.2.1...v2.3.0) --- updated-dependencies: - dependency-name: github.com/go-viper/mapstructure/v2 dependency-version: 2.3.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update go tools to 1.24.4 (#5322) * cherry pick * Update changelog * cherry pick * Bugfix: Correctly assign backend shard numbers depending on ingester shards (#5438) * Correctly assign backend shard numbers depending on ingester shards Signed-off-by: Joe Elliott <number101010@gmail.com> * changelog Signed-off-by: Joe Elliott <number101010@gmail.com> * test cleanup Signed-off-by: Joe Elliott <number101010@gmail.com> * a single space raised up as an offering to the lint gods Signed-off-by: Joe Elliott <number101010@gmail.com> --------- Signed-off-by: Joe Elliott <number101010@gmail.com> * cherry picking * Propagate tracing context in distributor for HTTP requests (#5312) * Propagate OTel context in distributor * Changelog * chlog v2 * Only propagate sampled traces * Refactor # Conflicts: # CHANGELOG.md * [bugfix] Correct instant query calculation (#5252) * [bugfix] TraceQL Metrics: correct interval number calculation * [bugfix] Force to return one interval for instant queries * linter fix: apply De Morgan's law * e2e: refactoring * e2e: refactoring * e2e: function to call instant query * e2e: instant query * e2e: remove redundant param * e2e: add traces with high cardinality attributes * e2e: test instant bottomk/topk * Add tests for topk and bottomk Instant queries * Changelog * test refactoring * Basic test to check corner cases for instant query * [bugfix] Correct instant query calculation when query both generator and backend --------- Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com> * Add nil check to `partitionAssignmentVar` (#5198) * Nil check partitionAssignmentVar * Changelog # Conflicts: # CHANGELOG.md * True up changelog * True up linter * Apply suggestions from code review Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * Update CHANGELOG.md Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * Update CHANGELOG.md Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com> * add release notes --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Joe Elliott <number101010@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: A. Stoewer <adrian@stoewer.me> Co-authored-by: J Pham <94262131+ie-pham@users.noreply.github.com> Co-authored-by: Joe Elliott <number101010@gmail.com> Co-authored-by: Ruslan Mikhailov <195758209+ruslan-mikhailov@users.noreply.github.com> Co-authored-by: Mario <mariorvinas@gmail.com> Co-authored-by: Suraj Nath <9503187+electron0zero@users.noreply.github.com> Co-authored-by: Kim Nylander <104772500+knylander-grafana@users.noreply.github.com>

ruslan-mikhailov force-pushed the bugfix/instant-query branch 5 times, most recently from eca0ced to 44d6dd7 Compare June 10, 2025 13:54

ruslan-mikhailov marked this pull request as ready for review June 10, 2025 15:05

ruslan-mikhailov requested review from joe-elliott, mdisibio, mapno, yvrhdn, zalegrala, electron0zero, ie-pham, stoewer, javiermolinar and carles-grafana as code owners June 10, 2025 15:05

ruslan-mikhailov commented Jun 10, 2025

View reviewed changes

ruslan-mikhailov marked this pull request as draft June 10, 2025 15:17

ruslan-mikhailov force-pushed the bugfix/instant-query branch 2 times, most recently from 8aec5ae to b0b6c40 Compare June 10, 2025 15:50

electron0zero mentioned this pull request Jun 10, 2025

Add tests for topk and bottomk Instant queries #5261

Closed

3 tasks

ruslan-mikhailov force-pushed the bugfix/instant-query branch 3 times, most recently from cb72e98 to 2778368 Compare June 11, 2025 09:12

ruslan-mikhailov marked this pull request as ready for review June 11, 2025 09:46

mdisibio reviewed Jun 11, 2025

View reviewed changes

mdisibio reviewed Jun 16, 2025

View reviewed changes

ruslan-mikhailov force-pushed the bugfix/instant-query branch from 8d53cc0 to a4d7f9b Compare June 17, 2025 09:30

ruslan-mikhailov and others added 15 commits June 17, 2025 11:30

[bugfix] TraceQL Metrics: correct interval number calculation

f43c5b9

[bugfix] Force to return one interval for instant queries

8113f2d

linter fix: apply De Morgan's law

489969b

e2e: refactoring

920a26a

e2e: refactoring

27343a5

e2e: function to call instant query

409ef69

e2e: instant query

e28f497

e2e: remove redundant param

24b0814

e2e: add traces with high cardinality attributes

41ed767

e2e: test instant bottomk/topk

316b47d

Add tests for topk and bottomk Instant queries

994666a

Changelog

8249264

test refactoring

be7b4b5

Basic test to check corner cases for instant query

d9c586d

[bugfix] Correct instant query calculation

bcf4706

when query both generator and backend

ruslan-mikhailov force-pushed the bugfix/instant-query branch from a4d7f9b to bcf4706 Compare June 17, 2025 09:31

mdisibio approved these changes Jun 17, 2025

View reviewed changes

electron0zero approved these changes Jun 17, 2025

View reviewed changes

ruslan-mikhailov merged commit 7776e70 into grafana:main Jun 17, 2025
21 checks passed

ruslan-mikhailov deleted the bugfix/instant-query branch June 17, 2025 12:09

ruslan-mikhailov mentioned this pull request Jun 23, 2025

topk/bottomk returns more then k results in TraceQL metrics instant query #5221

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[bugfix] Correct instant query calculation #5252

[bugfix] Correct instant query calculation #5252

Uh oh!

ruslan-mikhailov commented Jun 10, 2025 •

edited

Loading

Uh oh!

ruslan-mikhailov Jun 10, 2025

Uh oh!

ruslan-mikhailov Jun 10, 2025 •

edited

Loading

Uh oh!

ruslan-mikhailov Jun 11, 2025

Uh oh!

mdisibio Jun 11, 2025

Uh oh!

electron0zero Jun 11, 2025

Uh oh!

ruslan-mikhailov Jun 16, 2025

Uh oh!

Uh oh!

ruslan-mikhailov commented Jun 16, 2025

Uh oh!

mdisibio left a comment

Uh oh!

ruslan-mikhailov commented Jun 17, 2025

Uh oh!

ruslan-mikhailov commented Jun 17, 2025

Uh oh!

electron0zero left a comment

Uh oh!

Uh oh!

Uh oh!

[bugfix] Correct instant query calculation #5252

[bugfix] Correct instant query calculation #5252

Uh oh!

Conversation

ruslan-mikhailov commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ruslan-mikhailov Jun 10, 2025

Choose a reason for hiding this comment

Uh oh!

ruslan-mikhailov Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ruslan-mikhailov Jun 11, 2025

Choose a reason for hiding this comment

Uh oh!

mdisibio Jun 11, 2025

Choose a reason for hiding this comment

Uh oh!

electron0zero Jun 11, 2025

Choose a reason for hiding this comment

Uh oh!

ruslan-mikhailov Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ruslan-mikhailov commented Jun 16, 2025

Uh oh!

mdisibio left a comment

Choose a reason for hiding this comment

Uh oh!

ruslan-mikhailov commented Jun 17, 2025

Uh oh!

ruslan-mikhailov commented Jun 17, 2025

Uh oh!

electron0zero left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ruslan-mikhailov commented Jun 10, 2025 •

edited

Loading

ruslan-mikhailov Jun 10, 2025 •

edited

Loading