Exemplars UX improvements #5158

ruslan-mikhailov · 2025-05-23T12:58:46Z

What this PR does:

Distribute exemplars over time.
a. Change how the frontend calculates the number of exemplars it requests from each querier. Instead of giving every block an equal share, base the share on the block's time range, so a block covering a longer range receives more exemplars than a block covering a shorter one.
b. Change the sampling algorithm to use a fixed number of sampling buckets.
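
As a rough illustration of item (a), the sketch below splits an exemplar budget across blocks in proportion to their time ranges. It is not the code from this PR: `blockMeta`, `newBlock`, and `exemplarsPerBlock` are hypothetical names, and the "at least one exemplar per block" rule is an assumption made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// blockMeta is a hypothetical stand-in for block metadata; only the time
// range matters for this sketch.
type blockMeta struct {
	Start, End time.Time
}

// newBlock builds a blockMeta from Unix seconds (helper for the example).
func newBlock(startSec, endSec int64) blockMeta {
	return blockMeta{Start: time.Unix(startSec, 0), End: time.Unix(endSec, 0)}
}

// exemplarsPerBlock splits limit across blocks in proportion to the time
// range each block covers. Giving every block at least one exemplar is an
// assumption of this sketch, so the total may slightly exceed limit.
func exemplarsPerBlock(metas []blockMeta, limit uint32) []uint32 {
	out := make([]uint32, len(metas))
	if limit == 0 {
		return out // exemplars disabled
	}

	// Sum the time covered by all blocks with a valid range.
	var total time.Duration
	for _, m := range metas {
		if d := m.End.Sub(m.Start); d > 0 {
			total += d
		}
	}

	for i, m := range metas {
		d := m.End.Sub(m.Start)
		if d <= 0 {
			out[i] = 1 // degenerate range: fall back to the minimum
			continue
		}
		share := uint32(float64(limit) * float64(d) / float64(total))
		if share < 1 {
			share = 1
		}
		out[i] = share
	}
	return out
}

func main() {
	metas := []blockMeta{
		newBlock(0, 3*3600), // 3h block
		newBlock(0, 3600),   // 1h block
	}
	fmt.Println(exemplarsPerBlock(metas, 100)) // [75 25]
}
```

With an equal split, both blocks above would be asked for 50 exemplars each, which over-samples the shorter block relative to the time it covers.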

Before the change:

After the change:

Results:
Before:

After:

Enforce a hard limit on the number of exemplars the query-frontend returns during final aggregation.
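
One way to picture a hard limit at aggregation time is a collector that stops accepting exemplars once the budget is used up. This is a minimal sketch under that assumption; `exemplar` and `exemplarCollector` are hypothetical types and do not reflect the combiner code in this PR.

```go
package main

import "fmt"

// exemplar is a hypothetical, simplified exemplar.
type exemplar struct {
	TimestampMs int64
	Value       float64
}

// exemplarCollector merges exemplars from partial responses and never
// holds more than limit of them.
type exemplarCollector struct {
	limit     int
	exemplars []exemplar
}

// add appends as many exemplars from batch as the remaining budget allows
// and silently drops the rest.
func (c *exemplarCollector) add(batch []exemplar) {
	remaining := c.limit - len(c.exemplars)
	if remaining <= 0 {
		return
	}
	if len(batch) > remaining {
		batch = batch[:remaining]
	}
	c.exemplars = append(c.exemplars, batch...)
}

func main() {
	c := &exemplarCollector{limit: 3}
	c.add(make([]exemplar, 2)) // first partial response
	c.add(make([]exemplar, 5)) // second one overshoots the budget
	fmt.Println(len(c.exemplars)) // 3
}
```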

Which issue(s) this PR fixes:
Fixes #4856
Fixes #4632

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

mdisibio · 2025-05-23T14:55:37Z

modules/frontend/metrics_query_range_sharder.go

@@ -228,8 +263,8 @@ func (s *queryRangeSharder) buildBackendRequests(ctx context.Context, tenantID s
 	queryHash := hashForQueryRangeRequest(&searchReq)
 	colsToJSON := api.NewDedicatedColumnsToJSON()

-	exemplarsPerBlock := s.exemplarsPerShard(uint32(len(metas)), searchReq.Exemplars)
-	for _, m := range metas {
+	exemplarsPerBlock := s.exemplarsPerShard(metas, searchReq.Exemplars)


Should we also scale the number of exemplars for generatorRequest based on the time range? I think it sends over the whole limit, which was also a reason exemplars seemed to clump in recent data.

Good point! With the bucket fix, exemplars should still be distributed fairly, but cutting the exemplars parameter also reduces the load on generators.

I pushed exemplar splitting between generator and querier in a separate commit. Could you please take a look?
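
The idea discussed in this thread, scaling the generator's share of exemplars by the portion of the query range it serves, could look roughly like the sketch below. The function name, the Unix-second parameters, and the notion of a single `cutoffSec` separating backend data from generator data are assumptions for illustration only, not the implementation in the commit.

```go
package main

import "fmt"

// splitExemplars divides limit between generators (which serve the range
// from cutoffSec to endSec) and the backend (startSec to cutoffSec), in
// proportion to the time each side covers. All times are Unix seconds.
func splitExemplars(startSec, endSec, cutoffSec int64, limit uint32) (generator, backend uint32) {
	if endSec <= startSec {
		// Degenerate range: nothing to scale by. Sending everything to the
		// backend is an arbitrary choice made for this sketch.
		return 0, limit
	}
	// Clamp the cutoff into the query range.
	if cutoffSec < startSec {
		cutoffSec = startSec
	}
	if cutoffSec > endSec {
		cutoffSec = endSec
	}

	recent := float64(endSec-cutoffSec) / float64(endSec-startSec)
	generator = uint32(float64(limit) * recent)
	backend = limit - generator
	return generator, backend
}

func main() {
	// One-hour query where the last 15 minutes are served by generators.
	fmt.Println(splitExemplars(0, 3600, 2700, 100)) // 25 75
}
```

Compared with sending the whole limit to the generators, this keeps the generator request proportional to the slice of time it actually covers.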

knylander-grafana · 2025-05-23T23:03:07Z

@ruslan-mikhailov Do we need to update the docs so people understand the changes you've made in this PR? Maybe this doc? https://grafana.com/docs/tempo/latest/traceql/metrics-queries/#exemplars

Also, have you seen this PR? #5129

mapno

LGTM

ruslan-mikhailov · 2025-06-02T08:38:18Z

+ rebase from latest main

ruslan-mikhailov · 2025-06-02T08:39:19Z

+ linter fixes

* TraceQL Metrics: more fair exemplars distribution
  Determine the number of requested exemplars from a block based on its time range, not equally
* TraceQL Metrics: distribute exemplars over time
  UX improvement
* TraceQL Metrics: Parametrise exemplars limit
* TraceQL Metrics: pass exemplars limit to combiners
* TraceQL Metrics: hard limit exemplars
* TraceQL Metrics: restore default behaviour
* TraceQL Metrics: e2e test for number of returned exemplars
* TraceQL Metrics: state bug in quantile_over_time
* Changelog
* Stabilise flaky test
* nolint for integer overflow
* minor linter fixes
* TraceQL Metrics: share exemplars between generator and backend
  This will reduce load on generators
* Add comment with link to bug
* minor linter fixes

ruslan-mikhailov force-pushed the exemplars-improvements branch 7 times, most recently from 74e9718 to 147cca9 Compare May 23, 2025 14:33

ruslan-mikhailov marked this pull request as ready for review May 23, 2025 14:48

ruslan-mikhailov requested review from joe-elliott, mdisibio, mapno, yvrhdn, zalegrala, electron0zero, ie-pham, stoewer, javiermolinar and carles-grafana as code owners May 23, 2025 14:48

mdisibio reviewed May 23, 2025

ruslan-mikhailov mentioned this pull request May 26, 2025

[Epic] Query correctness #5165

Open

ruslan-mikhailov commented May 26, 2025

integration/e2e/api/query_range_test.go (outdated, resolved)

ruslan-mikhailov mentioned this pull request May 26, 2025

Fix traceql exemplar distribution #5129

Closed

3 tasks

mapno reviewed May 27, 2025

integration/e2e/limits_test.go (resolved)

ruslan-mikhailov requested review from mdisibio and mapno May 28, 2025 10:39

mapno approved these changes May 29, 2025

ruslan-mikhailov added 3 commits June 2, 2025 10:33

TraceQL Metrics: more fair exemplars distribution

0642447

Determine the number of requested exemplars from a block based on its time range, not equally

TraceQL Metrics: distribute exemplars over time

52ce74d

UX improvement

TraceQL Metrics: Parametrise exemplars limit

1c675cc

ruslan-mikhailov added 10 commits June 2, 2025 10:33

TraceQL Metrics: pass exemplars limit to combiners

a9b90a5

TraceQL Metrics: hard limit exemplars

2bbd04f

TraceQL Metrics: restore default behaviour

f579a47

TraceQL Metrics: e2e test for number of returned exemplars

8f50694

TraceQL Metrics: state bug in quantile_over_time

00a4b46

Changelog

f574942

Stabilise flaky test

1a25dc4

nolint for integer overflow

cf26950

minor linter fixes

068ffb6

TraceQL Metrics: share exemplars between generator and backend

4b72208

This will reduce load on generators

ruslan-mikhailov force-pushed the exemplars-improvements branch from f7d231c to 4b72208 Compare June 2, 2025 08:37

ruslan-mikhailov added 2 commits June 2, 2025 10:46

Add comment with link to bug

2d6ecb1

minor linter fixes

6b9bafe

ruslan-mikhailov force-pushed the exemplars-improvements branch from 833a0e8 to 6b9bafe Compare June 2, 2025 08:46

ruslan-mikhailov merged commit a75b0bb into grafana:main Jun 2, 2025
20 checks passed

ruslan-mikhailov deleted the exemplars-improvements branch June 2, 2025 09:52

ruslan-mikhailov mentioned this pull request Jun 2, 2025

Exemplars missing for TraceQL metrics with ranges in the past #5114

Closed

ruslan-mikhailov mentioned this pull request Jun 2, 2025

Exemplars UX improvements (#5158) #5199

Merged

3 tasks
