-
Notifications
You must be signed in to change notification settings - Fork 597
Exemplars UX improvements #5158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exemplars UX improvements #5158
Conversation
74e9718
to
147cca9
Compare
@@ -228,8 +263,8 @@ func (s *queryRangeSharder) buildBackendRequests(ctx context.Context, tenantID s | |||
queryHash := hashForQueryRangeRequest(&searchReq) | |||
colsToJSON := api.NewDedicatedColumnsToJSON() | |||
|
|||
exemplarsPerBlock := s.exemplarsPerShard(uint32(len(metas)), searchReq.Exemplars) | |||
for _, m := range metas { | |||
exemplarsPerBlock := s.exemplarsPerShard(metas, searchReq.Exemplars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also scale the number of exemplars for generatorRequest based on the time range? I think it sends over the whole limit, which was also a reason exemplars seemed to clump in recent data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! With bucket fix, it still should distribute fairly, but by cutting the exemplars parameter, we can reduce the load to generators.
I pushed exemplar splitting between generator and querier in a separate commit. Could you please take a look?
@ruslan-mikhailov Do we need to update the docs so people understand the changes you've made in this PR? Maybe this doc? https://grafana.com/docs/tempo/latest/traceql/metrics-queries/#exemplars Also, have you seen this PR? #5129 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Determine the number of requested exemplars from a block based on its time range, not equally
UX improvement
This will reduce load on generators
f7d231c
to
4b72208
Compare
+ rebase from latest main |
+ linter fixes |
833a0e8
to
6b9bafe
Compare
* TraceQL Metrics: more fair exemplars distribution Determine the number of requested exemplars from a block based on its time range, not equally * TraceQL Metrics: distribute exemplars over time UX improvement * TraceQL Metrics: Parametrise exemplars limit * TraceQL Metrics: pass exemplars limit to combiners * TraceQL Metrics: hard limit exemplars * TraceQL Metrics: restore default behaviour * TraceQL Metrics: e2e test for number of returned exemplars * TraceQL Metrics: state bug in quantile_over_time * Changelog * Stabilise flaky test * nolint for integer overflow * minor linter fixes * TraceQL Metrics: share exemplars between generator and backend This will reduce load on generators * Add comment with link to bug * minor linter fixes
* TraceQL Metrics: more fair exemplars distribution Determine the number of requested exemplars from a block based on its time range, not equally * TraceQL Metrics: distribute exemplars over time UX improvement * TraceQL Metrics: Parametrise exemplars limit * TraceQL Metrics: pass exemplars limit to combiners * TraceQL Metrics: hard limit exemplars * TraceQL Metrics: restore default behaviour * TraceQL Metrics: e2e test for number of returned exemplars * TraceQL Metrics: state bug in quantile_over_time * Changelog * Stabilise flaky test * nolint for integer overflow * minor linter fixes * TraceQL Metrics: share exemplars between generator and backend This will reduce load on generators * Add comment with link to bug * minor linter fixes
* TraceQL Metrics: more fair exemplars distribution Determine the number of requested exemplars from a block based on its time range, not equally * TraceQL Metrics: distribute exemplars over time UX improvement * TraceQL Metrics: Parametrise exemplars limit * TraceQL Metrics: pass exemplars limit to combiners * TraceQL Metrics: hard limit exemplars * TraceQL Metrics: restore default behaviour * TraceQL Metrics: e2e test for number of returned exemplars * TraceQL Metrics: state bug in quantile_over_time * Changelog * Stabilise flaky test * nolint for integer overflow * minor linter fixes * TraceQL Metrics: share exemplars between generator and backend This will reduce load on generators * Add comment with link to bug * minor linter fixes
What this PR does:
a. Change the approach to calculating requested exemplars from frontend to querier. Instead of equal share, calculate it based on block time range. The biggest block will receive more exemplars than a small block
b. Change sampling algorithm by fixing number of sampling buckets
Before the change:

After the change:

Results:

Before:
After:

Which issue(s) this PR fixes:
Fixes #4856
Fixes #4632
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]