Issue 4896 traceql metrics fixes #4962
Conversation
pkg/traceql/engine_metrics.go
Outdated
@@ -1047,7 +1054,8 @@ func (e *MetricsEvaluator) Do(ctx context.Context, f SpansetFetcher, fetcherStar
e.mtx.Lock()

for _, s := range ss.Spans {
filteredSpansIndex := make([]int, 0, len(ss.Spans)) // optimistic preallocation
Need to watch the performance here of maintaining another slice for every span. A few thoughts: it is worth losing some exemplar "fairness" in order to keep performance high if needed, and e.sampleExemplar could also be checked earlier to skip the extra work when we already have enough exemplars. I have a feeling it would be faster to iterate ss.Spans twice, with a second pass looking for exemplars, than to populate another slice.
You're right, it gives a ~10-20% degradation in CPU time. I'll change it to a more optimal solution.
Benchmarks.
Benchmark function
func BenchmarkExperimentA(b *testing.B) {
spans := make([]Span, 0, 1000)
for i := 0; i < 1000; i++ {
spans = append(spans, newMockSpan(nil).WithStartTime(uint64(i)).WithSpanString("foo", "bar").WithSpanFloat("kafka.lag", randFloat(1e10, 1e20)).WithDuration(100))
}
req := &tempopb.QueryRangeRequest{
Start: uint64(1 * time.Second),
End: uint64(3 * time.Second),
Step: uint64(1 * time.Second),
Query: "{ } | count_over_time()",
}
engine := NewEngine()
e, _ := engine.CompileMetricsQueryRange(req, 0, 0, false)
e.maxExemplars = 100000000000
for b.Loop() {
targetA(e, spans)
}
}
BenchmarkExperimentOriginal-12 51366 22503 ns/op 0 B/op 0 allocs/op
Original
func targetOriginal(e *MetricsEvaluator, spans []Span) {
for _, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
}
e.spansTotal++
e.metricsPipeline.observe(s)
}
if len(spans) > 0 && e.sampleExemplar(spans[0].ID()) {
e.metricsPipeline.observeExemplar(spans[randv1.Intn(len(spans))])
}
}
BenchmarkExperimentA-12 46334 24571 ns/op 8193 B/op 1 allocs/op
Current version
func targetA(e *MetricsEvaluator, spans []Span) {
filteredSpansIndex := make([]int, 0, len(spans)) // optimistic preallocation
for i, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
}
e.spansTotal++
filteredSpansIndex = append(filteredSpansIndex, i)
e.metricsPipeline.observe(s)
}
if len(filteredSpansIndex) > 0 && e.sampleExemplar(spans[0].ID()) {
i := filteredSpansIndex[randv1.Intn(len(filteredSpansIndex))]
e.metricsPipeline.observeExemplar(spans[i])
}
}
BenchmarkExperimentB-12 51210 22709 ns/op 0 B/op 0 allocs/op
PoC of optimised solution
Note that this solution does not guarantee a fair probability distribution.
func targetB(e *MetricsEvaluator, spans []Span) {
var validSpansCount int
var randomSpanIndex int
needExemplar := e.maxExemplars > 0 && e.sampleExemplar(spans[0].ID())
for _, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
}
e.spansTotal++
validSpansCount++
e.metricsPipeline.observe(s)
}
if needExemplar {
for i, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
if needExemplar && (validSpansCount == 1 || randv1.Intn(validSpansCount) == 0) {
randomSpanIndex = i
}
}
}
// If we found valid spans and need to sample an exemplar, use the randomly selected span
if validSpansCount > 0 && needExemplar {
e.metricsPipeline.observeExemplar(spans[randomSpanIndex])
}
}
}
Applied the reservoir sampling algorithm. Benchmark results:
// BenchmarkExperimentOriginal-12 538930 2220 ns/op 0 B/op 0 allocs/op
// BenchmarkExperimentA-12 480423 2505 ns/op 896 B/op 1 allocs/op
// BenchmarkExperimentB-12 468480 2255 ns/op 0 B/op 0 allocs/op
// BenchmarkExperimentC-12 526160 2271 ns/op 0 B/op 0 allocs/op
Let me know if I should take the optimisation any further.
Source code of the benchmark
func targetC(e *MetricsEvaluator, spans []Span) {
var validSpansCount int
var randomSpanIndex int
needExemplar := e.maxExemplars > 0 && e.sampleExemplar(spans[0].ID())
for i, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
}
validSpansCount++
e.metricsPipeline.observe(s)
if !needExemplar {
continue
}
if needExemplar && (validSpansCount == 1 || randv1.Intn(validSpansCount) == 0) {
randomSpanIndex = i
}
}
e.spansTotal += uint64(validSpansCount)
if needExemplar && validSpansCount > 0 {
e.metricsPipeline.observeExemplar(spans[randomSpanIndex])
}
}
Experimentation proves that the probability distribution is flat:
>>> def pickme(limit):
... res = 0
... for ii in range(limit):
... if random.randint(0, ii) == 0:
... res = ii
... return res
...
>>> c = Counter()
>>> for i in range(1000000):
... c[pickme(100)] += 1
...
>>> c
Counter({27: 10378, 18: 10311, 79: 10307, 69: 10288, 67: 10278, 82: 10275, 92: 10259, 6: 10256, 73: 10243, 34: 10224, 83: 10219, 31: 10210, 43: 10210, 53: 10207, 21: 10199, 36: 10193, 14: 10187, 17: 10187, 46: 10183, 99: 10182, 98: 10170, 7: 10165, 10: 10160, 71: 10159, 4: 10156, 64: 10153, 76: 10151, 48: 10149, 35: 10145, 41: 10145, 59: 10141, 63: 10138, 51: 10137, 72: 10137, 29: 10137, 84: 10136, 26: 10135, 42: 10133, 13: 10131, 45: 10131, 39: 10124, 55: 10119, 38: 10119, 40: 10119, 85: 10115, 1: 10112, 81: 10104, 47: 10102, 5: 10101, 15: 10099, 89: 10095, 61: 10087, 23: 10086, 22: 10086, 86: 10083, 93: 10082, 95: 10082, 28: 10082, 16: 10080, 94: 10079, 96: 10077, 32: 10076, 97: 10074, 77: 10073, 37: 10071, 65: 10069, 60: 10066, 9: 10064, 30: 10061, 75: 10058, 74: 10054, 3: 10053, 57: 10050, 68: 10050, 25: 10049, 33: 10045, 62: 10042, 56: 10040, 49: 10038, 78: 10036, 8: 10036, 88: 10033, 19: 10025, 0: 10020, 2: 10011, 90: 10008, 54: 9998, 70: 9997, 50: 9991, 91: 9987, 44: 9977, 24: 9977, 12: 9965, 52: 9928, 11: 9921, 87: 9921, 80: 9914, 58: 9899, 66: 9861, 20: 9724})
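For reference, the flat distribution also follows analytically. A sketch of the standard size-1 reservoir argument (1-based indices, unlike the 0-based Python above): item $i$ is picked at step $i$ with probability $1/i$ and must then survive every later step, so

$$P(\text{item } i \text{ is the final pick}) = \frac{1}{i}\prod_{j=i+1}^{n}\left(1-\frac{1}{j}\right) = \frac{1}{i}\prod_{j=i+1}^{n}\frac{j-1}{j} = \frac{1}{i}\cdot\frac{i}{n} = \frac{1}{n},$$

independent of $i$.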
Answer to #4962 (review).
For a ~500 MB block the results are much the same; the only significant difference is that the unoptimised solution requires more allocations. Baseline is the code from the main branch.
benchstat
goos: darwin
goarch: arm64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet4
cpu: Apple M4 Pro
│ main.txt │ optimised.txt │ unoptimised.txt │
│ sec/op │ sec/op vs base │ sec/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 12.61 ± ∞ ¹ 12.51 ± ∞ ¹ ~ (p=1.000 n=1) ² 12.63 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 20.22 ± ∞ ¹ 19.46 ± ∞ ¹ ~ (p=1.000 n=1) ² 19.99 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 17.12 ± ∞ ¹ 16.50 ± ∞ ¹ ~ (p=1.000 n=1) ² 16.85 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 27.09 ± ∞ ¹ 26.27 ± ∞ ¹ ~ (p=1.000 n=1) ² 27.10 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 2.115 ± ∞ ¹ 2.020 ± ∞ ¹ ~ (p=1.000 n=1) ² 2.155 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 7.618 ± ∞ ¹ 8.031 ± ∞ ¹ ~ (p=1.000 n=1) ² 7.776 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 2.276 ± ∞ ¹ 2.261 ± ∞ ¹ ~ (p=1.000 n=1) ² 2.291 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 8.875 8.732 -1.61% 8.901 +0.29%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ main.txt │ optimised.txt │ unoptimised.txt │
│ MB_IO/op │ MB_IO/op vs base │ MB_IO/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 22.58 ± ∞ ¹ 22.58 ± ∞ ¹ ~ (p=1.000 n=1) ² 22.58 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 27.13 ± ∞ ¹ 27.13 ± ∞ ¹ ~ (p=1.000 n=1) ² 27.13 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 23.16 ± ∞ ¹ 23.16 ± ∞ ¹ ~ (p=1.000 n=1) ² 23.16 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 24.83 ± ∞ ¹ 24.83 ± ∞ ¹ ~ (p=1.000 n=1) ² 24.83 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 23.16 ± ∞ ¹ 23.16 ± ∞ ¹ ~ (p=1.000 n=1) ² 23.16 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 70.74 ± ∞ ¹ 70.74 ± ∞ ¹ ~ (p=1.000 n=1) ² 70.74 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 24.35 ± ∞ ¹ 24.35 ± ∞ ¹ ~ (p=1.000 n=1) ² 24.35 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 28.16 28.16 +0.00% 28.16 +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
│ main.txt │ optimised.txt │ unoptimised.txt │
│ spans/op │ spans/op vs base │ spans/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 4.736M ± ∞ ¹ 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ² 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 4.736M ± ∞ ¹ 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ² 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 4.736M ± ∞ ¹ 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ² 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 4.736M ± ∞ ¹ 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ² 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 340.6k ± ∞ ¹ 340.6k ± ∞ ¹ ~ (p=1.000 n=1) ² 340.6k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 3.262k ± ∞ ¹ 3.262k ± ∞ ¹ ~ (p=1.000 n=1) ² 3.262k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 124.2k ± ∞ ¹ 124.2k ± ∞ ¹ ~ (p=1.000 n=1) ² 124.2k ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 683.1k 683.1k +0.00% 683.1k +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
│ main.txt │ optimised.txt │ unoptimised.txt │
│ spans/s │ spans/s vs base │ spans/s vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 375.7k ± ∞ ¹ 378.6k ± ∞ ¹ ~ (p=1.000 n=1) ² 375.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 234.2k ± ∞ ¹ 243.4k ± ∞ ¹ ~ (p=1.000 n=1) ² 236.9k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 276.7k ± ∞ ¹ 287.0k ± ∞ ¹ ~ (p=1.000 n=1) ² 281.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 174.8k ± ∞ ¹ 180.3k ± ∞ ¹ ~ (p=1.000 n=1) ² 174.8k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 161.0k ± ∞ ¹ 168.6k ± ∞ ¹ ~ (p=1.000 n=1) ² 158.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 428.2 ± ∞ ¹ 406.2 ± ∞ ¹ ~ (p=1.000 n=1) ² 419.5 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 54.55k ± ∞ ¹ 54.91k ± ∞ ¹ ~ (p=1.000 n=1) ² 54.19k ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 76.97k 78.23k +1.63% 76.75k -0.29%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ main.txt │ optimised.txt │ unoptimised.txt │
│ B/op │ B/op vs base │ B/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 485.4Mi ± ∞ ¹ 485.1Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 536.8Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 662.9Mi ± ∞ ¹ 659.0Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 707.2Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 644.4Mi ± ∞ ¹ 640.2Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 688.3Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 873.2Mi ± ∞ ¹ 869.8Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 915.7Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 170.1Mi ± ∞ ¹ 177.4Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 179.8Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 184.7Mi ± ∞ ¹ 184.7Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 176.1Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 79.23Mi ± ∞ ¹ 84.01Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 79.79Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 332.6Mi 336.6Mi +1.20% 346.9Mi +4.29%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ main.txt │ optimised.txt │ unoptimised.txt │
│ allocs/op │ allocs/op vs base │ allocs/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 1.754M ± ∞ ¹ 1.753M ± ∞ ¹ ~ (p=1.000 n=1) ² 2.347M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 3.145M ± ∞ ¹ 3.144M ± ∞ ¹ ~ (p=1.000 n=1) ² 3.744M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 3.121M ± ∞ ¹ 3.118M ± ∞ ¹ ~ (p=1.000 n=1) ² 3.713M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 5.822M ± ∞ ¹ 5.825M ± ∞ ¹ ~ (p=1.000 n=1) ² 6.417M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 621.7k ± ∞ ¹ 621.5k ± ∞ ¹ ~ (p=1.000 n=1) ² 624.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 151.6k ± ∞ ¹ 151.2k ± ∞ ¹ ~ (p=1.000 n=1) ² 154.7k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 127.9k ± ∞ ¹ 127.6k ± ∞ ¹ ~ (p=1.000 n=1) ² 188.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 1.027M 1.026M -0.09% 1.210M +17.79%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
Good changes, and I like the sampling algorithm. It's hard to tell the overall impact on queries from the direct benchmark numbers you have provided. Can you try doing a benchstat comparison using BenchmarkBackendBlockQueryRange?
pkg/traceql/engine_metrics.go
Outdated
wasInstant := end1-start1 == step

start2 := uint64(blockStart.UnixNano())
blockEnd = blockEnd.Add(time.Second) // Add 1 second to include the last second of the block
I think this is worth commenting a little more: blockEnd is rounded down to the nearest second and considered inclusive. The end result of this function is that end2 covers the same time range, but as an exclusive nanosecond value.
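A minimal sketch of that intent, assuming blockStart and blockEnd are time.Time values truncated to whole seconds with both edges inclusive; the names mirror the diff above, but the function itself is illustrative and not the actual engine_metrics.go code:

package main

import (
	"fmt"
	"time"
)

// blockBounds converts an inclusive, second-precision block range into
// nanosecond bounds with an exclusive end. Adding one second to blockEnd
// keeps spans that start anywhere inside the block's final second.
func blockBounds(blockStart, blockEnd time.Time) (start2, end2 uint64) {
	blockEnd = blockEnd.Add(time.Second) // include the last second of the block
	start2 = uint64(blockStart.UnixNano())
	end2 = uint64(blockEnd.UnixNano()) // exclusive: a span start st is kept when start2 <= st && st < end2
	return start2, end2
}

func main() {
	s, e := blockBounds(time.Unix(100, 0), time.Unix(105, 0))
	fmt.Println(s, e) // 100000000000 106000000000
}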
Added a few more comments. Let me know if I need to explain anything in more detail.
Automation closed the PR, reopening.
Fixes bug where under certain conditions last trace in a block is lost
+ review fix and minor variable name changes
What this PR does
The start and end of a trace in the generator are converted to seconds, dropping the fractional part. Additionally, blocks' start and end times are stored in seconds.
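A rough numeric illustration of what dropping the fractional part means (made-up values, not code from this PR):

package main

import (
	"fmt"
	"time"
)

func main() {
	traceEndNanos := uint64(104_700_000_000)               // real end of the last trace: 104.7 s
	blockEndSeconds := traceEndNanos / uint64(time.Second) // stored block end: 104 (fraction dropped)
	fmt.Println(blockEndSeconds)                           // 104 -> the trailing 0.7 s now lies "after" the stored block end
}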
Bug 1
Search in blocks was exclusive, but it should have included both edges. See tempo/modules/frontend/frontend.go, line 361 in a1ab3f9.
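A hedged sketch of the idea, as a generic overlap check rather than the actual frontend.go code: with inclusive, second-precision block bounds, a block must be searched whenever the two ranges touch at either edge.

package main

import "fmt"

// overlaps reports whether the inclusive block range [blockStart, blockEnd]
// intersects the request range [reqStart, reqEnd], counting touching edges as overlap.
func overlaps(blockStart, blockEnd, reqStart, reqEnd uint64) bool {
	return blockStart <= reqEnd && blockEnd >= reqStart
}

func main() {
	// A block ending exactly at the request start must still be searched;
	// a strict comparison on either side would silently skip it.
	fmt.Println(overlaps(90, 100, 100, 110)) // true
}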
Bug 2. Losing the last trace in the block.
It filters here:
To fix it, we need to include the right border, but only when the request's time range fully includes the right edge of the block.
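A minimal sketch of that condition, assuming nanosecond bounds where blockEndExclusive already covers the block's last (truncated) second; names are illustrative, not the actual PR code:

package main

import "fmt"

// clampEnd picks the effective exclusive end of the scan: the block's right
// border is included only when the request fully covers it, otherwise the
// request's own end applies, so no spans outside the request are admitted.
func clampEnd(reqEnd, blockEndExclusive uint64) uint64 {
	if reqEnd >= blockEndExclusive {
		return blockEndExclusive // request covers the whole block: keep its last second
	}
	return reqEnd
}

func main() {
	fmt.Println(clampEnd(200_000_000_000, 106_000_000_000)) // 106000000000: block edge included
	fmt.Println(clampEnd(104_000_000_000, 106_000_000_000)) // 104000000000: request end wins
}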
Bug 3. Lonely trace in a block.
It is possible to have a block with start == end; this can be reproduced when only one trace has been pushed and flushed.
Such a block ended up being skipped entirely.
Bug 4. Exemplars are not filtered.
This resulted in exemplars referencing traces outside of the requested time range.
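A hedged sketch of the shape of the fix, as an illustrative filter rather than the actual exemplar code (the real change applies the same time check used for regular samples, as in targetC earlier in this thread):

package main

import "fmt"

// filterExemplarCandidates keeps only span start times inside [start, end),
// mirroring the sample-time filter, so exemplars can no longer reference
// traces outside the requested range.
func filterExemplarCandidates(starts []uint64, start, end uint64) []uint64 {
	out := make([]uint64, 0, len(starts))
	for _, st := range starts {
		if st < start || st >= end {
			continue // outside the requested range: not a valid exemplar candidate
		}
		out = append(out, st)
	}
	return out
}

func main() {
	fmt.Println(filterExemplarCandidates([]uint64{50, 150, 250}, 100, 200)) // [150]
}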
Which issue(s) this PR fixes
Fixes #4896
How it has been tested
Vulture
It was checked by #4886
Autotest
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]