Issue 4896 traceql metrics fixes #4962
Conversation
pkg/traceql/engine_metrics.go
Outdated
@@ -1047,7 +1054,8 @@ func (e *MetricsEvaluator) Do(ctx context.Context, f SpansetFetcher, fetcherStar
e.mtx.Lock()

for _, s := range ss.Spans {
filteredSpansIndex := make([]int, 0, len(ss.Spans)) // optimistic preallocation
Need to watch the performance here of maintaining another slice for every span. A few thoughts: it is worth losing some exemplar "fairness" in order to keep performance high if needed, and e.sampleExemplar could also be checked earlier to skip the extra work when we already have enough exemplars. I have a feeling it would be faster to iterate ss.Spans twice, with a second pass looking for exemplars, than to populate another slice.
You're right, it gives a ~10-20% degradation in CPU time. I'll change it to a more optimal solution.
Benchmarks.
Benchmark function
func BenchmarkExperimentA(b *testing.B) {
spans := make([]Span, 0, 1000)
for i := 0; i < 1000; i++ {
spans = append(spans, newMockSpan(nil).WithStartTime(uint64(i)).WithSpanString("foo", "bar").WithSpanFloat("kafka.lag", randFloat(1e10, 1e20)).WithDuration(100))
}
req := &tempopb.QueryRangeRequest{
Start: uint64(1 * time.Second),
End: uint64(3 * time.Second),
Step: uint64(1 * time.Second),
Query: "{ } | count_over_time()",
}
engine := NewEngine()
e, _ := engine.CompileMetricsQueryRange(req, 0, 0, false)
e.maxExemplars = 100000000000
for b.Loop() {
targetA(e, spans)
}
}
BenchmarkExperimentOriginal-12 51366 22503 ns/op 0 B/op 0 allocs/op
Original
func targetOriginal(e *MetricsEvaluator, spans []Span) {
for _, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
}
e.spansTotal++
e.metricsPipeline.observe(s)
}
if len(spans) > 0 && e.sampleExemplar(spans[0].ID()) {
e.metricsPipeline.observeExemplar(spans[randv1.Intn(len(spans))])
}
}
BenchmarkExperimentA-12 46334 24571 ns/op 8193 B/op 1 allocs/op
Current version
func targetA(e *MetricsEvaluator, spans []Span) {
filteredSpansIndex := make([]int, 0, len(spans)) // optimistic preallocation
for i, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
}
e.spansTotal++
filteredSpansIndex = append(filteredSpansIndex, i)
e.metricsPipeline.observe(s)
}
if len(filteredSpansIndex) > 0 && e.sampleExemplar(spans[0].ID()) {
i := filteredSpansIndex[randv1.Intn(len(filteredSpansIndex))]
e.metricsPipeline.observeExemplar(spans[i])
}
}
BenchmarkExperimentB-12 51210 22709 ns/op 0 B/op 0 allocs/op
PoC of optimised solution
Note that this solution does not guarantee a fair probability distribution.
func targetB(e *MetricsEvaluator, spans []Span) {
var validSpansCount int
var randomSpanIndex int
needExemplar := e.maxExemplars > 0 && e.sampleExemplar(spans[0].ID())
for _, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
}
e.spansTotal++
validSpansCount++
e.metricsPipeline.observe(s)
}
if needExemplar {
for i, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
if needExemplar && (validSpansCount == 1 || randv1.Intn(validSpansCount) == 0) {
randomSpanIndex = i
}
}
}
// If we found valid spans and need to sample an exemplar, use the randomly selected span
if validSpansCount > 0 && needExemplar {
e.metricsPipeline.observeExemplar(spans[randomSpanIndex])
}
}
}
Applied the reservoir sampling algorithm. Benchmark results:
// BenchmarkExperimentOriginal-12 538930 2220 ns/op 0 B/op 0 allocs/op
// BenchmarkExperimentA-12 480423 2505 ns/op 896 B/op 1 allocs/op
// BenchmarkExperimentB-12 468480 2255 ns/op 0 B/op 0 allocs/op
// BenchmarkExperimentC-12 526160 2271 ns/op 0 B/op 0 allocs/op
Let me know if I should take the optimisation any further.
Source code of the benchmark
func targetC(e *MetricsEvaluator, spans []Span) {
var validSpansCount int
var randomSpanIndex int
needExemplar := e.maxExemplars > 0 && e.sampleExemplar(spans[0].ID())
for i, s := range spans {
if e.checkTime {
st := s.StartTimeUnixNanos()
if st < e.start || st >= e.end {
continue
}
}
validSpansCount++
e.metricsPipeline.observe(s)
if !needExemplar {
continue
}
if needExemplar && (validSpansCount == 1 || randv1.Intn(validSpansCount) == 0) {
randomSpanIndex = i
}
}
e.spansTotal += uint64(validSpansCount)
if needExemplar && validSpansCount > 0 {
e.metricsPipeline.observeExemplar(spans[randomSpanIndex])
}
}
Experimentation proves that the probability distribution is flat:
>>> def pickme(limit):
... res = 0
... for ii in range(limit):
... if random.randint(0, ii) == 0:
... res = ii
... return res
...
>>> c = Counter()
>>> for i in range(1000000):
... c[pickme(100)] += 1
...
>>> c
Counter({27: 10378, 18: 10311, 79: 10307, 69: 10288, 67: 10278, 82: 10275, 92: 10259, 6: 10256, 73: 10243, 34: 10224, 83: 10219, 31: 10210, 43: 10210, 53: 10207, 21: 10199, 36: 10193, 14: 10187, 17: 10187, 46: 10183, 99: 10182, 98: 10170, 7: 10165, 10: 10160, 71: 10159, 4: 10156, 64: 10153, 76: 10151, 48: 10149, 35: 10145, 41: 10145, 59: 10141, 63: 10138, 51: 10137, 72: 10137, 29: 10137, 84: 10136, 26: 10135, 42: 10133, 13: 10131, 45: 10131, 39: 10124, 55: 10119, 38: 10119, 40: 10119, 85: 10115, 1: 10112, 81: 10104, 47: 10102, 5: 10101, 15: 10099, 89: 10095, 61: 10087, 23: 10086, 22: 10086, 86: 10083, 93: 10082, 95: 10082, 28: 10082, 16: 10080, 94: 10079, 96: 10077, 32: 10076, 97: 10074, 77: 10073, 37: 10071, 65: 10069, 60: 10066, 9: 10064, 30: 10061, 75: 10058, 74: 10054, 3: 10053, 57: 10050, 68: 10050, 25: 10049, 33: 10045, 62: 10042, 56: 10040, 49: 10038, 78: 10036, 8: 10036, 88: 10033, 19: 10025, 0: 10020, 2: 10011, 90: 10008, 54: 9998, 70: 9997, 50: 9991, 91: 9987, 44: 9977, 24: 9977, 12: 9965, 52: 9928, 11: 9921, 87: 9921, 80: 9914, 58: 9899, 66: 9861, 20: 9724})
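For reference, the flat distribution also follows analytically. A sketch of the standard size-1 reservoir argument (1-based indices, unlike the 0-based Python above): item $i$ is picked at step $i$ with probability $1/i$ and must then survive every later step, so

$$P(\text{item } i \text{ is the final pick}) = \frac{1}{i}\prod_{j=i+1}^{n}\left(1-\frac{1}{j}\right) = \frac{1}{i}\prod_{j=i+1}^{n}\frac{j-1}{j} = \frac{1}{i}\cdot\frac{i}{n} = \frac{1}{n},$$

independent of $i$.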
Answer to #4962 (review).
For a ~500 MB block the results are much the same; the only significant difference is that the unoptimised solution requires more allocations. Baseline is the code from the main branch.
benchstat
goos: darwin
goarch: arm64
pkg: github.com/grafana/tempo/tempodb/encoding/vparquet4
cpu: Apple M4 Pro
│ main.txt │ optimised.txt │ unoptimised.txt │
│ sec/op │ sec/op vs base │ sec/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 12.61 ± ∞ ¹ 12.51 ± ∞ ¹ ~ (p=1.000 n=1) ² 12.63 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 20.22 ± ∞ ¹ 19.46 ± ∞ ¹ ~ (p=1.000 n=1) ² 19.99 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 17.12 ± ∞ ¹ 16.50 ± ∞ ¹ ~ (p=1.000 n=1) ² 16.85 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 27.09 ± ∞ ¹ 26.27 ± ∞ ¹ ~ (p=1.000 n=1) ² 27.10 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 2.115 ± ∞ ¹ 2.020 ± ∞ ¹ ~ (p=1.000 n=1) ² 2.155 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 7.618 ± ∞ ¹ 8.031 ± ∞ ¹ ~ (p=1.000 n=1) ² 7.776 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 2.276 ± ∞ ¹ 2.261 ± ∞ ¹ ~ (p=1.000 n=1) ² 2.291 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 8.875 8.732 -1.61% 8.901 +0.29%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ main.txt │ optimised.txt │ unoptimised.txt │
│ MB_IO/op │ MB_IO/op vs base │ MB_IO/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 22.58 ± ∞ ¹ 22.58 ± ∞ ¹ ~ (p=1.000 n=1) ² 22.58 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 27.13 ± ∞ ¹ 27.13 ± ∞ ¹ ~ (p=1.000 n=1) ² 27.13 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 23.16 ± ∞ ¹ 23.16 ± ∞ ¹ ~ (p=1.000 n=1) ² 23.16 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 24.83 ± ∞ ¹ 24.83 ± ∞ ¹ ~ (p=1.000 n=1) ² 24.83 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 23.16 ± ∞ ¹ 23.16 ± ∞ ¹ ~ (p=1.000 n=1) ² 23.16 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 70.74 ± ∞ ¹ 70.74 ± ∞ ¹ ~ (p=1.000 n=1) ² 70.74 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 24.35 ± ∞ ¹ 24.35 ± ∞ ¹ ~ (p=1.000 n=1) ² 24.35 ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 28.16 28.16 +0.00% 28.16 +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
│ main.txt │ optimised.txt │ unoptimised.txt │
│ spans/op │ spans/op vs base │ spans/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 4.736M ± ∞ ¹ 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ² 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 4.736M ± ∞ ¹ 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ² 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 4.736M ± ∞ ¹ 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ² 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 4.736M ± ∞ ¹ 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ² 4.736M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 340.6k ± ∞ ¹ 340.6k ± ∞ ¹ ~ (p=1.000 n=1) ² 340.6k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 3.262k ± ∞ ¹ 3.262k ± ∞ ¹ ~ (p=1.000 n=1) ² 3.262k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 124.2k ± ∞ ¹ 124.2k ± ∞ ¹ ~ (p=1.000 n=1) ² 124.2k ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 683.1k 683.1k +0.00% 683.1k +0.00%
¹ need >= 6 samples for confidence interval at level 0.95
² all samples are equal
│ main.txt │ optimised.txt │ unoptimised.txt │
│ spans/s │ spans/s vs base │ spans/s vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 375.7k ± ∞ ¹ 378.6k ± ∞ ¹ ~ (p=1.000 n=1) ² 375.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 234.2k ± ∞ ¹ 243.4k ± ∞ ¹ ~ (p=1.000 n=1) ² 236.9k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 276.7k ± ∞ ¹ 287.0k ± ∞ ¹ ~ (p=1.000 n=1) ² 281.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 174.8k ± ∞ ¹ 180.3k ± ∞ ¹ ~ (p=1.000 n=1) ² 174.8k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 161.0k ± ∞ ¹ 168.6k ± ∞ ¹ ~ (p=1.000 n=1) ² 158.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 428.2 ± ∞ ¹ 406.2 ± ∞ ¹ ~ (p=1.000 n=1) ² 419.5 ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 54.55k ± ∞ ¹ 54.91k ± ∞ ¹ ~ (p=1.000 n=1) ² 54.19k ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 76.97k 78.23k +1.63% 76.75k -0.29%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ main.txt │ optimised.txt │ unoptimised.txt │
│ B/op │ B/op vs base │ B/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 485.4Mi ± ∞ ¹ 485.1Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 536.8Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 662.9Mi ± ∞ ¹ 659.0Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 707.2Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 644.4Mi ± ∞ ¹ 640.2Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 688.3Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 873.2Mi ± ∞ ¹ 869.8Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 915.7Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 170.1Mi ± ∞ ¹ 177.4Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 179.8Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 184.7Mi ± ∞ ¹ 184.7Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 176.1Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 79.23Mi ± ∞ ¹ 84.01Mi ± ∞ ¹ ~ (p=1.000 n=1) ² 79.79Mi ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 332.6Mi 336.6Mi +1.20% 346.9Mi +4.29%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
│ main.txt │ optimised.txt │ unoptimised.txt │
│ allocs/op │ allocs/op vs base │ allocs/op vs base │
BackendBlockQueryRange/{}_|_rate()/5-12 1.754M ± ∞ ¹ 1.753M ± ∞ ¹ ~ (p=1.000 n=1) ² 2.347M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(name)/5-12 3.145M ± ∞ ¹ 3.144M ± ∞ ¹ ~ (p=1.000 n=1) ² 3.744M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(resource.service.name)/5-12 3.121M ± ∞ ¹ 3.118M ± ∞ ¹ ~ (p=1.000 n=1) ² 3.713M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{}_|_rate()_by_(span.http.url)/5-12 5.822M ± ∞ ¹ 5.825M ± ∞ ¹ ~ (p=1.000 n=1) ² 6.417M ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{resource.service.name=`loki-ingester`}_|_rate()/5-12 621.7k ± ∞ ¹ 621.5k ± ∞ ¹ ~ (p=1.000 n=1) ² 624.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{span.http.host_!=_``_&&_span.http.flavor=`2`}_|_rate()_by_(span.http.flavor)/5-12 151.6k ± ∞ ¹ 151.2k ± ∞ ¹ ~ (p=1.000 n=1) ² 154.7k ± ∞ ¹ ~ (p=1.000 n=1) ²
BackendBlockQueryRange/{status=error}_|_rate()/5-12 127.9k ± ∞ ¹ 127.6k ± ∞ ¹ ~ (p=1.000 n=1) ² 188.1k ± ∞ ¹ ~ (p=1.000 n=1) ²
geomean 1.027M 1.026M -0.09% 1.210M +17.79%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
Good changes, and I like the sampling algorithm. It's hard to tell the overall impact on queries from the direct benchmark numbers you have provided. Can you try doing a benchstat comparison using BenchmarkBackendBlockQueryRange?
pkg/traceql/engine_metrics.go
Outdated
wasInstant := end1-start1 == step

start2 := uint64(blockStart.UnixNano())
blockEnd = blockEnd.Add(time.Second) // Add 1 second to include the last second of the block
I think this is worth commenting a little more: blockEnd is rounded down to the nearest second and considered inclusive. The end result of this function is that end2 covers the same time range, but as an exclusive nanosecond value.
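A minimal sketch of that intent, assuming blockStart and blockEnd are time.Time values truncated to whole seconds with both edges inclusive; the names mirror the diff above, but the function itself is illustrative and not the actual engine_metrics.go code:

package main

import (
	"fmt"
	"time"
)

// blockBounds converts an inclusive, second-precision block range into
// nanosecond bounds with an exclusive end. Adding one second to blockEnd
// keeps spans that start anywhere inside the block's final second.
func blockBounds(blockStart, blockEnd time.Time) (start2, end2 uint64) {
	blockEnd = blockEnd.Add(time.Second) // include the last second of the block
	start2 = uint64(blockStart.UnixNano())
	end2 = uint64(blockEnd.UnixNano()) // exclusive: a span start st is kept when start2 <= st && st < end2
	return start2, end2
}

func main() {
	s, e := blockBounds(time.Unix(100, 0), time.Unix(105, 0))
	fmt.Println(s, e) // 100000000000 106000000000
}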
Added a few more comments. Let me know if I need to explain anything in more detail.
Automation closed the PR, reopening.
Fixes bug where under certain conditions last trace in a block is lost
+ review fix and minor variable name changes
What this PR does
The start and end of a trace in the generator are converted to seconds, dropping the fractional part. Additionally, blocks' start and end times are stored in seconds.
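A rough numeric illustration of what dropping the fractional part means (made-up values, not code from this PR):

package main

import (
	"fmt"
	"time"
)

func main() {
	traceEndNanos := uint64(104_700_000_000)               // real end of the last trace: 104.7 s
	blockEndSeconds := traceEndNanos / uint64(time.Second) // stored block end: 104 (fraction dropped)
	fmt.Println(blockEndSeconds)                           // 104 -> the trailing 0.7 s now lies "after" the stored block end
}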
Bug 1
Search in blocks was exclusive, but it should have included both edges. See tempo/modules/frontend/frontend.go, line 361 in a1ab3f9.
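A hedged sketch of the idea, as a generic overlap check rather than the actual frontend.go code: with inclusive, second-precision block bounds, a block must be searched whenever the two ranges touch at either edge.

package main

import "fmt"

// overlaps reports whether the inclusive block range [blockStart, blockEnd]
// intersects the request range [reqStart, reqEnd], counting touching edges as overlap.
func overlaps(blockStart, blockEnd, reqStart, reqEnd uint64) bool {
	return blockStart <= reqEnd && blockEnd >= reqStart
}

func main() {
	// A block ending exactly at the request start must still be searched;
	// a strict comparison on either side would silently skip it.
	fmt.Println(overlaps(90, 100, 100, 110)) // true
}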
Bug 2. Losing the last trace in the block.
It filters here:
To fix it, we need to include the right border, but only when the request's time range fully includes the right edge of the block.
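A minimal sketch of that condition, assuming nanosecond bounds where blockEndExclusive already covers the block's last (truncated) second; names are illustrative, not the actual PR code:

package main

import "fmt"

// clampEnd picks the effective exclusive end of the scan: the block's right
// border is included only when the request fully covers it, otherwise the
// request's own end applies, so no spans outside the request are admitted.
func clampEnd(reqEnd, blockEndExclusive uint64) uint64 {
	if reqEnd >= blockEndExclusive {
		return blockEndExclusive // request covers the whole block: keep its last second
	}
	return reqEnd
}

func main() {
	fmt.Println(clampEnd(200_000_000_000, 106_000_000_000)) // 106000000000: block edge included
	fmt.Println(clampEnd(104_000_000_000, 106_000_000_000)) // 104000000000: request end wins
}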
Bug 3. Lonely trace in a block.
It is possible to have a block with start == end; this can be reproduced when only one trace has been pushed and flushed.
Such a block ended up being skipped entirely.
Bug 4. Exemplars are not filtered.
This resulted in exemplars referencing traces outside of the requested time range.
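A hedged sketch of the shape of the fix, as an illustrative filter rather than the actual exemplar code (the real change applies the same time check used for regular samples, as in targetC earlier in this thread):

package main

import "fmt"

// filterExemplarCandidates keeps only span start times inside [start, end),
// mirroring the sample-time filter, so exemplars can no longer reference
// traces outside the requested range.
func filterExemplarCandidates(starts []uint64, start, end uint64) []uint64 {
	out := make([]uint64, 0, len(starts))
	for _, st := range starts {
		if st < start || st >= end {
			continue // outside the requested range: not a valid exemplar candidate
		}
		out = append(out, st)
	}
	return out
}

func main() {
	fmt.Println(filterExemplarCandidates([]uint64{50, 150, 250}, 100, 200)) // [150]
}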
Which issue(s) this PR fixes
Fixes #4896
How it has been tested
Vulture
It was checked by #4886
Autotest
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]