-
Notifications
You must be signed in to change notification settings - Fork 597
Rewrite traces using rebatching #4690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8d0765b
to
afd14bf
Compare
Signed-off-by: Joe Elliott <number101010@gmail.com>
Will be implemented in separate branch that will not be merged
The functions test.MakeBatch and test.MakeTrace* now create more unique ResourceSpans and ScopeSpans. This avoids unwanted rebatching in tests
Test data now has additional attributes on resource level that caused these tests to fail
afd14bf
to
8ef32f6
Compare
1607e88
to
d50ab3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice code, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice implementation. couple nits and only one "real" question.
thanks!
tempodb/encoding/vparquet4/schema.go
Outdated
@@ -547,6 +547,8 @@ func traceToParquetWithMapping(id common.ID, tr *tempopb.Trace, ot *Trace, dedic | |||
} | |||
} | |||
|
|||
// modify trace by rebatching spans and assigning nested set values | |||
rebatchTrace(ot) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also call this after combining here-ish?
https://github.com/grafana/tempo/blob/main/tempodb/encoding/vparquet4/combiner.go#L139
we seem to have three things we do every time we build a trace:
- rebatch
- nested set model
- service stats
should we create some holistic function like "finalizeTrace" that accomplishes this? that way if we add more of this kind of processing we will have an easy place to put it?
func finalizeTrace() {
rebatch()
assignNestedSetModel()
...
}
} | ||
|
||
// preallocate a map and a slice to collect the indices of unique ResourceSpans and ScopeSpans | ||
uniqueIndexes := make([]int, 0, max(len(trace.ResourceSpans), len(trace.ResourceSpans[0].ScopeSpans))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
took me a bit to figure out what you were using this. appreciate the effort on the in place alg. clean and well implemented.
What this PR does:
Rebatch traces in order to remove redundant
ResourceSpans
andScopeSpans
to reduces the number of rows on resource and scope level. This has shown to speed up queries with matching resource attributes and also reduces the overall file size.Which issue(s) this PR fixes:
Fixes grafana/tempo-squad#548
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]