Skip to content

Rewrite traces using rebatching #4690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Feb 20, 2025
Merged

Rewrite traces using rebatching #4690

merged 15 commits into from
Feb 20, 2025

Conversation

stoewer
Copy link
Contributor

@stoewer stoewer commented Feb 13, 2025

What this PR does:
Rebatch traces in order to remove redundant ResourceSpans and ScopeSpans to reduces the number of rows on resource and scope level. This has shown to speed up queries with matching resource attributes and also reduces the overall file size.

Which issue(s) this PR fixes:
Fixes grafana/tempo-squad#548

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

joe-elliott and others added 11 commits February 17, 2025 08:25
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
Will be implemented in separate branch that will not be merged
The functions test.MakeBatch and test.MakeTrace* now create
more unique ResourceSpans and ScopeSpans. This avoids unwanted
rebatching in tests
Test data now has additional attributes on resource level
that caused these tests to fail
@stoewer stoewer marked this pull request as ready for review February 17, 2025 00:52
@stoewer stoewer changed the title [WIP] Rewrite traces using rebatching Rewrite traces using rebatching Feb 17, 2025
Copy link
Contributor

@mapno mapno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice code, LGTM

Copy link
Collaborator

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice implementation. couple nits and only one "real" question.

thanks!

@@ -547,6 +547,8 @@ func traceToParquetWithMapping(id common.ID, tr *tempopb.Trace, ot *Trace, dedic
}
}

// modify trace by rebatching spans and assigning nested set values
rebatchTrace(ot)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also call this after combining here-ish?

https://github.com/grafana/tempo/blob/main/tempodb/encoding/vparquet4/combiner.go#L139

we seem to have three things we do every time we build a trace:

  • rebatch
  • nested set model
  • service stats

should we create some holistic function like "finalizeTrace" that accomplishes this? that way if we add more of this kind of processing we will have an easy place to put it?

func finalizeTrace() {
  rebatch()
  assignNestedSetModel()
  ...
}

}

// preallocate a map and a slice to collect the indices of unique ResourceSpans and ScopeSpans
uniqueIndexes := make([]int, 0, max(len(trace.ResourceSpans), len(trace.ResourceSpans[0].ScopeSpans)))
Copy link
Collaborator

@joe-elliott joe-elliott Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took me a bit to figure out what you were using this. appreciate the effort on the in place alg. clean and well implemented.

@stoewer stoewer merged commit 40c4351 into grafana:main Feb 20, 2025
14 checks passed
@stoewer stoewer deleted the rewrite-traces branch March 13, 2025 06:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants