Compactor pooling changes #4985

Merged: 3 commits merged into grafana:main on Apr 11, 2025
Conversation

@mdisibio (Contributor) commented on Apr 11, 2025

What this PR does:
Compactor memory usage and the frequency of OOMs have increased over time, and investigation showed that almost all of it is attributable to the parquet row pool. The exact reasons have been hard to ascertain, but pooling slices for large traces is no longer providing the benefit it used to: the large amount of pooled data no longer gets GC'ed as expected and accumulates. I couldn't spot any bugs in repooling, so I think it is a change in behavior at runtime due to a combination of factors: vParquet4 event/link attribute columns significantly increasing row sizes versus the prior JSON blobs, the Go runtime, and environmental factors.

This PR removes pooling for large traces. Testing in a high-traffic, high-tenant install showed a huge improvement, cutting memory high-water marks to less than half.

Some amount of pooling is still needed to maintain CPU usage, throughput, and GC rate, so pooled slices now use a fixed 1M default allocation size, which covers most traces.
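
A minimal sketch of the approach, assuming hypothetical names (the real pool lives in tempodb/encoding/vparquet4 and holds parquet rows; byte slices are used here for simplicity): buffers are repooled only while they stay at the default capacity, and anything that grew to hold a large trace is left for the GC.

    // Sketch of a size-capped row pool (hypothetical names, not the
    // actual Tempo code; the real pool holds parquet rows).
    package rowpool

    import "sync"

    // Fixed 1M default allocation size, which covers most traces.
    const defaultSize = 1 << 20

    // Pool pointers to slices to avoid an extra allocation on Put.
    var pool = sync.Pool{
        New: func() any {
            buf := make([]byte, 0, defaultSize)
            return &buf
        },
    }

    // Get returns a buffer with at least the default capacity.
    func Get() *[]byte {
        return pool.Get().(*[]byte)
    }

    // Put repools a buffer only if it hasn't outgrown the default size.
    // Buffers that grew to hold large traces are intentionally dropped so
    // the GC can reclaim them instead of accumulating in the pool.
    func Put(buf *[]byte) {
        if cap(*buf) > defaultSize {
            return
        }
        *buf = (*buf)[:0]
        pool.Put(buf)
    }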

Initial Finding
This profile, taken just prior to an OOM, showed the row pool holding 85% of in-use memory:
[profile image]

Graph
[graph image]

Benchmarks
Our synthetic benchmarks of course show a large increase in allocated bytes and don't do a good job of capturing the real-world performance; this is an area for improvement.

pkg: github.com/grafana/tempo/tempodb/encoding/vparquet4
cpu: Apple M4 Pro
                    │ before.txt │             after.txt             │
                    │   sec/op   │   sec/op    vs base               │
Compactor/Small-14    3.693 ± 2%   3.666 ± 1%        ~ (p=0.818 n=6)
Compactor/Medium-14   3.410 ± 1%   3.698 ± 1%   +8.47% (p=0.002 n=6)
Compactor/Large-14    3.115 ± 1%   4.016 ± 6%  +28.93% (p=0.002 n=6)
geomean               3.398        3.790       +11.55%

                    │  before.txt   │               after.txt               │
                    │     B/op      │     B/op       vs base                │
Compactor/Small-14    2.067Gi ±  2%    1.811Gi ± 2%   -12.41% (p=0.002 n=6)
Compactor/Medium-14   2.152Gi ± 13%   10.526Gi ± 0%  +389.18% (p=0.002 n=6)
Compactor/Large-14    3.609Gi ±  5%   18.665Gi ± 1%  +417.12% (p=0.002 n=6)
geomean               2.523Gi          7.086Gi       +180.87%

                    │ before.txt  │             after.txt             │
                    │  allocs/op  │  allocs/op   vs base              │
Compactor/Small-14    147.2k ± 1%   147.5k ± 1%       ~ (p=0.240 n=6)
Compactor/Medium-14   68.00k ± 0%   70.42k ± 0%  +3.56% (p=0.002 n=6)
Compactor/Large-14    55.23k ± 0%   58.10k ± 1%  +5.20% (p=0.002 n=6)
geomean               82.08k        84.51k       +2.97%
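
For reference, tables in this format are produced by benchstat (golang.org/x/perf/cmd/benchstat) from repeated benchmark runs; an invocation along these lines (benchmark name and count inferred from the output above) would reproduce them:

    go test -run='^$' -bench=BenchmarkCompactor -benchmem -count=6 \
        ./tempodb/encoding/vparquet4 > before.txt
    # apply the pooling change, then:
    go test -run='^$' -bench=BenchmarkCompactor -benchmem -count=6 \
        ./tempodb/encoding/vparquet4 > after.txt
    benchstat before.txt after.txt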

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zalegrala (Contributor) left a comment

Nice.

@mdisibio merged commit ab878d4 into grafana:main on Apr 11, 2025
15 checks passed