Reduce chunk write queue memory usage 1 #10873
Conversation
Commits:

* dont waste space on the chunkRefMap
* add time factor
* add comments
* better readability
* add instrumentation and more comments
* formatting
* uppercase comments
* Address review feedback. Renamed "free" to "shrink" everywhere, updated comments and threshold to 1000.
* double space

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Before this change, this was the cumulative memory usage (in GB) across pods in a single Mimir deployment. (Graph and legend omitted.)

After applying the change from this PR to zone-b and enabling the new chunk mapper there, memory usage improved compared to zone-a. (Graph and legend omitted.)

Note that memory usage in zone-b still isn't the same as with the old chunk mapper (zone-c); that second part is addressed by #10874. (In this case, each zone had 20 TSDBs open altogether across all pods.)
Makes sense. Thanks!
Could you say which metric is being graphed, please?
Query is:
With pod names like
This avoids wasting memory on `c.chunkRefMap` by re-initializing it regularly. When re-initializing it, it gets initialized with a size which is half of the peak usage of the time period since the last re-init event; for this we also track the peak usage and reset it on every re-init event.

Very frequent re-initialization of the map would cause unnecessary allocations; to avoid that, there are two factors which limit the frequency of the re-initializations:

* A time factor (a minimum interval between re-init events).
* A minimum peak size of 1000 objects in `c.chunkRefMap`.

When re-initializing the map, we initialize it to half of the peak usage since the last re-init event, to try to hit the sweet spot in the trade-off between initializing it to a very low size (potentially resulting in many allocations to grow it) and initializing it to a large size (potentially resulting in unused allocated memory).
With this solution, memory that the map no longer needs is released regularly, while the time and size thresholds keep the extra allocation overhead low.
This PR comes from grafana/mimir-prometheus#131. We use it in Grafana Mimir to reduce memory usage in Mimir clusters with thousands of open TSDBs in a single process.
Related to #10874.