Reduce chunk write queue memory usage 1 #10873

pstibrany · 2022-06-16T08:29:35Z

This avoids wasting memory on the c.chunkRefMap by re-initializing it regularly. When re-initializing it, it gets initialized with a size which is half of the peak usage of the time period since the last re-init event, for this we also track the peak usage and reset it on every re-init event.

Very frequent re-initialization of the map would cause unnecessary allocations, to avoid that there are two factors which limit the frequency of the re-initializations:

There is a minimum interval of 10min between re-init events
In order to re-init the map the recorded peak usage since the last re-init event must be at least 1000 (objects in c.chunkRefMap).

When re-initializing it we initialize it to half of the peak usage since the last re-init event to try to hit the sweet spot in the trade-off between initializing it to a very low size potentially resulting in many allocations to grow it, and initializing it to a large size potentially resulting in unused allocated memory.

With this solution we have the following advantages:

If a tenant's number of active series decreases over time then we want that their queue size also shrinks over time. By always resetting it to half of the previous peak it will shrink together with the usage over time
We don't want to initialize it to a size of 0 because this would cause a lot of allocations to grow it back to the size which it actually needs. By initializing it to half of the previous peak it will rarely have to be grown to more than double of the initialized size.
We don't want to initialize it too frequently because that would also cause avoidable allocations, so there is a minimum interval of 10min between re-init events

This PR comes from grafana/mimir-prometheus#131. We use it in Grafana Mimir to reduce memory usage in Mimir clusters with thousands of open TSDBs in a single process.

Related to #10874.

* dont waste space on the chunkRefMap * add time factor * add comments * better readability * add instrumentation and more comments * formatting * uppercase comments * Address review feedback. Renamed "free" to "shrink" everywhere, updated comments and threshold to 1000. * double space Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>

pstibrany · 2022-06-16T08:46:10Z

Before this change, we have seen this cumulative memory usage across pods in single Mimir deployment in GB.

Legend:

zone-a using new chunk mapper without fix from this PR
zone-b and zone-c are using old chunk mapper before Write chunks via queue, predicting the refs #10051

After applying change from this PR into zone-b and enabling new chunk mapper there, we can see that memory usage has improved compared to zone-a:

Legend:

zone-a using new chunk mapper without fix from this PR
zone-b using new chunk mapper with the fix from this PR
zone-c using old chunk mapper before Write chunks via queue, predicting the refs #10051

Note that memory usage in zone-b still isn't the same as with old chunk mapper (zone-c), but second part is addressed by #10874

(In this case, each zone had 20 TSDBs open altogether across all pods)

codesome

Makes sense. Thanks!

bboreham · 2022-06-21T09:16:20Z

Could you say which metric is being graphed, please? ~~(working set?)~~

pstibrany · 2022-06-21T09:26:29Z

Query is:

sum by (zone) (label_replace(go_memstats_heap_inuse_bytes{job=~"cortex-dev-01/ingester.*"}, "zone", "$1", "pod", "(ingester-zone-.)-\\d+")) / 1e9

With pod names like ingester-zone-<a|b|c>-<number>.

* dont waste space on the chunkRefMap * add time factor * add comments * better readability * add instrumentation and more comments * formatting * uppercase comments * Address review feedback. Renamed "free" to "shrink" everywhere, updated comments and threshold to 1000. * double space Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>

pstibrany requested a review from codesome as a code owner June 16, 2022 08:29

pstibrany mentioned this pull request Jun 16, 2022

Reduce chunk write queue memory usage grafana/mimir-prometheus#131

Merged

pstibrany changed the title ~~Reduce chunk write queue memory usage (#131)~~ Reduce chunk write queue memory usage 1 Jun 16, 2022

pstibrany mentioned this pull request Jun 16, 2022

Reduce chunk write queue memory usage 2 #10874

Merged

codesome approved these changes Jun 17, 2022

View reviewed changes

codesome merged commit 03a2313 into prometheus:main Jun 17, 2022

philipgough mentioned this pull request Aug 3, 2022

receive: Expose write chunk queue as flag thanos-io/thanos#5566

Merged

2 tasks

eyenx mentioned this pull request Sep 26, 2022

chore(kps): update to 40.1.* adfinis/helm-charts#816

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reduce chunk write queue memory usage 1 #10873

Reduce chunk write queue memory usage 1 #10873

Uh oh!

pstibrany commented Jun 16, 2022 •

edited

Loading

Uh oh!

pstibrany commented Jun 16, 2022 •

edited

Loading

Uh oh!

codesome left a comment

Uh oh!

bboreham commented Jun 21, 2022 •

edited

Loading

Uh oh!

pstibrany commented Jun 21, 2022

Uh oh!

Uh oh!

Reduce chunk write queue memory usage 1 #10873

Reduce chunk write queue memory usage 1 #10873

Uh oh!

Conversation

pstibrany commented Jun 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pstibrany commented Jun 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codesome left a comment

Choose a reason for hiding this comment

Uh oh!

bboreham commented Jun 21, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pstibrany commented Jun 21, 2022

Uh oh!

Uh oh!

pstibrany commented Jun 16, 2022 •

edited

Loading

pstibrany commented Jun 16, 2022 •

edited

Loading

bboreham commented Jun 21, 2022 •

edited

Loading