Skip to content

Conversation

pstibrany
Copy link
Contributor

@pstibrany pstibrany commented Jun 16, 2022

This avoids wasting memory on the c.chunkRefMap by re-initializing it regularly. When re-initializing it, it gets initialized with a size which is half of the peak usage of the time period since the last re-init event, for this we also track the peak usage and reset it on every re-init event.

Very frequent re-initialization of the map would cause unnecessary allocations, to avoid that there are two factors which limit the frequency of the re-initializations:

  • There is a minimum interval of 10min between re-init events
  • In order to re-init the map the recorded peak usage since the last re-init event must be at least 1000 (objects in c.chunkRefMap).

When re-initializing it we initialize it to half of the peak usage since the last re-init event to try to hit the sweet spot in the trade-off between initializing it to a very low size potentially resulting in many allocations to grow it, and initializing it to a large size potentially resulting in unused allocated memory.

With this solution we have the following advantages:

  • If a tenant's number of active series decreases over time then we want that their queue size also shrinks over time. By always resetting it to half of the previous peak it will shrink together with the usage over time
  • We don't want to initialize it to a size of 0 because this would cause a lot of allocations to grow it back to the size which it actually needs. By initializing it to half of the previous peak it will rarely have to be grown to more than double of the initialized size.
  • We don't want to initialize it too frequently because that would also cause avoidable allocations, so there is a minimum interval of 10min between re-init events

This PR comes from grafana/mimir-prometheus#131. We use it in Grafana Mimir to reduce memory usage in Mimir clusters with thousands of open TSDBs in a single process.

Related to #10874.

* dont waste space on the chunkRefMap
* add time factor
* add comments
* better readability
* add instrumentation and more comments
* formatting
* uppercase comments
* Address review feedback. Renamed "free" to "shrink" everywhere, updated comments and threshold to 1000.
* double space

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
@pstibrany pstibrany requested a review from codesome as a code owner June 16, 2022 08:29
@pstibrany pstibrany changed the title Reduce chunk write queue memory usage (#131) Reduce chunk write queue memory usage 1 Jun 16, 2022
@pstibrany
Copy link
Contributor Author

pstibrany commented Jun 16, 2022

Before this change, we have seen this cumulative memory usage across pods in single Mimir deployment in GB.

Legend:

After applying change from this PR into zone-b and enabling new chunk mapper there, we can see that memory usage has improved compared to zone-a:

Legend:

Note that memory usage in zone-b still isn't the same as with old chunk mapper (zone-c), but second part is addressed by #10874

(In this case, each zone had 20 TSDBs open altogether across all pods)

Copy link
Member

@codesome codesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks!

@codesome codesome merged commit 03a2313 into prometheus:main Jun 17, 2022
@bboreham
Copy link
Member

bboreham commented Jun 21, 2022

Could you say which metric is being graphed, please? (working set?)

@pstibrany
Copy link
Contributor Author

Query is:

sum by (zone) (label_replace(go_memstats_heap_inuse_bytes{job=~"cortex-dev-01/ingester.*"}, "zone", "$1", "pod", "(ingester-zone-.)-\\d+")) / 1e9

With pod names like ingester-zone-<a|b|c>-<number>.

roidelapluie pushed a commit to roidelapluie/prometheus that referenced this pull request Jun 22, 2022
* dont waste space on the chunkRefMap
* add time factor
* add comments
* better readability
* add instrumentation and more comments
* formatting
* uppercase comments
* Address review feedback. Renamed "free" to "shrink" everywhere, updated comments and threshold to 1000.
* double space

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>

Co-authored-by: Mauro Stettler <mauro.stettler@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants