Title: Optimize admin stats memory use
Description:
There are a few related problems which I think can be tackled independently, all of which are related to memory consumption in stats endpoints when there are a large number of stats (e.g. due to a large number of clusters).
- The browser may crash if you send stats for 100k clusters. There's simply too much data especially with serialized names
- The server may crash due to implementations buffering all the stats to sort them and send them out in one chunk
- The Prometheus admin stats handler uses too much memory, as reported in "Prometheus stats handler used too much memory" #16139 (see also "buffer: add chunker" #16591). Currently all serialized bytes are always buffered in the admin handler, and they may then be buffered again in the networking layer for a slow client.
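To illustrate the buffering problem in the last bullet, here is a minimal sketch of emitting serialized stats in bounded chunks instead of accumulating one large buffer. `ChunkedWriter`, `Sink`, and the chunk size are hypothetical names chosen for this example; this is not the chunker from #16591 and does not use Envoy's buffer API.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: accumulates serialized stat lines and hands them to a
// sink whenever at least chunk_size bytes are pending, so the handler never
// holds the full response in memory.
class ChunkedWriter {
public:
  using Sink = std::function<void(const std::string&)>;

  ChunkedWriter(std::size_t chunk_size, Sink sink)
      : chunk_size_(chunk_size), sink_(std::move(sink)) {}

  // Appends one serialized line; flushes once the pending bytes reach the
  // chunk size.
  void append(const std::string& line) {
    buffer_ += line;
    if (buffer_.size() >= chunk_size_) {
      flush();
    }
  }

  // Sends any pending bytes to the sink. Call once after the last append.
  void flush() {
    if (buffer_.empty()) {
      return;
    }
    sink_(buffer_);
    buffer_.clear();
  }

private:
  std::size_t chunk_size_;
  Sink sink_;
  std::string buffer_;
};
```

With this shape, peak memory in the handler is bounded by roughly the chunk size plus one line, assuming the sink can apply back-pressure for a slow client rather than buffering everything itself.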
A challenge with approaches around streaming data out from /stats or for Prometheus is that the data is held in unsorted hash-maps by the stats allocator/store, and we need to present fully sorted data to users looking at admin /stats, and collated data to Prometheus, due to tag grouping (I think...I'm not a Prometheus expert).
@pradeepcrao has enabled solutions to these by adding forEach-style accessors to the stats system, at least for counters, gauges, and text-readouts. Histograms still need to be done. To tackle the three symptoms above, I am experimenting with a paging algorithm (https://github.com/jmarantz/envoy/blob/stats-stream/source/common/stats/filter.h). This provides a possibly-efficient-enough (O(NumStats * log(PageSize))) algorithm to use with forEachXXX to get pages of sorted stats suitable for use with paging controls in a new flavor of admin page, e.g. /admin/stats?format=html. It may also provide
For details, refer to the Google docs.
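As a rough illustration of how a page of sorted stats can be pulled from an unsorted store in O(NumStats * log(PageSize)), the sketch below keeps a max-heap bounded to the page size while visiting every name once. It is not the code in filter.h; `ForEachName` and `nextPage` are hypothetical names, and the visitor type only stands in for the forEach-style accessors mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a forEach-style accessor: it invokes the visitor
// once per stat name, in arbitrary (hash-map) order.
using NameVisitor = std::function<void(const std::string&)>;
using ForEachName = std::function<void(const NameVisitor&)>;

// Returns up to page_size names that sort strictly after `after`, in sorted
// order. Pass an empty string to get the first page, then the last name of
// each page to get the next one.
std::vector<std::string> nextPage(const ForEachName& for_each,
                                  const std::string& after,
                                  std::size_t page_size) {
  std::vector<std::string> page;
  if (page_size == 0) {
    return page;
  }

  // Max-heap holding the smallest page_size candidates seen so far; the top
  // is the largest of them, so it is the one to evict.
  std::priority_queue<std::string> heap;
  for_each([&](const std::string& name) {
    if (name <= after) {
      return; // Belongs to an earlier page.
    }
    if (heap.size() < page_size) {
      heap.push(name);
    } else if (name < heap.top()) {
      heap.pop();
      heap.push(name);
    }
  });

  // The heap pops largest-first, so reverse to get ascending order.
  page.reserve(heap.size());
  while (!heap.empty()) {
    page.push_back(heap.top());
    heap.pop();
  }
  std::reverse(page.begin(), page.end());
  return page;
}
```

Each request rescans all stats, but memory stays proportional to the page size rather than the total number of stats, which is the trade-off this approach makes.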
@jmarantz please take a look and add anything I missed.