Skip to content

Conversation

rulex123
Copy link
Contributor

Commit Message: patch to add support for streaming/chunking of prometheus stats - basically adapts the algorithm introduced in #19693 for prometheus stats (while keeping common parts of the algorithm in a shared class to avoid code duplication).
With this patch we should be able to improve the issue described here #16139

Risk Level: only underlying implementation changes, but otherwise no change of behaviour/features. So low risk I would say.
Testing: current tests should remain unchanged (though they might be moved around due to refactoring of code); new tests are also added.

Docs Changes: n/a
Release Notes: n/a
[Optional Runtime guard:]
[Optional Fixes #Issue]: #16139

Draft PR (already went through some review) -> #24716

Benchmark data from running stats_handler_speed_test.cc on a dev box:

BEFORE

---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_AllCountersText                      744 ms          744 ms            1
BM_UsedCountersText                    76.5 ms         76.4 ms            9
BM_FilteredCountersText                2866 ms         2866 ms            1
BM_Re2FilteredCountersText              371 ms          371 ms            2
BM_AllCountersJson                      917 ms          917 ms            1
BM_UsedCountersJson                    79.0 ms         79.0 ms            9
BM_FilteredCountersJson                3087 ms         3086 ms            1
BM_Re2FilteredCountersJson              369 ms          369 ms            2
BM_AllCountersPrometheus               8125 ms         8124 ms            1
BM_UsedCountersPrometheus               334 ms          334 ms            2
BM_FilteredCountersPrometheus          3680 ms         3680 ms            1
BM_Re2FilteredCountersPrometheus       1022 ms         1022 ms            1

AFTER

---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
BM_AllCountersText                      742 ms          742 ms            1
BM_UsedCountersText                    77.6 ms         77.6 ms            9
BM_FilteredCountersText                2862 ms         2861 ms            1
BM_Re2FilteredCountersText              509 ms          508 ms            2
BM_AllCountersJson                     1559 ms         1558 ms            1
BM_UsedCountersJson                    81.0 ms         81.0 ms            9
BM_FilteredCountersJson                2857 ms         2856 ms            1
BM_Re2FilteredCountersJson              373 ms          373 ms            2
BM_AllCountersPrometheus               1842 ms         1842 ms            1
BM_UsedCountersPrometheus              92.4 ms         92.4 ms            8
BM_FilteredCountersPrometheus          2863 ms         2862 ms            1
BM_Re2FilteredCountersPrometheus        372 ms          372 ms            2

rulex123 and others added 30 commits January 18, 2023 17:38
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: rulex123 <erica.manno@gmail.com>
@adisuissa
Copy link
Contributor

@ggreenway can you PTAL?

Copy link
Member

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good. Thanks for making the test changes more minimal; that made reviewing much easier.

/wait

const std::string& prefixed_tag_extracted_name) {
auto parent_histogram = dynamic_cast<Stats::ParentHistogram*>(metric.get());
if (parent_histogram != nullptr) {
// TODO(rulex123): support for the 3 histogram "bucket modes" offered in other
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be a TODO. There is only one valid way to represent histograms in the prometheus exposition format, and this is it. I think you can just delete the TODO comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense.

FYI in #26472 I have added a new bucket mode tentatively called 'detailed' which will have the raw bucket data. That may also be useful for Prom as you can see a lot more interesting data with this IMO, but it might complicate aggregation in Prom across multiple Envoys.

Don't worry Greg -- that PR is still a ways away from being atomized into probably around 4 for review, but I think the first PR will be the addition of a new bucket-mode to show the original buckets computed by circlhist.

Nevertheless I agree removing this TODO is fine for now.

Signed-off-by: rulex123 <erica.manno@gmail.com>
@rulex123
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #24998 (comment) was created by @rulex123.

see: more, trace.

@ggreenway ggreenway merged commit 1cf0187 into envoyproxy:main May 1, 2023
@kyessenov kyessenov mentioned this pull request May 4, 2023
jmarantz added a commit that referenced this pull request May 6, 2023
jmarantz added a commit that referenced this pull request May 6, 2023
Commit Message: Reverts #24998

See #27173

Additional Description:
Risk Level: n/a
Testing: n/a
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a
jmarantz added a commit that referenced this pull request May 8, 2023
Commit Message: Reproduces a scenario where it's difficult to use a streaming optimization for Prometheus stats based on scopes without introducing a bug. Context:

Issue that /stats/prometheus endpoint takes too much much memory: Prometheus stats handler used too much memory. #16139
Solution for non-Prometheus endpoints to use less memory for /stats and run faster: admin: Streaming /stats implementation #19693 which I believe is working well.
This solution mapped to Prometheus: Prom stats perf improvements #24998
A case where this solution has a duplicate# TYPE line due to two stats with the same tag-extracted-stat name from two different scopes: stats: prometheus scrape failed #27173
Note that the existing unit tests did not exercise that scenario so this PR adds a testcase.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
rulex123 added a commit to rulex123/envoy that referenced this pull request Jun 28, 2023
(i.e. doesn't cater for stats with identical tag extracted name but from
different scopes).

Signed-off-by: rulex123 <erica.manno@gmail.com>
reskin89 pushed a commit to reskin89/envoy that referenced this pull request Jul 11, 2023
Add support for streaming/chunking of prometheus stats - basically adapts the algorithm introduced in envoyproxy#19693 for prometheus stats (while keeping common parts of the algorithm in a shared class to avoid code duplication).
With this patch we should be able to improve the issue described here envoyproxy#16139

Signed-off-by: rulex123 <erica.manno@gmail.com>
Signed-off-by: Ryan Eskin <ryan.eskin89@protonmail.com>
reskin89 pushed a commit to reskin89/envoy that referenced this pull request Jul 11, 2023
Commit Message: Reverts envoyproxy#24998

See envoyproxy#27173

Additional Description:
Risk Level: n/a
Testing: n/a
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Ryan Eskin <ryan.eskin89@protonmail.com>
reskin89 pushed a commit to reskin89/envoy that referenced this pull request Jul 11, 2023
…xy#27239)

Commit Message: Reproduces a scenario where it's difficult to use a streaming optimization for Prometheus stats based on scopes without introducing a bug. Context:

Issue that /stats/prometheus endpoint takes too much much memory: Prometheus stats handler used too much memory. envoyproxy#16139
Solution for non-Prometheus endpoints to use less memory for /stats and run faster: admin: Streaming /stats implementation envoyproxy#19693 which I believe is working well.
This solution mapped to Prometheus: Prom stats perf improvements envoyproxy#24998
A case where this solution has a duplicate# TYPE line due to two stats with the same tag-extracted-stat name from two different scopes: stats: prometheus scrape failed envoyproxy#27173
Note that the existing unit tests did not exercise that scenario so this PR adds a testcase.

Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Ryan Eskin <ryan.eskin89@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants