-
Notifications
You must be signed in to change notification settings - Fork 5.1k
stats: rework stat sink flushing to centralize counter latching #6996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change reworks how stats are flushed to sinks by removing the Source class and replacing it with a MetricSnapshot which provides constant stats for all sinks to flush with counters pre-latched. In addition to cleaning up the flushing logic to unify and constify it, this change has the side effect of performing counter latching every flush interval (even if there are no sinks), this fixing the below reported issue. Fixes #6924 Signed-off-by: Matt Klein <mklein@lyft.com>
Signed-off-by: Matt Klein <mklein@lyft.com>
Testing the issue reported in #6924, the rate of increments no longer has the dip and spike during a hot restart. The counter values aren't passed between hot restarts so if we look at absolute values in |
@fishcakez the reset to 0 is expected. Presumably your metrics system can deal with a counter reset to 0 and we don't need to persist the current value across hot restarts (since this is what happens during a full restart)? |
Yes, it can handle that just fine 👌 |
// fixfix comment about latching. | ||
// Create a snapshot and flush to all sinks. | ||
// NOTE: Even if there are no sinks, creating the snapshot has the important property that it | ||
// latches all counters on a periodic basis. The hot restart code assumes this is being |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice comment!
Signed-off-by: Matt Klein <mklein@lyft.com>
writer.write(fmt::format("{}.{}:{}|c{}", prefix_, getName(*counter), delta, | ||
buildTagStr(counter->tags()))); | ||
for (const auto& counter : snapshot.counters()) { | ||
if (counter.counter_.get().used()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
counter.counter_->used() does not work here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
counter_
is a reference_wrapper
Signed-off-by: Matt Klein <mklein@lyft.com>
@jmarantz updated |
/retest |
🔨 rebuilding |
* master: (65 commits) proto: Add PATCH method to RequestMethod enum (envoyproxy#6737) exe: drop unused deps on zlib compressor code (envoyproxy#7022) coverage: fix some misc coverage (envoyproxy#7033) Enable proto schema for router_check_tool (envoyproxy#6992) stats: rework stat sink flushing to centralize counter latching (envoyproxy#6996) [test] convert lds api test config stubs to v2 (envoyproxy#7021) router: scoped rds (2c): implement scoped rds API (envoyproxy#6932) build: Add option for size-optimized binary (envoyproxy#6960) test: adding an integration test framework for file-based LDS (envoyproxy#6933) doc: update obsolete ref to api/XDS_PROTOCOL.md (envoyproxy#7002) dispatcher: faster runOnAllThreads (envoyproxy#7011) example: add csrf sandbox (envoyproxy#6805) fix syntax of gcov exclusion zone. (envoyproxy#7023) /runtime_modify: add support for query params in body (envoyproxy#6977) stats: Create stats for http codes with the symbol table. (envoyproxy#6733) health check: fix more fallout from inline deletion change (envoyproxy#6988) Max heap fix (envoyproxy#7016) Add support to unregister from lifecycle notifications (envoyproxy#6984) build spdy_core_alt_svc_wire_format (envoyproxy#7010) ext_authz: Make sure initiateCall only called once (envoyproxy#6949) ... Signed-off-by: Michael Puncel <mpuncel@squareup.com>
This change reworks how stats are flushed to sinks by removing the
Source class and replacing it with a MetricSnapshot which provides
constant stats for all sinks to flush with counters pre-latched.
In addition to cleaning up the flushing logic to unify and constify
it, this change has the side effect of performing counter latching
every flush interval (even if there are no sinks), this fixing
the below reported issue.
Fixes #6924
Risk Level: Low
Testing: Modified existing tests and new unit tests
Docs Changes: N/A
Release Notes: N/A