-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Remove global metrics provider and unify workqueue metric naming #40884
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove global metrics provider and unify workqueue metric naming #40884
Conversation
9e4402e
to
acf821d
Compare
acf821d
to
e481aa5
Compare
/test |
e481aa5
to
1a7e1ed
Compare
1a7e1ed
to
7526fee
Compare
/test |
Yeah, great idea, changed it to this and was able to drop all the test refactoring. |
7526fee
to
aaa1cab
Compare
/test |
/test |
This global provider was always problematic: * Caused operator metrics to use the Agent cilium_* namespace. * Global register was brittle and relied on implicit ordering of init() functions. This removes the global provider and instead opts to provide one via hive. As well, any k8s watcher workqueue configs without a explicit config (i.e. relying on the global RegisterProvider) will now use the provided provider. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
These are now fixed to always be under the correct namespace. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
aaa1cab
to
98d3391
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for my codeowners.
The way we registered the k8s global work-queue metrics has always been prone to breaking, as it depended on the specific ordering of evaluation of init functions, as is explained in this comment:
cilium/pkg/k8s/watchers/metrics.go
Lines 20 to 31 in 6c82bc4
This was already causing problems in the Operator resulting in missing metrics when trying to move the
watchers/metrics.go
file to a separate package. Clearly trying to trick Go into evaluating our init funcs before a code libraries wasn't a good approach moving forward.As a result, recent changes have moved operator related workqueue configs to explicitly define the provider via a global variable. This fixes the problem of the
init()
registry race condition but it results in some Operator workqueue metrics now being prefixed with the agent metrics namespace (i.e.cilium_
). Previously this was broken as well as because metrics registered via the global registry were not registered with the cilium prefix many such metrics were missing thecilium_
prefix, leading to inconsistent metric naming.This follows up those changes and takes another step in cleaning all this up by removing the global MetricsProvider and explicitly passing a Hive provided one to each workqueue (i.e. notably, many of the resource watchers use these to setup K8s object events). The result is a wide refactor requiring a lot of fixed tests and addition of MetricsProvider to a lot of Hive constructor parameters - thus the separate pull request.
https://github.com/cilium/cilium/pull/40884/files#diff-8b8e02278a63139306486adc431761b5b2dd9847df5559f95acd90d9a921fb6bL44-L49