[v1.12] Author Backport of 28382 (Metrics associated with a deleted node should not be reported) #28977
/test-backport-1.12 |
cced45a to 1b0dbe1
/test-backport-1.12 |
/test-backport-1.12 |
…er reported.

[ upstream commit e9f97cd ]

When a node is deleted from a cluster, metrics associated with that node are still exported to Prometheus. Short of restarting the agent, we want to delete these metrics dynamically when a node is removed from the cluster. This PR ensures node_connectivity_status and node_connectivity_latency no longer report metrics for nodes that are no longer present in the cluster.

[ Backporter's notes: Original PR was adapted! ]

The original PR depends mainly on 2 other PRs that haven't been backported and are fairly substantial. Given this, I've opted to adapt the original implementation to surface the fix while minimizing impact, with these updates:

1. As of this release, pkg/metrics/interfaces had not introduced the pkg/metrics/metric wrappers, so deletableVec was adapted to use the current implementation. (Referring to commit: 84ea383)
2. pkg/node/manager/manager was adapted to provide for metrics deletion when a node is deleted. A subsequent PR refactored the manager metrics structure that the original PR used. (Referring to commit: c49ef45)
3. To pick up the Prometheus metrics vec delete feature, the github.com/prometheus/client_golang dependency was bumped to v1.14.0.

Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com>
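The idea in the commit message (drop every series that carries a deleted node's label, without restarting the agent) can be illustrated with a minimal, stdlib-only sketch. This is not the code from this PR and not the client_golang API: `gaugeVec`, `DeleteMatching`, `Export`, and the label names `target_node_name` and `type` are hypothetical names chosen for illustration only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Labels is one label set identifying a single time series.
type Labels map[string]string

// series pairs a label set with its current value.
type series struct {
	labels Labels
	value  float64
}

// gaugeVec is a hypothetical stand-in for a labelled gauge vector.
// It only models the bookkeeping; it does not talk to Prometheus.
type gaugeVec struct {
	names  []string          // label names, in a fixed order
	series map[string]series // encoded label values -> series
}

func newGaugeVec(names ...string) *gaugeVec {
	return &gaugeVec{names: names, series: map[string]series{}}
}

// key encodes the label values in label-name order.
func (g *gaugeVec) key(l Labels) string {
	vals := make([]string, len(g.names))
	for i, n := range g.names {
		vals[i] = l[n]
	}
	return strings.Join(vals, "\x00")
}

// Set creates or updates the series identified by l.
func (g *gaugeVec) Set(l Labels, v float64) {
	cp := make(Labels, len(l))
	for k, val := range l {
		cp[k] = val
	}
	g.series[g.key(l)] = series{labels: cp, value: v}
}

// DeleteMatching removes every series whose labels contain all
// name/value pairs in match, and returns how many were removed.
func (g *gaugeVec) DeleteMatching(match Labels) int {
	deleted := 0
	for k, s := range g.series {
		ok := true
		for name, want := range match {
			if s.labels[name] != want {
				ok = false
				break
			}
		}
		if ok {
			delete(g.series, k) // deleting during range is safe in Go
			deleted++
		}
	}
	return deleted
}

// Export renders the remaining series as sorted text lines.
func (g *gaugeVec) Export() []string {
	out := make([]string, 0, len(g.series))
	for _, s := range g.series {
		pairs := make([]string, len(g.names))
		for i, n := range g.names {
			pairs[i] = fmt.Sprintf("%s=%q", n, s.labels[n])
		}
		out = append(out, fmt.Sprintf("{%s} %g", strings.Join(pairs, ","), s.value))
	}
	sort.Strings(out)
	return out
}

func main() {
	status := newGaugeVec("target_node_name", "type")
	status.Set(Labels{"target_node_name": "node-a", "type": "icmp"}, 1)
	status.Set(Labels{"target_node_name": "node-a", "type": "http"}, 1)
	status.Set(Labels{"target_node_name": "node-b", "type": "icmp"}, 1)

	// Simulate node-a being removed from the cluster.
	fmt.Println("deleted:", status.DeleteMatching(Labels{"target_node_name": "node-a"}))
	for _, line := range status.Export() {
		fmt.Println(line)
	}
}
```

Matching on a subset of labels is what lets a single node-delete event clear all of that node's series regardless of their other label values; a real implementation would hook the equivalent call into the node manager's delete path.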
1b0dbe1 to 5d05417
/test-backport-1.12 Edit: Both 4.9 jobs failed with flakes similar to
Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-kernel-4.9/225/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
The runtime tests all failed, which might suggest either a VM provisioning failure or that something in the privileged unit test suite is broken by changes in this PR. Edit: Looks like it was a VM provisioning failure.
/test-1.16-4.9 Edit: #24840 |
/test-1.19-4.9 |
/test-runtime |
/test-1.16-4.9 |
Once this PR is merged, you can update the PR labels via: