Skip to content

[v1.12] Author Backport of 28382 (Metrics associated with a deleted node should not be reported) #28977

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 8, 2023

Conversation

derailed
Copy link
Contributor

@derailed derailed commented Nov 3, 2023

Once this PR is merged, you can update the PR labels via:

$ for pr in 28382; do contrib/backporting/set-labels.py $pr done 1.12; done

@derailed derailed requested a review from a team as a code owner November 3, 2023 16:59
@derailed derailed added kind/backports This PR provides functionality previously merged into master. backport/1.12 labels Nov 3, 2023
@derailed
Copy link
Contributor Author

derailed commented Nov 3, 2023

/test-backport-1.12

@derailed derailed force-pushed the pr/v1.12-backport-2023-11-03 branch from cced45a to 1b0dbe1 Compare November 6, 2023 16:53
@christarazi
Copy link
Member

/test-backport-1.12

1 similar comment
@derailed
Copy link
Contributor Author

derailed commented Nov 7, 2023

/test-backport-1.12

…er reported.

[ upstream commit e9f97cd ]

When a node is deleted from a cluster, metrics associated with that node
are still being exported to prometheus. Short of restarting the agent,
we want to dynamically delete these metrics when a node is removed from the cluster.

This PR ensures node_connectivity_status and node_connectivity_latency
no longer report metrics for nodes that are no longer present on the
cluster.

[ Backporter's notes: Original PR was adapted! ]

The original PR depends (mainly!) on 2 other PRs that haven't been
backported and are fairly substential.
Given this, I've opted to adapt the original implementation to surface the
fix while minimizing impact with these updates:
1. pkg/metrics/interfaces did not introduce pkg/metrics/metric wrappers
  as of this release. Hence adapted deletableVec to use the current
implementation. (Referring to commit: 84ea383)
2. pkg/node/manager/manager was adapted to provide for metrics deletion when a
   node is deleted. Subsequent PR refactored the manager metrics structure which
   the original PR used. (Referring to commit: c49ef45)
3. In order to pickup prom metrics vec delete feature
   github.com/prometheus/client_golang dep was bumped to v1.14.0

Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com>
@derailed derailed force-pushed the pr/v1.12-backport-2023-11-03 branch from 1b0dbe1 to 5d05417 Compare November 7, 2023 16:38
@derailed
Copy link
Contributor Author

derailed commented Nov 7, 2023

/test-backport-1.12

Edit: Both 4.9 jobs failed with flakes similar to

Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed:

Click to show.

Test Name

K8sDatapathConfig MonitorAggregation Checks that monitor aggregation flags send notifications

Failure Output

FAIL: Timed out after 240.000s.

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.16-kernel-4.9/225/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.9 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@christarazi
Copy link
Member

christarazi commented Nov 7, 2023

The runtime tests all failed which might either suggest a VM provisioning failure or something in the privileged unit test suite is broken from changes in this PR.

Edit: Looks like it was a VM provisioning failure.

@christarazi
Copy link
Member

christarazi commented Nov 8, 2023

/test-1.16-4.9

Edit: #24840

@christarazi
Copy link
Member

/test-1.19-4.9

@christarazi
Copy link
Member

/test-runtime

@christarazi
Copy link
Member

/test-1.16-4.9

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 8, 2023
@christarazi christarazi merged commit 36b9f9d into cilium:v1.12 Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants