Description
Rancher Server Setup
- Rancher version: 2.7.9, 2.8.0
- Installation option (Docker install/Helm Chart): helm
- If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
- Proxy/Cert Details: none
Information about the Cluster
- Kubernetes version: 1.27.8
- Cluster Type (Local/Downstream): downstream custom RKE2 cluster hosted on AWS
- If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):
User Information
- What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
- If custom, define the set of permissions:
Describe the bug
After upgrading from 2.7.5 to 2.7.9, I've noticed that terminated worker nodes are still displayed in the Rancher UI with Nodenotfound status.
The capi-controller-manager log is full of errors like:
```
E1204 17:41:44.435523 1 controller.go:329] "Reconciler error" err="no matching Node for Machine \"custom-fbdc7789f02e\" in namespace \"fleet-default\": cannot find node with matching ProviderID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/custom-fbdc7789f02e" namespace="fleet-default" name="custom-fbdc7789f02e" reconcileID=330124c5-1ffb-4f7d-a618-94a976c62106
E1204 17:41:44.585423 1 controller.go:329] "Reconciler error" err="no matching Node for Machine \"custom-be9d831e6358\" in namespace \"fleet-default\": cannot find node with matching ProviderID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/custom-be9d831e6358" namespace="fleet-default" name="custom-be9d831e6358" reconcileID=63186033-4f57-4524-85af-2123a88f3a04
```
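For anyone hitting the same error, here is a quick way to confirm the mismatch (a diagnostic sketch, assuming kubectl access to both the Rancher local cluster and the downstream cluster; the machine name and providerID are taken from the errors and output in this report):

```
# Run against the Rancher local cluster: providerID recorded on the CAPI Machine
kubectl -n fleet-default get machine custom-be9d831e6358 \
  -o jsonpath='{.spec.providerID}{"\n"}'

# Run against the downstream cluster: providerIDs of the nodes that actually exist
kubectl get nodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'

# If the Machine's providerID (aws:///eu-central-1a/i-0a82b430f14175488 here)
# does not appear in the node list, the Machine is orphaned, which is exactly
# what "cannot find node with matching ProviderID" is reporting.
```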
The machine resource mentioned in the error still exists:
```
k -n fleet-default get machine | grep custom-be9d831e6358
custom-be9d831e6358   rancher-euc1-te-test01   ip-172-29-196-238.eu-central-1.compute.internal   aws:///eu-central-1a/i-0a82b430f14175488   Running   5h6m
```
but the nodes.management.cattle.io resources for the old workers are gone.
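One way to double-check this (a sketch, run against the Rancher local cluster; the node name comes from the machine output above):

```
# Lists the v3 node objects across all cluster namespaces;
# for the terminated worker this returns nothing, confirming
# the nodes.management.cattle.io resource is already gone.
kubectl get nodes.management.cattle.io -A | grep ip-172-29-196-238
```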
Also, in the downstream RKE2 cluster, k get nodes doesn't show these old nodes, so the issue only affects Rancher.
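As a temporary cleanup, manually deleting the stale Machine object should make Rancher drop the ghost node (an untested sketch, assuming the underlying EC2 instance is confirmed terminated; use with care):

```
# Run against the Rancher local cluster. This triggers the normal CAPI
# Machine deletion flow; since the instance is already gone, the object
# should be removed and the node should disappear from the UI.
kubectl -n fleet-default delete machine custom-be9d831e6358
```

This only clears the symptom; the machine controller should be doing this on its own, which is the bug being reported here.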
To Reproduce
- Terminate an actively running node
Result
After the node is terminated, it is removed from the cluster, but it is not removed from Rancher and is still displayed in the UI with Nodenotfound status.
Expected Result
After the node is terminated, it is removed from the cluster and is also removed from Rancher.
Additional context
In Slack, other people mentioned that they have had the same issue since version 2.7.6: https://rancher-users.slack.com/archives/C3ASABBD1/p1701711546706849
SURE-8277