
[BUG] rancher machines are not removed from the cluster after actual worker nodes removed #43686

@riuvshyn

Description

Rancher Server Setup

  • Rancher version: 2.7.9, 2.8.0
  • Installation option (Docker install/Helm Chart): helm
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc):
  • Proxy/Cert Details: none

Information about the Cluster

  • Kubernetes version: 1.27.8
  • Cluster Type (Local/Downstream): downstream custom RKE2 hosted on AWS
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider):

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
    • If custom, define the set of permissions:

Describe the bug
After upgrading from 2.7.5 to 2.7.9, I've noticed that terminated worker nodes are still displayed in the Rancher UI with a Nodenotfound status.

The capi-controller-manager log is full of errors like:

E1204 17:41:44.435523       1 controller.go:329] "Reconciler error" err="no matching Node for Machine \"custom-fbdc7789f02e\" in namespace \"fleet-default\": cannot find node with matching ProviderID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/custom-fbdc7789f02e" namespace="fleet-default" name="custom-fbdc7789f02e" reconcileID=330124c5-1ffb-4f7d-a618-94a976c62106
E1204 17:41:44.585423       1 controller.go:329] "Reconciler error" err="no matching Node for Machine \"custom-be9d831e6358\" in namespace \"fleet-default\": cannot find node with matching ProviderID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/custom-be9d831e6358" namespace="fleet-default" name="custom-be9d831e6358" reconcileID=63186033-4f57-4524-85af-2123a88f3a04

The Machine resource mentioned in the error still exists:
k -n fleet-default get machine | grep custom-be9d831e6358

custom-be9d831e6358   rancher-euc1-te-test01   ip-172-29-196-238.eu-central-1.compute.internal   aws:///eu-central-1a/i-0a82b430f14175488   Running       5h6m
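The CAPI error above says there is no downstream node whose providerID matches the one recorded on this Machine. The recorded value can be read straight off the Machine object (a minimal check, using the machine name from the log above; the k alias points at the Rancher local/management cluster here):

k -n fleet-default get machine custom-be9d831e6358 -o jsonpath='{.spec.providerID}'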

However, the nodes.management.cattle.io resources for the old workers are gone.
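For reference, the Rancher-side node objects are namespaced under the cluster's management namespace, so a listing like the one below (a sketch; c-m-xxxxxxxx is a placeholder for this cluster's management namespace) no longer shows the terminated workers:

k -n c-m-xxxxxxxx get nodes.management.cattle.io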

Also, in the downstream RKE2 cluster, k get nodes does not show these old nodes, so this only affects Rancher.
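To confirm from the workload side, the downstream nodes can be listed together with their provider IDs (a sketch, run with a kubeconfig pointing at the downstream RKE2 cluster); none of them carries the aws:/// ID recorded on the stale Machine:

kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDERID:.spec.providerID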

To Reproduce

  • Terminate an actively running worker node (for example by terminating the backing EC2 instance; see the sketch below)
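On AWS this can be done by terminating the instance directly (a sketch; the instance ID here is the one visible in the Machine output above, substitute the instance backing the node being removed):

aws ec2 terminate-instances --instance-ids i-0a82b430f14175488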

Result
After the node is terminated it is removed from the cluster, but it is not removed from Rancher and the node is still displayed in the UI with a Nodenotfound status.

Expected Result
After the node is terminated it is removed from the cluster, and it is also removed from Rancher.

Screenshots
(screenshot: Rancher UI listing the terminated worker nodes with Nodenotfound status)

Additional context
In Slack, other people mentioned that they have had the same issue since version 2.7.6: https://rancher-users.slack.com/archives/C3ASABBD1/p1701711546706849

SURE-8277

Metadata

Labels

  • QA/M
  • area/capi: Provisioning issues that correspond with CAPI
  • area/capr: Provisioning issues that involve cluster-api-provider-rancher
  • kind/bug: Issues that are defects reported by users or that we know have reached a real release
  • priority/0
  • release-note: Note this issue in the milestone's release notes
  • status/release-note-added
  • team/hostbusters: The team that is responsible for provisioning/managing downstream clusters + K8s version support
