
Deleted server node continues to attempt to run controlplane components (namely, kube-apiserver), which causes problems with nodes #4060

@Oats87

Description


Environmental Info:
RKE2 Version: v1.23.16+rke2r1

Node(s) CPU architecture, OS, and Version:

root@ip-172-31-3-163:~# uname -a
Linux ip-172-31-3-163 5.4.0-1035-aws #37-Ubuntu SMP Wed Jan 6 21:01:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@ip-172-31-3-163:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Cluster Configuration:
3 server nodes (controlplane + etcd), 2 worker nodes
Custom v2prov cluster running with Rancher v2.7-aa7bea9baa40eb8eb1d7166cf37e0b15aaa81ed5-head

Describe the bug:
On deletion of a server node through Rancher v2prov, the two worker nodes in the cluster went into a perpetual NotReady state. Investigating the worker nodes, I found that the kubelet is getting connection timeouts because it is still trying to reach the kube-apiserver on the deleted server, which was previously a member of the local agent load balancer's server pool.
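To make the suspected failure mode concrete, below is a minimal Python model of a client-side load balancer like the agent's local one on 127.0.0.1:6443. It is a hypothetical sketch, not RKE2's implementation: `LocalLoadBalancer`, `update_servers`, and the `close_stale` switch are illustrative names. It shows that removing a server from the pool only steers *new* connections away from it; connections already pinned to that server stay open unless they are closed explicitly.

```python
from dataclasses import dataclass, field


@dataclass
class Conn:
    """A proxied client connection (e.g. from the kubelet) pinned to one backend."""
    client: str
    backend: str
    is_open: bool = True


@dataclass
class LocalLoadBalancer:
    """Hypothetical model of an agent-side apiserver load balancer (not RKE2 code)."""
    servers: list[str]
    close_stale: bool = False  # hypothetical switch: drop connections to removed servers
    conns: list[Conn] = field(default_factory=list)
    next_index: int = 0

    def dial(self, client: str) -> Conn:
        # New connections are spread round-robin over the *current* pool only.
        backend = self.servers[self.next_index % len(self.servers)]
        self.next_index += 1
        conn = Conn(client, backend)
        self.conns.append(conn)
        return conn

    def update_servers(self, servers: list[str]) -> None:
        # Called when the set of server nodes changes (e.g. a server is deleted).
        removed = set(self.servers) - set(servers)
        self.servers = list(servers)
        if self.close_stale:
            # Force clients pinned to a removed server to reconnect through the new pool.
            for conn in self.conns:
                if conn.backend in removed:
                    conn.is_open = False

    def stale(self) -> list[Conn]:
        # Open connections whose backend is no longer part of the pool.
        return [c for c in self.conns if c.is_open and c.backend not in self.servers]


if __name__ == "__main__":
    pool = ["server-a:6443", "server-b:6443", "server-c:6443"]  # hypothetical addresses
    for close_stale in (False, True):
        lb = LocalLoadBalancer(list(pool), close_stale=close_stale)
        lb.dial("kubelet")           # lands on server-a
        lb.update_servers(pool[1:])  # server-a is deleted
        print(close_stale, [c.backend for c in lb.stale()])
    # False ['server-a:6443']
    # True []
```

In this model the `close_stale=False` case matches what the worker nodes show: the kubelet's existing connection keeps pointing at the deleted server even though the pool no longer contains it.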

Steps To Reproduce:

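  • Installed RKE2: v1.23.16+rke2r1
  • Provisioned a custom v2prov cluster through Rancher with 3 server nodes (controlplane + etcd) and 2 worker nodes
  • Deleted one of the server nodes through Rancher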

Expected behavior:
When I delete a server node, I expect my cluster to recover (even if my kubelet flaps because it was connected to the apiserver that is being deleted).

Actual behavior:
The deleted server node continues to run its control-plane components (namely kube-apiserver), and the kubelets that happened to be connected to it (through the local agent load balancer) go into a perpetual NotReady state with repeated connection timeouts.

Additional context / logs:

ip-172-31-3-163-2023-03-28_19_15_31-deleted-controlplaneetcd.tar.gz
ip-172-31-10-228-2023-03-28_19_00_44-hung-worker.tar.gz

On the worker node that went NotReady with the timeouts, ss shows established TCP connections to the deleted server node (172.31.3.163), e.g.:

root@ip-172-31-10-228:~# ss | grep 6443
tcp  ESTAB  0  0  127.0.0.1:6443        127.0.0.1:34246
tcp  ESTAB  0  0  172.31.10.228:58062   172.31.3.163:6443
tcp  ESTAB  0  0  127.0.0.1:6443        127.0.0.1:34208
tcp  ESTAB  0  0  127.0.0.1:34208       127.0.0.1:6443
tcp  ESTAB  0  0  127.0.0.1:6443        127.0.0.1:34352
tcp  ESTAB  0  0  127.0.0.1:34352       127.0.0.1:6443
tcp  ESTAB  0  0  127.0.0.1:34246       127.0.0.1:6443
tcp  ESTAB  0  0  172.31.10.228:58168   172.31.3.163:6443
tcp  ESTAB  0  0  172.31.10.228:52252   172.31.14.174:6443
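A small sketch for summarising output like the ss listing from the worker node: it counts established connections from this host to a non-loopback peer on port 6443, then lists the peers that are not in an expected server set. The function names are hypothetical, and treating 172.31.14.174 as a remaining server is an assumption for illustration (the third server's address does not appear in this report). It expects numeric addresses, as in the listing, and locates the address columns relative to the ESTAB state, so it also copes with output that has no Netid column.

```python
import ipaddress
from collections import Counter

# The worker-node listing, whitespace collapsed (plain `ss | grep 6443`, numeric addresses).
SAMPLE = """\
tcp ESTAB 0 0 127.0.0.1:6443 127.0.0.1:34246
tcp ESTAB 0 0 172.31.10.228:58062 172.31.3.163:6443
tcp ESTAB 0 0 127.0.0.1:6443 127.0.0.1:34208
tcp ESTAB 0 0 127.0.0.1:34208 127.0.0.1:6443
tcp ESTAB 0 0 127.0.0.1:6443 127.0.0.1:34352
tcp ESTAB 0 0 127.0.0.1:34352 127.0.0.1:6443
tcp ESTAB 0 0 127.0.0.1:34246 127.0.0.1:6443
tcp ESTAB 0 0 172.31.10.228:58168 172.31.3.163:6443
tcp ESTAB 0 0 172.31.10.228:52252 172.31.14.174:6443
"""


def apiserver_peers(ss_text: str, port: int = 6443) -> Counter:
    """Count established connections whose peer is a non-loopback address on `port`."""
    peers = Counter()
    for line in ss_text.splitlines():
        fields = line.split()
        if "ESTAB" not in fields:
            continue  # header line or another TCP state
        i = fields.index("ESTAB")
        if len(fields) < i + 5:
            continue
        # After the state come Recv-Q, Send-Q, local address, peer address.
        host, _, peer_port = fields[i + 4].rpartition(":")
        if peer_port != str(port):
            continue
        host = host.strip("[]")  # IPv6 peers are printed as [addr]:port
        try:
            if ipaddress.ip_address(host).is_loopback:
                continue  # kubelet <-> local load balancer traffic on 127.0.0.1:6443
        except ValueError:
            pass  # not a literal IP (e.g. a resolved name); count it as-is
        peers[host] += 1
    return peers


def stale_peers(ss_text: str, current_servers: set[str], port: int = 6443) -> dict[str, int]:
    """Peers with live connections that are not in the expected server set."""
    return {ip: n for ip, n in apiserver_peers(ss_text, port).items()
            if ip not in current_servers}


if __name__ == "__main__":
    print(dict(apiserver_peers(SAMPLE)))
    # {'172.31.3.163': 2, '172.31.14.174': 1}
    # Assumption: 172.31.14.174 is one of the remaining servers.
    print(stale_peers(SAMPLE, {"172.31.14.174"}))
    # {'172.31.3.163': 2}
```

Both non-loopback connections to 172.31.3.163 are flagged, which lines up with that address being the deleted controlplane/etcd node named in the first attached log bundle.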
