Description
Environmental Info:
RKE2 Version: v1.23.16+rke2r1
Node(s) CPU architecture, OS, and Version:
root@ip-172-31-3-163:~# uname -a
Linux ip-172-31-3-163 5.4.0-1035-aws #37-Ubuntu SMP Wed Jan 6 21:01:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@ip-172-31-3-163:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
Cluster Configuration:
3 server nodes (controlplane + etcd), 2 worker nodes
Custom v2prov cluster running with Rancher v2.7-aa7bea9baa40eb8eb1d7166cf37e0b15aaa81ed5-head
Describe the bug:
On deletion of a server node through Rancher v2prov, the two worker nodes in the cluster went into a perpetual NotReady state. Investigating the workers, it appears that the kubelet is getting connection timeouts because it is still attempting to reach the apiserver on the deleted node, which was once a pool member of the local load balancer.
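For what it's worth, the agent-side load balancer state can be inspected directly on the worker. The path below is my assumption of where RKE2 persists the load balancer's server list (mirroring the k3s layout); the exact filename may differ:

# Inspect the agent load balancer's known servers on the affected worker
# (path is an assumption based on the k3s layout, not verified)
cat /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json
# In this state I would expect the deleted server (172.31.3.163) to still be listed as a backend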
Steps To Reproduce:
- Installed RKE2:
v1.23.16+rke2r1
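Rancher v2prov handled the actual provisioning, but a roughly equivalent manual install (using the documented install script and version pin) would look like the following; whether a manual install reproduces the exact same behavior is an assumption on my part:

# Server nodes (workers use rke2-agent instead of rke2-server)
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.23.16+rke2r1" sh -
systemctl enable --now rke2-server

Then delete one of the server nodes through Rancher v2prov and watch the worker nodes.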
Expected behavior:
When I delete a server node, I expect my cluster to recover (even if my kubelet flaps because it was connected to the apiserver that is being deleted).
Actual behavior:
The apiserver on the deleted node continues to run alongside the other Kubernetes components, and my kubelets that happened to be connected to it (through the local agent load balancer) go into a perpetual NotReady state with constant timeouts.
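The symptom is visible with standard tooling; this is roughly how I checked on both sides (standard kubectl/systemd commands, nothing beyond the documented RKE2 paths and unit names):

# From a surviving server node: the workers sit in NotReady
# (kubectl ships at /var/lib/rancher/rke2/bin/kubectl if it is not already on PATH)
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
kubectl get nodes -o wide
# On an affected worker: the agent logs show repeated timeouts
journalctl -u rke2-agent --since "1 hour ago" | grep -i timeout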
Additional context / logs:
ip-172-31-3-163-2023-03-28_19_15_31-deleted-controlplaneetcd.tar.gz
ip-172-31-10-228-2023-03-28_19_00_44-hung-worker.tar.gz
On the worker node that went NotReady with the timeouts, I can see the established TCP connections to the no-longer-valid server node via ss, i.e.
root@ip-172-31-10-228:~# ss | grep 6443
tcp ESTAB 0 0 127.0.0.1:6443 127.0.0.1:34246
tcp ESTAB 0 0 172.31.10.228:58062 172.31.3.163:6443
tcp ESTAB 0 0 127.0.0.1:6443 127.0.0.1:34208
tcp ESTAB 0 0 127.0.0.1:34208 127.0.0.1:6443
tcp ESTAB 0 0 127.0.0.1:6443 127.0.0.1:34352
tcp ESTAB 0 0 127.0.0.1:34352 127.0.0.1:6443
tcp ESTAB 0 0 127.0.0.1:34246 127.0.0.1:6443
tcp ESTAB 0 0 172.31.10.228:58168 172.31.3.163:6443
tcp ESTAB 0 0 172.31.10.228:52252 172.31.14.174:6443
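Adding the process flag to ss makes it clearer which daemons are still pinned to the deleted server; the same check, slightly more targeted (plain ss/grep with the deleted node's IP substituted in):

# Which processes still hold connections to the deleted server node
ss -tnp | grep 172.31.3.163
# And what is attached to the local agent load balancer listener
ss -tnp | grep '127.0.0.1:6443'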