[BUG] [CAPR] Upgrading Rancher during etcd restoration from < v2.7.5 to v2.7.5+ can lead to implausible joined server for entry  #42856

@Oats87

Rancher Server Setup

  • Rancher version: v2.7.6

Information about the Cluster
Downstream CAPR K3s or RKE2

Describe the bug
If a Rancher installation is upgraded from a version before v2.7.5 to v2.7.5 or newer while an etcd restoration operation is in progress, the planner can stall with the error "implausible joined server for entry".

To Reproduce
Run Rancher v2.7.3, provision a cluster, break it, and attempt to restore it in a way that is not compatible with v2.7.3. Then upgrade to v2.7.6 and attempt to restore the cluster again. Observe that the planner errors out while generating the etcd restoration shutdown plan, because the joined-to annotation has not been set.

Result
Planner does not restore the cluster.

Expected Result
Planner restores the cluster.

Additional context

The temporary workaround for this issue is to annotate the machine plan secrets in the local cluster with the join URL of the new init node (the node you are going to restore the etcd snapshot to). The init node's secret gets the join-url annotation, and every other machine's secret gets the joined-to annotation.

For RKE2 with an init node at 172.16.1.5 and a cluster called my-rke2-cluster, this would look like:

kubectl annotate secret -n fleet-default -l cluster.x-k8s.io/cluster-name=my-rke2-cluster,rke.cattle.io/init-node=true rke.cattle.io/join-url=https://172.16.1.5:9345
kubectl annotate secret -n fleet-default -l cluster.x-k8s.io/cluster-name=my-rke2-cluster,rke.cattle.io/init-node!=true rke.cattle.io/joined-to=https://172.16.1.5:9345

For a K3s cluster with an init node at 172.16.1.5 and a cluster called my-k3s-cluster, this would look like:

kubectl annotate secret -n fleet-default -l cluster.x-k8s.io/cluster-name=my-k3s-cluster,rke.cattle.io/init-node=true rke.cattle.io/join-url=https://172.16.1.5:6443
kubectl annotate secret -n fleet-default -l cluster.x-k8s.io/cluster-name=my-k3s-cluster,rke.cattle.io/init-node!=true rke.cattle.io/joined-to=https://172.16.1.5:6443

Note the port difference between K3s (6443) and RKE2 (9345).
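
To avoid mixing up the cluster name or port when adapting the commands above, the workaround can be sketched as a small shell helper. This is only an illustration: the function names join_url and print_workaround are hypothetical, and the script only prints the kubectl commands for review rather than running them. The label selectors, annotations, namespace, and ports are copied from the examples above.

```shell
#!/bin/sh
# Sketch only: join_url and print_workaround are hypothetical helper names,
# not part of Rancher or kubectl.

# Build the join URL for the init node.
# Ports are taken from the examples above: RKE2 uses 9345, K3s uses 6443.
join_url() {
    distro="$1"
    init_ip="$2"
    case "$distro" in
        rke2) port=9345 ;;
        k3s)  port=6443 ;;
        *)    echo "unknown distro: $distro" >&2; return 1 ;;
    esac
    echo "https://${init_ip}:${port}"
}

# Print (do not run) the two annotate commands for a cluster.
# Arguments: cluster name, distro (rke2 or k3s), init node IP.
print_workaround() {
    cluster="$1"
    url="$(join_url "$2" "$3")" || return 1
    sel="cluster.x-k8s.io/cluster-name=${cluster}"
    # The init node's plan secret gets join-url.
    echo "kubectl annotate secret -n fleet-default -l ${sel},rke.cattle.io/init-node=true rke.cattle.io/join-url=${url}"
    # Every other machine's plan secret gets joined-to.
    echo "kubectl annotate secret -n fleet-default -l ${sel},rke.cattle.io/init-node!=true rke.cattle.io/joined-to=${url}"
}

print_workaround my-k3s-cluster k3s 172.16.1.5
```

Review the printed commands before running them against the local cluster.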

Metadata

Labels

  • QA/M
  • area/capr: Provisioning issues that involve cluster-api-provider-rancher
  • kind/bug: Issues that are defects reported by users or that we know have reached a real release
  • priority/1
  • release-note: Note this issue in the milestone's release notes
  • status/release-note-added
  • team/hostbusters: The team that is responsible for provisioning/managing downstream clusters + K8s version support
