Description
Found as part of working on #32599
Rancher Server Setup
- Rancher version: master-head
- Installation option (Docker install/Helm Chart): rancher/rancher code running locally against a k3s cluster
Information about the Cluster
- Kubernetes version: v1.19.8+k3s1 (upstream), v1.21.3+rke2r1 (downstream)
Describe the bug
Following restore of a backed-up cluster with an RKE2-provisioned downstream cluster, fleetagent running on the downstream cluster fails to communicate back to the Rancher server, as evidenced by 401 responses to the requests it sends to the Rancher API.
To Reproduce
- Provision a DigitalOcean RKE2 downstream cluster. The issue may not require an RKE2-provisioned cluster, but it was found with one.
- Using the "Rancher Backup" app, perform a backup of the cluster. Note that restore of RKE2-provisioned clusters is being worked on as part of RKE2 Provisioning: Backup - Support Rancher backups with RKE2 provisioned clusters #32599, and so far the following needs to be added to the Backup/Restore resource set to back up the related objects (see the ResourceSet sketch after this list):
# Added for v2 provisioning
- apiVersion: apiextensions.k8s.io/v1
  kindsRegexp: .
  resourceNameRegexp: provisioning.cattle.io$|rke-machine-config.cattle.io$|rke-machine.cattle.io$|rke.cattle.io$
- apiVersion: provisioning.cattle.io/v1
  kindsRegexp: .
- apiVersion: rke-machine-config.cattle.io/v1
  kindsRegexp: .
- apiVersion: rke-machine.cattle.io/v1
  kindsRegexp: .
- apiVersion: rke.cattle.io/v1
  kindsRegexp: .
- apiVersion: apiextensions.k8s.io/v1
  kindsRegexp: .
  resourceNameRegexp: cluster.x-k8s.io$
- apiVersion: cluster.x-k8s.io/v1alpha4
  kindsRegexp: .
# The below will also back up an unnecessary default-token-... secret
- apiVersion: v1
  kindsRegexp: ^secrets$
  namespaces:
  - fleet-default
- Follow the instructions to restore Rancher to a new cluster per https://rancher.com/docs/rancher/v2.x/en/backups/v2.5/migrating-rancher/ (ensure the source upstream cluster and Rancher are stopped); a minimal sketch of the Restore manifest follows this list.
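For reference, a minimal sketch of where the selectors from step 2 live in the rancher-backup operator's ResourceSet; the resource set name below assumes the chart's default (`rancher-resource-set`), and the selector entries simply mirror the additions listed above:

# Sketch only: selectors from the snippet above placed under
# resourceSelectors of the operator's ResourceSet.
apiVersion: resources.cattle.io/v1
kind: ResourceSet
metadata:
  name: rancher-resource-set   # assumed chart default name
resourceSelectors:
  # ... default selectors shipped with the chart ...
  # Additions for v2 provisioning (from the snippet above):
  - apiVersion: provisioning.cattle.io/v1
    kindsRegexp: .
  - apiVersion: rke.cattle.io/v1
    kindsRegexp: .
  # ... remaining additions as listed above ...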
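Similarly, a minimal sketch of the Restore custom resource used in the migration step; the resource name is arbitrary, backupFilename is a placeholder for the file produced in step 2, and prune is disabled because the target cluster is fresh:

apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-migration      # arbitrary name
spec:
  # Placeholder: name of the backup file produced in step 2.
  backupFilename: <backup-file-name>.tar.gz
  # The target cluster is new, so nothing needs to be pruned.
  prune: false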
Result
After starting Rancher against the restored cluster, requests sent by fleetagent receive 401 responses, indicating that fleetagent is unable to communicate with the upstream Rancher server.
Expected Result
fleetagent communicates back to upstream Rancher without issues.
Additional context
The issue is almost certainly caused by fleetagent on the downstream cluster persisting a token from a secret associated with a service account on the upstream cluster. Following restore, this token is no longer valid. Unfortunately, there does not seem to be a way to migrate service accounts to a new cluster such that tokens generated on the source cluster remain valid (https://stackoverflow.com/q/65580643/16564280).
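As a sketch of the reasoning: a (legacy) service-account token is a JWT signed with the source cluster's service-account signing key and stored in a secret shaped like the one below (all names here are hypothetical). Restoring the secret copies the JWT bytes verbatim, but the new cluster's API server verifies token signatures against its own service-account key pair, so the copied token is rejected with 401:

# Illustrative service-account token secret; names are hypothetical.
apiVersion: v1
kind: Secret
metadata:
  name: fleet-agent-token-abcde   # hypothetical
  namespace: cattle-system        # hypothetical
  annotations:
    kubernetes.io/service-account.name: fleet-agent
type: kubernetes.io/service-account-token
data:
  # JWT signed by the *source* cluster's key; invalid on the restored
  # cluster, whose API server has a different service-account key pair.
  token: <base64-encoded JWT>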
One option to address the issue would be to somehow trigger fleetagent to re-acquire a token from the cluster on restore.