
fleetagent running on downstream cluster cannot communicate back to Rancher server after restore #33954

@snasovich

Description


Found as part of working on #32599

Rancher Server Setup

  • Rancher version: master-head
  • Installation option (Docker install/Helm Chart): rancher/rancher code running locally against k3s cluster

Information about the Cluster

  • Kubernetes version: v1.19.8+k3s1 (upstream), v1.21.3+rke2r1 (downstream)

Describe the bug
Following restore of a backed-up cluster with an RKE2-provisioned downstream cluster, the fleetagent running on the downstream cluster fails to communicate back to the Rancher server, as evidenced by 401 responses to the requests it sends to the Rancher API, shown below:
[screenshot: fleetagent logs showing 401 responses from the Rancher API]

To Reproduce

  1. Provision a DigitalOcean RKE2 downstream cluster. The issue may not require an RKE2-provisioned cluster, but it was found with one.
  2. Using the "Rancher Backup" app, perform a backup of the cluster. Note that restore of RKE2-provisioned clusters is being worked on as part of RKE2 Provisioning: Backup - Support Rancher backups with RKE2 provisioned clusters #32599, and so far the following needs to be added to the Backup/Restore resourceset to back up the related objects:
   # Added for v2 provisioning
  - apiVersion: apiextensions.k8s.io/v1
    kindsRegexp: .
    resourceNameRegexp: provisioning.cattle.io$|rke-machine-config.cattle.io$|rke-machine.cattle.io$|rke.cattle.io$
  - apiVersion: provisioning.cattle.io/v1
    kindsRegexp: .
  - apiVersion: rke-machine-config.cattle.io/v1
    kindsRegexp: .
  - apiVersion: rke-machine.cattle.io/v1
    kindsRegexp: .
  - apiVersion: rke.cattle.io/v1
    kindsRegexp: .
  - apiVersion: apiextensions.k8s.io/v1
    kindsRegexp: .
    resourceNameRegexp: cluster.x-k8s.io$
  - apiVersion: cluster.x-k8s.io/v1alpha4
    kindsRegexp: .
  # The below will also back up the unnecessary default-token-... secret
  - apiVersion: v1
    kindsRegexp: ^secrets$
    namespaces:
    - fleet-default
  3. Follow the instructions to restore Rancher to a new cluster per https://rancher.com/docs/rancher/v2.x/en/backups/v2.5/migrating-rancher/ (ensure the source upstream cluster and Rancher are stopped).
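Concretely, the restore step above can be expressed as a Restore custom resource for the rancher-backup operator, applied on the new upstream cluster (a minimal sketch; the backup filename, bucket, region, and credential secret names below are placeholders, not values from this report):

```yaml
# Restore CR for the rancher-backup operator, applied on the NEW upstream cluster.
apiVersion: resources.cattle.io/v1
kind: Restore
metadata:
  name: restore-migration
spec:
  backupFilename: rancher-backup-example.tar.gz   # placeholder filename
  prune: false                                    # do not prune when migrating to a new cluster
  storageLocation:
    s3:
      credentialSecretName: s3-creds              # placeholder secret name
      credentialSecretNamespace: default
      bucketName: rancher-backups                 # placeholder bucket
      folder: ""
      region: us-east-1                           # placeholder region
      endpoint: s3.us-east-1.amazonaws.com        # placeholder endpoint
```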

Result
After starting Rancher against the restored cluster, requests sent by fleetagent receive 401 responses, as in the screenshot above. This indicates fleetagent is unable to communicate with the upstream Rancher server.

Expected Result
fleetagent communicates back to the upstream Rancher server without issues.

Additional context
The issue is almost certainly caused by fleetagent on the downstream cluster persisting the token from a secret associated with a service account on the upstream cluster. Following the restore, this token is no longer valid. Unfortunately, there doesn't seem to be a way to migrate service accounts to a new cluster such that tokens generated on the source cluster remain valid (https://stackoverflow.com/q/65580643/16564280).
One option to address the issue would be to somehow trigger fleetagent to re-acquire a token from the cluster on restore.
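A manual workaround along those lines might be to discard the agent's cached credentials and restart it so it re-registers against the restored Rancher server. This is only a sketch: the namespace and object names below are assumptions that vary by Fleet version (older releases use fleet-system rather than cattle-fleet-system), and it presumes re-registration re-issues a valid token.

```shell
# On the DOWNSTREAM cluster: delete the secret holding the fleet agent's
# stale kubeconfig/token, then restart the agent deployment so it
# re-registers with the restored upstream Rancher server.
# Namespace and names are assumptions; verify against your Fleet version.
kubectl -n cattle-fleet-system delete secret fleet-agent
kubectl -n cattle-fleet-system rollout restart deployment fleet-agent
```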
