Description
/kind bug
1. What kops version are you running? The command kops version will display this information.
Client version: 1.26.6 (git-v1.26.6)
or
Client version: 1.25.4 (git-v1.25.4)
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
Server Version: v1.25.16
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Upgrading from kops v1.25.4 to v1.26.6:
- Use a custom masterInternalName in the kops cluster spec.
- Apply the spec with kops update cluster.
- Roll the first control-plane instance with kops rolling-update cluster (a command sketch follows).
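For reference, a minimal sketch of the reproduction commands; the state store, cluster name, and instance group name are assumptions for illustration:
export KOPS_STATE_STORE=s3://example-kops-state   # assumed state store
kops update cluster --name cluster.domain --yes   # apply the spec with kops v1.26.6
kops rolling-update cluster --name cluster.domain \
  --instance-group master-us-east-1a --yes        # assumed control-plane IG name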
5. What happened after the commands executed?
The cluster fails to validate because calico-node fails to start on the new control-plane instance. The install-cni init container in the calico-node pod fails with the following error:
2024-04-22 23:06:36.650 [INFO][1] cni-installer/<nil> <nil>: CNI plugin version: v3.24.5
2024-04-22 23:06:36.650 [INFO][1] cni-installer/<nil> <nil>: /host/secondary-bin-dir is not writeable, skipping
W0422 23:06:36.651145 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2024-04-22 23:06:36.664 [ERROR][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Unauthorized
2024-04-22 23:06:36.664 [FATAL][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Unauthorized
Other pods on the new control-plane node that depend on the Kubernetes API show similar Unauthorized errors in their logs.
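The install-cni logs above can be inspected along these lines (the pod name is illustrative; use the calico-node pod scheduled on the new node):
kubectl -n kube-system get pods -o wide | grep calico-node
kubectl -n kube-system logs calico-node-abcde -c install-cni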
6. What did you expect to happen?
The upgrade from kops v1.25.4 to v1.26.6 to complete successfully.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: cluster.domain
spec:
  api:
    loadBalancer:
      class: Classic
      idleTimeoutSeconds: 3600
      type: Internal
  authorization:
    rbac: {}
  ...
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    authorizationMode: Node,RBAC
    featureGates:
      HPAScaleToZero: "true"
  kubeControllerManager:
    horizontalPodAutoscalerDownscaleStabilization: 15m0s
  kubeDNS:
    nodeLocalDNS:
      enabled: true
    provider: CoreDNS
  kubeProxy:
    metricsBindAddress: 0.0.0.0
  kubeScheduler: {}
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesVersion: 1.25.16
  masterInternalName: master.cluster.domain
  masterPublicName: api.cluster.domain
  ...
The cluster spec update diff from kops update cluster shows masterInternalName being removed and changes to the serviceAccountIssuer and serviceAccountJWKSURI fields:
ManagedFile/cluster-completed.spec
Contents
...
- X-Remote-User
securePort: 443
+ serviceAccountIssuer: https://api.internal.cluster.domain
- serviceAccountIssuer: https://master.cluster.domain
+ serviceAccountJWKSURI: https://api.internal.cluster.domain/openid/v1/jwks
- serviceAccountJWKSURI: https://master.cluster.domain/openid/v1/jwks
serviceClusterIPRange: 100.64.0.0/13
storageBackend: etcd3
...
kubernetesVersion: 1.25.16
- masterInternalName: master.cluster.domain
masterKubelet:
anonymousAuth: false
...
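As a sanity check, the issuer a running API server actually advertises can be read from its OIDC discovery document (assuming service account issuer discovery is enabled, which it is by default on these Kubernetes versions):
kubectl get --raw /.well-known/openid-configuration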
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
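For reference, the update can be re-run with full verbosity like so (cluster name is an assumption):
kops update cluster --name cluster.domain -v 10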
9. Anything else do we need to know?
It seems that in kops v1.26 the ability to specify masterInternalName was removed (#14507), which is why we are noticing this problem during the upgrade. However, I can reproduce the same issue by simply removing masterInternalName from our cluster spec, applying it to the cluster, and rolling the first control-plane node, all using kops v1.25.4 (a sketch follows). This also produces the same update diff as posted above.
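A sketch of that same-version reproduction, with kops v1.25.4 throughout (cluster and instance group names are assumptions):
kops edit cluster --name cluster.domain           # delete the masterInternalName line
kops update cluster --name cluster.domain --yes
kops rolling-update cluster --name cluster.domain \
  --instance-group master-us-east-1a --yes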
To move forward, it seems that changing the serviceAccountIssuer and serviceAccountJWKSURI fields is inevitable, but it is not clear what steps to take to roll this change out safely. Is anyone able to advise?
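For context on one possible path: upstream kube-apiserver accepts --service-account-issuer multiple times, in which case the first value is used to sign new tokens and all values are accepted for validation, which is the standard mechanism for rotating an issuer without invalidating existing tokens. Whether the kops cluster spec can express this is exactly what is unclear here:
--service-account-issuer=https://api.internal.cluster.domain   # new issuer, signs new tokens
--service-account-issuer=https://master.cluster.domain         # old issuer, still accepted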