Rancher Server Setup
- Rancher version: 2.9.1
- Installation option (Docker install/Helm Chart): Helm chart
  - Kubernetes cluster and version: RKE2 1.28.12
Information about the Cluster
- Kubernetes version:
- Cluster Type (Local/Downstream): downstream RKE2 custom clusters
User Information
- What is the role of the user logged in?: Admin
Describe the bug
After upgrading Rancher from 2.8.3 to 2.9.1, managed RKE2 custom clusters provisioned with the Terraform rancher2_cluster_v2 resource are broken: the configuration provided in /etc/rancher/rke2/config.yaml.d/50-rancher.yaml is completely ignored and appears to have been replaced with default parameters.
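For context, the clusters are provisioned roughly as in the minimal Terraform sketch below; the cluster name, version string, and YAML values are placeholders rather than our exact configuration (the custom service-cidr is only inferred from the certificate errors further down). Rancher renders the machine_global_config content into /etc/rancher/rke2/config.yaml.d/50-rancher.yaml on the nodes, and it is that rendered file that ends up replaced after the upgrade.

```hcl
# Minimal sketch of the provisioning setup (names and values are placeholders).
resource "rancher2_cluster_v2" "custom" {
  name               = "example-rke2"
  kubernetes_version = "v1.28.12+rke2r1"

  rke_config {
    # Rancher writes this content into
    # /etc/rancher/rke2/config.yaml.d/50-rancher.yaml on each node.
    machine_global_config = <<-EOF
      cni: cilium
      disable-kube-proxy: true
      service-cidr: 100.64.0.0/16
    EOF
  }
}
```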
To Reproduce
- Provision a managed RKE2 1.28.x cluster with the Terraform rancher2_cluster_v2 resource on Rancher 2.8.3
- Upgrade Rancher to the latest 2.9.1
Result
After the upgrade, the Rancher agent pods in the cattle-system namespace of the managed cluster are replaced with new v2.9.1 pods, and shortly afterwards /etc/rancher/rke2/config.yaml.d/50-rancher.yaml is rewritten with just a few lines. This breaks the cluster completely, because the configured Cilium CNI and other critical components are removed.
Expected Result
Upgrading Rancher to 2.9.1 does not break managed RKE2 clusters.
Additional context
I have noticed that after the upgrade some cluster-critical components such as kyverno started failing with errors like this:
{"level":"error","ts":1724780897.1700578,"logger":"klog","caller":"leaderelection/leaderelection.go:332","msg":"error retrieving resource lock kyverno/kyverno: Get \"https://100.64.0.1:443/apis/coordination.k8s.io/v1/namespaces/kyverno/leases/kyverno\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, ::1, 172.23.103.172, 172.23.103.172, 10.43.0.1, not 100.64.0.1","stacktrace":"k8s.io/client-go/tools/leaderelection.(*LeaderElector).tryAcquireOrRenew\n\tk8s.io/client-go@v0.29.0/tools/leaderelection/leaderelection.go:332\nk8s.io/client-go/tools/leaderelection.(*LeaderElector).acquire.func1\n\tk8s.io/client-go@v0.29.0/tools/leaderelection/leaderelection.go:252\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\tk8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:226\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tk8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:227\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\tk8s.io/apimachinery@v0.29.0/pkg/util/wait/backoff.go:204\nk8s.io/client-go/tools/leaderelection.(*LeaderElector).acquire\n\tk8s.io/client-go@v0.29.0/tools/leaderelection/leaderelection.go:251\nk8s.io/client-go/tools/leaderelection.(*LeaderElector).Run\n\tk8s.io/client-go@v0.29.0/tools/leaderelection/leaderelection.go:208\ngithub.com/kyverno/kyverno/pkg/leaderelection.(*config).Run\n\tgithub.com/kyverno/kyverno/pkg/leaderelection/leaderelection.go:136\nmain.main.func2\n\tgithub.com/kyverno/kyverno/cmd/kyverno/main.go:462"}
Pods in the cattle-system namespace fail in the same way. In both cases the API server certificate only covers the RKE2 default 10.43.0.1 service IP, not the 100.64.0.1 ClusterIP that in-cluster clients still use, which is consistent with the custom configuration having been replaced by defaults:
time="2024-08-27T17:48:43Z" level=error msg="error syncing 'cattle-system/apply-system-agent-upgrader-on-ip-172-23-102-146-with-073-56173': handler system-upgrade-controller: Get \"https://100.64.0.1:443/apis/upgrade.cattle.io/v1/namespaces/cattle-system/plans/system-agent-upgrader\": tls: failed to verify certificate: x509: certificate is valid for 127.0.0.1, ::1, 172.23.102.15, 172.23.102.15, 10.43.0.1, not 100.64.0.1, requeuing"