[BUG] rancher-webhook blocking CAPR operations/cluster operations in downstream cluster #41613

@Oats87

Description

Rancher Server Setup

  • Rancher version: v2.7-head

Information about the Cluster

  • Kubernetes version: K3s and/or RKE2
  • Cluster Type (Local/Downstream): Downstream CAPR/v2prov

Describe the bug
The rancher-webhook is being deployed into downstream clusters, where it creates a MutatingWebhookConfiguration and a ValidatingWebhookConfiguration that intercept corev1/Secret objects. Ordinarily this would not be a big problem with minimally invasive configurations, but these configurations are deployed with a failure policy that blocks creation of Secret objects when the webhook call fails. K3s/RKE2 use Secrets internally for various purposes, such as node-passwd and (in RKE2's case) static manifest generation for system components.
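
To make the failure-policy point concrete, here is a minimal sketch that scans a webhook configuration (shaped like the JSON from `kubectl get mutatingwebhookconfiguration -o json`) for webhooks that match core Secrets and use `failurePolicy: Fail`. The field names follow the `admissionregistration.k8s.io/v1` schema. The `example` configuration is hypothetical and written for illustration only; it is not copied from a real rancher-webhook deployment. Only the webhook name is taken from the log in this issue.

```python
def matches_core_secrets(rule):
    """Return True if a webhook rule covers Secrets in the core ("") API group."""
    groups = rule.get("apiGroups", [])
    resources = rule.get("resources", [])
    group_ok = "" in groups or "*" in groups
    resource_ok = any(r in ("secrets", "*", "*/*") for r in resources)
    return group_ok and resource_ok


def blocking_secret_webhooks(config):
    """List webhook names that reject Secret requests when the webhook call fails."""
    names = []
    for hook in config.get("webhooks", []):
        # admissionregistration.k8s.io/v1 defaults failurePolicy to "Fail".
        if hook.get("failurePolicy", "Fail") != "Fail":
            continue
        if any(matches_core_secrets(rule) for rule in hook.get("rules", [])):
            names.append(hook.get("name", "<unnamed>"))
    return names


# Hypothetical configuration, for illustration only.
example = {
    "kind": "MutatingWebhookConfiguration",
    "webhooks": [
        {
            "name": "rancher.cattle.io.secrets",
            "failurePolicy": "Fail",
            "rules": [
                {
                    "apiGroups": [""],
                    "apiVersions": ["v1"],
                    "operations": ["CREATE"],
                    "resources": ["secrets"],
                }
            ],
        }
    ],
}

print(blocking_secret_webhooks(example))
```

Any webhook this reports can block Secret creation cluster-wide while its backing service is unreachable, which matches what both scenarios below run into.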

I observed this in two separate cases, both related to v2prov:

To Reproduce
Scenario A: Create a 3-node cluster with 1 controlplane, 1 etcd, and 1 worker node, then replace the worker node and observe that the new worker node never registers. This was encountered in this issue: #41133 (comment)

Scenario B: Create a single-node all-in-one downstream RKE2 cluster, and perform a selective kube-controller-manager certificate rotation. This was encountered when manually validating this issue: #41125

Result

Scenario A: Observe that k3s-agent complains about an invalid node password file. The logs of the k3s unit on the controlplane host show that it is failing to create a secret due to the webhook policy.

Scenario B: Observe that the kube-controller-manager doesn't come back up on its own, and rke2-server logs:

May 19 15:03:43 kskcm-pool1-2876d649-4xn4n rke2[17126]: time="2023-05-19T15:03:43Z" level=warning msg="Failed to create Kubernetes secret: Internal error occurred: failed calling webhook \"rancher.cattle.io.secrets\": failed to call webhook: Post \"https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/secrets?timeout=10s\": context deadline exceeded"
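
When triaging logs like the one above, it can help to pull out which webhook failed and how. The following is a rough sketch, not part of Rancher or RKE2: the function name `parse_webhook_failure` and the returned dictionary keys are made up for illustration, and the regular expressions assume the message format seen in this single log line.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Quotes inside the journald msg="..." field are backslash-escaped, so the
# patterns accept an optional backslash before each quote.
WEBHOOK_RE = re.compile(r'failed calling webhook \\?"(?P<name>[^"\\]+)\\?"')
URL_RE = re.compile(r'Post \\?"(?P<url>[^"\\]+)\\?"')


def parse_webhook_failure(line):
    """Extract webhook name, target and timeout from a webhook failure log line."""
    name = WEBHOOK_RE.search(line)
    if name is None:
        return None  # not a webhook failure line
    info = {
        "webhook": name.group("name"),
        "timed_out": "context deadline exceeded" in line,
    }
    url = URL_RE.search(line)
    if url:
        parts = urlsplit(url.group("url"))
        info["host"] = parts.hostname
        info["path"] = parts.path
        info["timeout"] = parse_qs(parts.query).get("timeout", [None])[0]
    return info


# The rke2-server log line from this issue.
SAMPLE = r'''May 19 15:03:43 kskcm-pool1-2876d649-4xn4n rke2[17126]: time="2023-05-19T15:03:43Z" level=warning msg="Failed to create Kubernetes secret: Internal error occurred: failed calling webhook \"rancher.cattle.io.secrets\": failed to call webhook: Post \"https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation/secrets?timeout=10s\": context deadline exceeded"'''

print(parse_webhook_failure(SAMPLE))
```

For the line above, this identifies the `rancher.cattle.io.secrets` webhook, the in-cluster `rancher-webhook.cattle-system.svc` service it tried to reach, and that the call timed out rather than being explicitly denied.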

Expected Result
Neither scenario fails.

Metadata

Labels

  • QA/S
  • kind/bug: Issues that are defects reported by users or that we know have reached a real release
  • release-note: Note this issue in the milestone's release notes
  • status/release-blocker
  • team/area1
  • team/hostbusters: The team that is responsible for provisioning/managing downstream clusters + K8s version support
