[BUG] rke2 clusters with invalid values for tolerations / affinity agent customization do not show error to user, stay in updating state on cluster create #41606

@slickwarren

Description

Rancher Server Setup

  • Rancher version: v2.7-head ()
  • Installation option (Docker install/Helm Chart): helm
    • If Helm Chart, Kubernetes Cluster and version (RKE1, RKE2, k3s, EKS, etc): rke1, 1.4.5
  • Proxy/Cert Details: valid certs

Information about the Cluster

  • Kubernetes version: tested with 1.25.9, 1.26.x
  • Cluster Type (Local/Downstream): downstream
    • If downstream, what type of cluster? (Custom/Imported or specify provider for Hosted/Infrastructure Provider): rke2 linode driver

User Information

  • What is the role of the user logged in? (Admin/Cluster Owner/Cluster Member/Project Owner/Project Member/Custom)
    • If custom, define the set of permissions: standard user, cluster owner

Describe the bug

When provisioning an RKE2 cluster with invalid values for the tolerations or affinity agent customizations (cluster agent or fleet agent), cluster creation is allowed to start. However, the cluster never reports an error state and hangs in an updating state.

To Reproduce

  • deploy an rke2 cluster
    • for the cluster agent and fleet agent customizations, enter an invalid label key for tolerations and node affinity, e.g. badLabel 123"[];'{}-+=
  • wait for the cluster to provision
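
To make the reproduction concrete, the following is a minimal sketch of where such a key ends up in the rendered agent Deployment. It assumes the customization values are copied into the standard Kubernetes pod spec fields `tolerations` and `affinity.nodeAffinity`; the helper `collect_keys`, the constant names, and the example dictionary are hypothetical and only illustrate the structure.

```python
# Hypothetical illustration: a pod spec fragment carrying the same invalid key
# in a toleration and in a node affinity match expression.
BAD_KEY = "badLabel 123\"[];'{}-+="

EXAMPLE_POD_SPEC = {
    "tolerations": [{"key": BAD_KEY, "operator": "Exists"}],
    "affinity": {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": "kubernetes.io/os", "operator": "In", "values": ["linux"]},
                            {"key": BAD_KEY, "operator": "Exists"},
                        ]
                    }
                ]
            }
        }
    },
}


def collect_keys(pod_spec: dict) -> list[tuple[str, str]]:
    """Return (field path, key) pairs for every toleration and required node affinity key."""
    base = "spec.template.spec"
    found = []
    for i, toleration in enumerate(pod_spec.get("tolerations", [])):
        if "key" in toleration:
            found.append((f"{base}.tolerations[{i}].key", toleration["key"]))
    required = (
        pod_spec.get("affinity", {})
        .get("nodeAffinity", {})
        .get("requiredDuringSchedulingIgnoredDuringExecution", {})
    )
    for t, term in enumerate(required.get("nodeSelectorTerms", [])):
        for e, expr in enumerate(term.get("matchExpressions", [])):
            path = (
                f"{base}.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution"
                f".nodeSelectorTerms[{t}].matchExpressions[{e}].key"
            )
            found.append((path, expr["key"]))
    return found


if __name__ == "__main__":
    for path, key in collect_keys(EXAMPLE_POD_SPEC):
        print(f"{path}: {key!r}")
```

Every path returned by this sketch is a field that the Kubernetes API server validates when the Deployment is created, so a single bad value can produce several validation errors.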

Result
The cluster is created but stays in an updating state indefinitely (see screenshots).

Expected Result

The cluster should enter an error state, and the error should be shown to the user on the cluster management page.

If the user SSHes into the node and views the rke2-server logs, the following error is logged repeatedly, every few seconds:


May 18 22:37:07 bad-custization1-dsa-3a9fb14b-s4rsd rke2[2054]: time="2023-05-18T22:37:07Z" level=error msg="Failed to process config: failed to process /var/lib/rancher/rke2/server/manifests/rancher/cluster-agent.yaml: failed to create cattle-system/cattle-cluster-agent apps/v1, Kind=Deployment for  kube-system/cluster-agent: Deployment.apps \"cattle-cluster-agent\" is invalid: [spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[1].key: Invalid value: \"notGood~ [];'/..,\": prefix part a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[1].key: Invalid value: \"notGood~ [];'/..,\": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'), spec.template.spec.tolerations[3].key: Invalid value: \"notGood~ [];'/..,\": prefix part a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*'), spec.template.spec.tolerations[3].key: Invalid value: \"notGood~ [];'/..,\": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')]"
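
The log above shows the API server rejecting the key twice, once for its prefix part and once for its name part. As a rough sketch of the kind of check that could catch this before provisioning starts, the snippet below applies the two regular expressions quoted in that log message. The function name `qualified_key_errors` is hypothetical, the length limits (253 for the prefix, 63 for the name) are the usual Kubernetes limits rather than something stated in this issue, and this is not Rancher's or Kubernetes' actual implementation.

```python
import re

# Patterns copied from the validation message in the rke2-server log.
DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)
QUALIFIED_NAME = re.compile(r"([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]")

# Assumed limits: the usual Kubernetes maximum lengths for each part.
MAX_PREFIX_LEN = 253
MAX_NAME_LEN = 63


def qualified_key_errors(key: str) -> list[str]:
    """Return a list of problems with a label/toleration key; empty means valid."""
    parts = key.split("/")
    if len(parts) > 2:
        return ["key must contain at most one '/' separating prefix and name"]

    errors = []
    if len(parts) == 2:
        prefix, name = parts
        if not prefix:
            errors.append("prefix part must be non-empty")
        elif len(prefix) > MAX_PREFIX_LEN or not DNS1123_SUBDOMAIN.fullmatch(prefix):
            errors.append("prefix part must be a lowercase RFC 1123 subdomain")
    else:
        name = parts[0]

    if not name:
        errors.append("name part must be non-empty")
    elif len(name) > MAX_NAME_LEN or not QUALIFIED_NAME.fullmatch(name):
        errors.append(
            "name part must consist of alphanumeric characters, '-', '_' or '.', "
            "and must start and end with an alphanumeric character"
        )
    return errors


if __name__ == "__main__":
    for candidate in ["example.com/my-key", "notGood~ [];'/..,"]:
        print(repr(candidate), qualified_key_errors(candidate))
```

Because the key from the log contains exactly one `/`, it is split into a prefix and a name, and both parts fail, which matches the paired errors reported for each field.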

Screenshots

Screen Shot 2023-05-18 at 3 30 18 PM
Screen Shot 2023-05-18 at 3 30 12 PM
Screen Shot 2023-05-18 at 1 42 00 PM

Additional context

For RKE1 clusters, the equivalent error is bubbled up to the user.

Metadata

Labels

  • QA/XS
  • area/capr/rke2: RKE2 Provisioning issues involving CAPR
  • area/provisioning-v2: Provisioning issues that are specific to the provisioningv2 generating framework
  • kind/bug-qa: Issues that have not yet hit a real release. Bugs introduced by a new feature or enhancement
  • priority/2
  • team/hostbusters: The team that is responsible for provisioning/managing downstream clusters + K8s version support
