Cilium unable to remove taints on Azure AKS #19788

@akostic-kostile

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

It seems that Microsoft, in their infinite wisdom, made a breaking change to taint removal in the 2022-04-24 release of AKS.

https://github.com/Azure/AKS/releases/tag/2022-04-24

"Taints and labels applied using the AKS nodepool API are not modifiable from the Kubernetes API and vice versa. Also, any modifications to system taints will not be allowed."

Our clusters were deployed using Terraform, which is also how we set the `node.cilium.io/agent-not-ready=true:NoSchedule` taint on the nodes (in other words, through the node pool API and not the Kubernetes API).

I found this out 30 minutes ago when, during a production deployment, pods were stuck in the Pending state.

I'm not yet sure how to handle this situation. One workaround that comes to mind is to provision a new node pool with no taint applied and then apply the taint manually using kubectl, since Cilium should be able to remove a taint that was set through the Kubernetes API.
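A minimal sketch of the kubectl half of that workaround, assuming the new node pool was created without the taint and that `kubectl` is on the PATH and pointed at the cluster. The helper names are hypothetical, the taint string is copied from the error message in the log output of this report, and none of this has been tested against AKS.

```python
from __future__ import annotations

import subprocess

# Taint the Cilium agent tries to delete, copied from the error message in this report.
CILIUM_TAINT = "node.cilium.io/agent-not-ready=true:NoSchedule"


def build_taint_command(node_name: str, taint: str = CILIUM_TAINT) -> list[str]:
    """Return the kubectl argv that sets `taint` on one node via the Kubernetes API."""
    return ["kubectl", "taint", "nodes", node_name, taint]


def apply_taint(node_name: str, dry_run: bool = True) -> list[str]:
    """Return the command; run it as well when dry_run is False."""
    cmd = build_taint_command(node_name)
    if not dry_run:
        # Raises subprocess.CalledProcessError if kubectl exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling `apply_taint("<node-name>", dry_run=False)` for each node in the untainted pool would set the taint through the Kubernetes API. Nodes added to the pool later would not get the taint automatically with this approach.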

In any case, I think this is something you should be aware of, and the documentation may need to be updated accordingly.

Cilium Version

❯ cilium version
cilium-cli: 0.11.1 compiled with go1.18.1 on darwin/arm64
cilium image (default): v1.11.3
cilium image (stable): v1.11.4
cilium image (running): v1.11.3

Kernel Version

5.4.0-1077-azure

Kubernetes Version

❯ k version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:41:58Z", GoVersion:"go1.18.1", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.6", GitCommit:"07959215dd83b4ae6317b33c824f845abd578642", GitTreeState:"clean", BuildDate:"2022-03-30T18:28:25Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

No response

Relevant log output

`Errors:           cilium             cilium-sxrrr    controller mark-k8s-node-as-available is failing since 6s (204x): admission webhook "aks-node-validating-webhook.azmk8s.io" denied the request: (UID: 7ffad4f3-9897-4d9a-a87d-d5af33c87e81) Taint delete request "node.cilium.io/agent-not-ready=true:NoSchedule" refused. User is attempting to delete a taint configured on aks node pool "d16asv5".`
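To check which nodes are affected, one option is to filter the node list for the taint named in the error above. The sketch below assumes its input is the parsed JSON from `kubectl get nodes -o json`. The function name is hypothetical, and only the standard Node fields `metadata.name` and `spec.taints` are read.

```python
from __future__ import annotations

# Key of the taint named in the webhook error above.
TAINT_KEY = "node.cilium.io/agent-not-ready"


def nodes_with_taint(node_list: dict, key: str = TAINT_KEY) -> list[str]:
    """Return the names of nodes whose spec.taints has an entry with the given key."""
    tainted = []
    for node in node_list.get("items", []):
        # spec.taints is absent on nodes that carry no taints at all.
        taints = node.get("spec", {}).get("taints") or []
        if any(t.get("key") == key for t in taints):
            tainted.append(node["metadata"]["name"])
    return tainted
```

Given the result of `json.load(sys.stdin)` with the kubectl output piped in, it returns the names of the nodes that still carry the taint.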

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • area/azure: Impacts Azure based IPAM.
  • integration/cloud: Related to integration with cloud environments such as AKS, EKS, GKE, etc.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
