Status: Closed
Labels: Feedback (General feedback), question, resolution/answer-provided (Provided answer to issue, question or feedback.)
Description
Hello 😄
What happened: Creating system node pools with node taints has not worked since 2021-09-28. The operation fails with the following error:
(SystemPoolHasRestrictedTaint) Operation failed due to insufficient nodes for system pod scheduling. Placing custom taints on system pool is not supported(except 'CriticalAddonsOnly' taint or taint effect is 'PreferNoSchedule'). Please refer to https://aka.ms/aks/system-taints for detail
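The restriction stated in this error message can be written as a small check. This is a minimal Python sketch that mirrors only the rule as worded in the error, not AKS's actual validation code. It assumes taints use the key=value:Effect form passed to --node-taints, and it treats any taint whose key is CriticalAddonsOnly as allowed regardless of effect (an assumption). The names parse_taint and allowed_on_system_pool are hypothetical.

```python
from typing import NamedTuple


class Taint(NamedTuple):
    key: str
    value: str
    effect: str


VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def parse_taint(spec: str) -> Taint:
    """Parse a taint written as key=value:Effect (the form passed to --node-taints)."""
    key_value, sep, effect = spec.rpartition(":")
    if not sep or effect not in VALID_EFFECTS:
        raise ValueError(f"invalid or missing taint effect in {spec!r}")
    key, _, value = key_value.partition("=")
    if not key:
        raise ValueError(f"missing taint key in {spec!r}")
    return Taint(key, value, effect)


def allowed_on_system_pool(spec: str) -> bool:
    """Rule as worded in the SystemPoolHasRestrictedTaint message: only the
    CriticalAddonsOnly taint, or taints whose effect is PreferNoSchedule."""
    taint = parse_taint(spec)
    return taint.key == "CriticalAddonsOnly" or taint.effect == "PreferNoSchedule"


if __name__ == "__main__":
    for spec in ("foo=bar:NoSchedule",
                 "node.cilium.io/agent-not-ready=true:NoSchedule",
                 "CriticalAddonsOnly=true:NoSchedule",
                 "foo=bar:PreferNoSchedule"):
        print(spec, allowed_on_system_pool(spec))
```

Under this reading, both foo=bar:NoSchedule from the reproduction and the node.cilium.io/agent-not-ready=true:NoSchedule taint recommended for Cilium are rejected. CriticalAddonsOnly=true:NoSchedule and any PreferNoSchedule taint pass.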
What you expected to happen: System nodepool is created with the node taint applied.
How to reproduce it (as minimally and precisely as possible): Using the CLI:
az group create --name foo --location westeurope
az aks create --resource-group foo --name bar
az aks nodepool add --name nodepool2 --resource-group foo --cluster-name bar \
--mode system --node-taints foo=bar:NoSchedule
Anything else we need to know?:
We recommend that users apply the taint node.cilium.io/agent-not-ready=true:NoSchedule to node pools when using Cilium (cilium/cilium) as the CNI plugin. This prevents application pods from being managed by the default AKS CNI plugin.
- Rationale: when users run Cilium, there is a race condition in which application pods may end up managed either by Cilium or by the default CNI plugin. The node taint keeps application pods from running until Cilium is deployed and removes the taint. A more detailed explanation can be found in "Preventing unmanaged Cilium endpoints on newly-created nodes" (cilium/cilium#16602).
- Caveat: node taints cannot be applied to the initial node pool of an AKS cluster at creation time (see "CLI request: set --node-taints for primary node pool via CLI" #1402).
- Solution: at the time of writing, the Cilium documentation recommends that users create a cluster, add a secondary system node pool with the taint applied, and then delete the initial node pool.
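The Rationale bullet depends on standard Kubernetes taint and toleration matching. Below is a simplified Python model for illustration only. It covers the scheduling decision (NoSchedule and NoExecute block a pod unless tolerated, while PreferNoSchedule is only a soft preference). It ignores NoExecute eviction and tolerationSeconds. It also assumes the Cilium agent pod tolerates every taint (empty key, operator Exists). The function and constant names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key is only valid with operator Exists; matches any key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches any effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """True if the toleration matches the taint: effect, then key, then value."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.value == taint.value


def schedulable(node_taints: Iterable[Taint], tolerations: Iterable[Toleration]) -> bool:
    """A pod may be placed on the node only if every hard taint is tolerated."""
    tolerations = list(tolerations)
    for taint in node_taints:
        if taint.effect == "PreferNoSchedule":
            continue  # soft preference: never blocks scheduling on its own
        if not any(tolerates(tol, taint) for tol in tolerations):
            return False
    return True


AGENT_NOT_READY = Taint("node.cilium.io/agent-not-ready", "true", "NoSchedule")
APP_POD: List[Toleration] = []              # ordinary application pod: no tolerations
AGENT_POD = [Toleration(operator="Exists")]  # assumption: agent tolerates every taint

if __name__ == "__main__":
    print(schedulable([AGENT_NOT_READY], APP_POD))    # app pod while the taint is present
    print(schedulable([AGENT_NOT_READY], AGENT_POD))  # Cilium agent on the tainted node
    print(schedulable([], APP_POD))                   # app pod after Cilium removes the taint
```

The model shows why the taint must be a hard NoSchedule taint. A PreferNoSchedule taint (which the error message allows on system pools) would not stop application pods from landing on the node before Cilium is ready.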
We test this behaviour in our AKS CI via an automated GitHub Actions workflow that uses the Azure CLI (see source command lines here).
- This was working until 2021-09-28 around 20:00 CEST: https://github.com/cilium/cilium/actions/runs/1283753297
- This stopped working with the error mentioned above soon after (here around 21:45 CEST): https://github.com/cilium/cilium/actions/runs/1284063464
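For illustration, the three-step workaround from the Solution bullet can be scripted around the Azure CLI, as a CI job might do it. This is a minimal Python sketch, not the actual Cilium workflow (whose command lines are referenced above). It assumes the resource group already exists, that the initial pool has the az default name nodepool1, and that the new pool is named nodepool2 as in the reproduction. The helpers workaround_commands and run are hypothetical. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List


def workaround_commands(resource_group: str, cluster: str, taint: str,
                        initial_pool: str = "nodepool1",
                        new_pool: str = "nodepool2") -> List[List[str]]:
    """Build the az invocations for: create cluster, add tainted system pool, delete initial pool."""
    scope = ["--resource-group", resource_group, "--cluster-name", cluster]
    return [
        # 1. Create the cluster; its initial pool cannot be tainted at creation time.
        ["az", "aks", "create", "--resource-group", resource_group, "--name", cluster],
        # 2. Add a second system pool carrying the taint.
        ["az", "aks", "nodepool", "add", "--name", new_pool, *scope,
         "--mode", "system", "--node-taints", taint],
        # 3. Delete the initial (untainted) pool.
        ["az", "aks", "nodepool", "delete", "--name", initial_pool, *scope],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False, stopping at the first failure."""
    for argv in commands:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    run(workaround_commands("foo", "bar", "node.cilium.io/agent-not-ready=true:NoSchedule"))
```

According to this report, the second command has been rejected with SystemPoolHasRestrictedTaint since 2021-09-28. With check=True, a non-dry run would raise subprocess.CalledProcessError at that step instead of going on to delete the initial pool.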
Thanks!