helm chart nodeinit.restartPods doesn't work on azure aks-ubuntu-1804 images #12850

@UnwashedMeme

Bug report

General Information

The cilium-node-init restartPods functionality doesn't work on AKS with Ubuntu 18.04 images. Connectivity-check pods created before reconfigureKubelet completes never become ready (the checks never pass). The cilium-node-init logs show that the restart isn't happening.

I believe the issue is that none of the branches in this if statement match on the Azure images:

  • The branch I think should match, if grep -q 'docker' /etc/crictl.yaml; then, never does: /etc/crictl.yaml doesn't exist on these images, so grep exits with an error.
  • I think we could improve it by also checking whether the file exists: if [ ! -f /etc/crictl.yaml ] || grep -q 'docker' /etc/crictl.yaml; then. This works in my testing; I can see the "Restarting kubenet managed pods" message in the cilium-node-init logs.
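The difference between the two conditions can be sketched in isolation. This is a minimal illustration, not the actual cilium-node-init script: the function names original_check and proposed_check are hypothetical, and the config path is passed as an argument so the behaviour can be tried without touching /etc.

```shell
#!/bin/sh
# Hypothetical stand-in for the existing condition.
# grep exits non-zero when the file is missing, so the branch is skipped.
# stderr is silenced here only to keep the demo quiet.
original_check() {
    grep -q 'docker' "${1:-/etc/crictl.yaml}" 2>/dev/null
}

# Hypothetical stand-in for the proposed condition.
# A missing config file is treated the same as a docker runtime.
proposed_check() {
    crictl_config="${1:-/etc/crictl.yaml}"
    [ ! -f "$crictl_config" ] || grep -q 'docker' "$crictl_config"
}

# Try both against a path that does not exist (the AKS case described above).
missing="/nonexistent/crictl.yaml"
if original_check "$missing"; then echo "original: match"; else echo "original: no match"; fi
if proposed_check "$missing"; then echo "proposed: match"; else echo "proposed: no match"; fi
```

Note that the proposed condition also matches on any node where the file is absent for other reasons, so it assumes that "no /etc/crictl.yaml" always implies a docker runtime.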

How to reproduce the issue

  • Cilium version: 1.8.2
  1. Run aks install (filling in bash vars with your own)
  2. Helm install cilium
  3. Immediately apply the connectivity-check.yaml
    • note this might need to be reapplied a couple times to get the cilium network policy
    • creating these pods before the cilium-node-init finishes is important so they get created under a kubenet policy before the reconfigure changes kubenet->cni and restarts the kubelet. There are other ways this bug shows up, but this is the clearest way to demonstrate it.
aksargs=(
   --subscription "$SUB"
   --resource-group "$RG"
   --name "$NAME"
   --kubernetes-version 1.17.7
   --vm-set-type "VirtualMachineScaleSets"
   # Causes it to not create a public IP for the api-server
   --enable-private-cluster
   # don't use Azure CNI; but we will overwrite this later w/ cilium
   --network-plugin kubenet
   --load-balancer-sku "standard"
   --vnet-subnet-id "$SUBNET_ID"
   # Not really used; but needs to be defined
   --docker-bridge-address="172.17.8.1/23"
   # Internal IPs of kubernetes services
   --service-cidr "172.17.0.0/21"
   --dns-service-ip="172.17.0.10"  # They ask for it to be `.10`... sure
   --pod-cidr "172.17.32.0/19"
   --service-principal "$APPID"
   --client-secret "$APPPWD"
   #https://docs.microsoft.com/en-us/azure/aks/cluster-configuration#generation-2-virtual-machines-preview
   # importantly this triggers ubuntu 18.04 images
   --aks-custom-headers "usegen2vm=true"
)
ciliumhelmargs=(
   --version 1.8.2
   --namespace cilium
   --set config.ipam=kubernetes

   # Rewrite kubelet config file to enable CNI w/ the node-init DaemonSet.
   --set global.nodeinit.enabled=true
   --set nodeinit.reconfigureKubelet=true
   --set nodeinit.removeCbrBridge=true
   # Any pods that are already running won't get the above changes; have nodeinit restart them.
   # This doesn't actually work right now.
   --set nodeinit.restartPods=true

   # Use Cilium native routing
   --set global.tunnel=disabled
   --set global.endpointRoutes.enabled=true
   --set global.nativeRoutingCIDR=172.17.32.0/19
)
az aks create "${aksargs[@]}"
az aks get-credentials --resource-group "$RG" --name "$NAME"
kubectl create ns cilium
helm install cilium cilium/cilium "${ciliumhelmargs[@]}"
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.8.2/examples/kubernetes/connectivity-check/connectivity-check.yaml
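
To confirm whether the restart branch ran, one option is to search the node-init logs for the message quoted above. This is a rough sketch: the helper name restart_was_logged is hypothetical, and the DaemonSet name cilium-node-init and the cilium namespace are assumptions based on this report, so adjust them to your install.

```shell
# Hypothetical helper: reads log text on stdin and succeeds if the restart
# message quoted in this report appears in it.
restart_was_logged() {
    grep -q 'Restarting kubenet managed pods'
}

# Usage against a live cluster (assumed DaemonSet name and namespace).
# kubectl picks a single pod when given a DaemonSet, so this checks one node only:
#   kubectl -n cilium logs daemonset/cilium-node-init | restart_was_logged \
#       && echo "restart happened" || echo "restart did NOT happen"
```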
