CI: Cilium E2E Upgrade - timeout reached waiting for deployment cilium-test-1/test-conn-disrupt-client to become ready ("cilium_snat_v4_alloc_retries: file exists") #37325

@ysksuzuki

CI failure

https://github.com/cilium/cilium/actions/runs/13024628663/job/36335383239

+ docker run --network host -v /home/runner/.kube/config:/root/.kube/config -v /home/runner/work/cilium/cilium:/root/app -v /home/runner/.aws:/root/.aws -v /home/runner/.azure:/root/.azure -v /home/runner/.config/gcloud:/root/.config/gcloud quay.io/cilium/cilium-cli-ci:620378f50045e0982d003a7c082948e5cb39f5e4 cilium connectivity test --include-conn-disrupt-test --include-conn-disrupt-test-ns-traffic --test no-interrupted-connections --conn-disrupt-test-restarts-path ./cilium-conn-disrupt-restarts --include-unsafe-tests --collect-sysdump-on-failure --flush-ct --sysdump-hubble-flows-count=1000000 --sysdump-hubble-flows-timeout=5m --sysdump-output-filename 'cilium-sysdump-23-<ts>' --junit-file 'cilium-junits/Setup & Test (23).xml' --junit-property 'github_job_step=Run tests upgrade 2 (23)' --secondary-network-iface=eth1
⌛ [kind-kind] Waiting for deployment cilium-test-1/client to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/client2 to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/echo-same-node to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/client3 to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/test-conn-disrupt-server to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/test-conn-disrupt-server-ns-traffic to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/echo-external-node to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/test-conn-disrupt-client to become ready...
timeout reached waiting for deployment cilium-test-1/test-conn-disrupt-client to become ready (last error: only 4 of 5 replicas are available)

cilium-sysdumps.zip

The agent log shows the error below at 07:02:53, about two seconds before the pod's Ready condition flips to False (07:02:55).

logs-cilium-5zvwr-cilium-agent-20250129-071018.log

2025-01-29T07:02:53.507583226Z time="2025-01-29T07:02:53.505271039Z" level=error msg="endpoint regeneration failed" ciliumEndpointName=cilium-test-1/test-conn-disrupt-client-74c554bc56-6dqbr containerID=8cf12ac3fd containerInterface=eth0 datapathPolicyRevision=0 desiredPolicyRevision=5 endpointID=517 error="loading eBPF collection into the kernel: map cilium_snat_v4_alloc_retries: pin map to /sys/fs/bpf/tc/globals/cilium_snat_v4_alloc_retries: file exists" identity=1541 ipv4=10.244.0.44 ipv6="fd00:10:244::af9a" k8sPodName=cilium-test-1/test-conn-disrupt-client-74c554bc56-6dqbr subsys=endpoint
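When triaging a sysdump, it can help to pull the structured fields out of agent log lines like the one above rather than reading them by eye. The following is a minimal sketch, assuming the log is in the logfmt-style `key=value` form shown here; `parse_logfmt` is a hypothetical helper, not part of Cilium or cilium-cli, and the sample line is an abbreviated copy of the log entry above.

```python
import shlex


def parse_logfmt(line: str) -> dict[str, str]:
    """Split a logfmt-style line into key/value pairs.

    Tokens without '=' (such as the leading container-runtime timestamp)
    are skipped. Quoted values are unquoted by shlex.
    """
    fields: dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


# Abbreviated copy of the agent log line from this issue.
sample = (
    '2025-01-29T07:02:53.507583226Z time="2025-01-29T07:02:53.505271039Z" '
    'level=error msg="endpoint regeneration failed" endpointID=517 '
    'error="loading eBPF collection into the kernel: '
    'map cilium_snat_v4_alloc_retries: pin map to '
    '/sys/fs/bpf/tc/globals/cilium_snat_v4_alloc_retries: file exists" '
    'k8sPodName=cilium-test-1/test-conn-disrupt-client-74c554bc56-6dqbr '
    'subsys=endpoint'
)

fields = parse_logfmt(sample)
print(fields["k8sPodName"], fields["endpointID"])
print(fields["error"])
```

Filtering on `level == "error"` and `subsys == "endpoint"` across the whole log file would be one way to see whether other endpoints on the same node hit the same pin failure.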

k8s-pods-20250129-071018.yaml

  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2025-01-29T06:58:32Z"
      status: "True"
      type: PodReadyToStartContainers
    - lastProbeTime: null
      lastTransitionTime: "2025-01-29T06:58:25Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2025-01-29T07:02:55Z"
      message: 'containers with unready status: [test-conn-disrupt-client]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2025-01-29T07:02:55Z"
      message: 'containers with unready status: [test-conn-disrupt-client]'
      reason: ContainersNotReady
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2025-01-29T06:58:24Z"
      status: "True"
      type: PodScheduled
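To check the ordering of the two events, the agent log timestamp can be compared with the `lastTransitionTime` of the Ready condition above. This is only a sketch: `parse_rfc3339` is a hypothetical helper that assumes UTC timestamps ending in `Z` and truncates the nanosecond fraction to microseconds, since `datetime` does not store finer resolution.

```python
from datetime import datetime, timezone


def parse_rfc3339(ts: str) -> datetime:
    """Parse a UTC RFC 3339 timestamp, truncating sub-microsecond digits."""
    body = ts.rstrip("Z")
    if "." in body:
        whole, frac = body.split(".")
        # Keep exactly six fractional digits so older Python versions accept it.
        body = f"{whole}.{frac[:6].ljust(6, '0')}"
    return datetime.fromisoformat(body).replace(tzinfo=timezone.utc)


# Timestamps copied from the agent log and the pod status in this issue.
regen_failed = parse_rfc3339("2025-01-29T07:02:53.505271039Z")
ready_false = parse_rfc3339("2025-01-29T07:02:55Z")

delta = ready_false - regen_failed
print(f"Ready flipped to False {delta.total_seconds():.3f}s after the error")
```

Note that `lastTransitionTime` has one-second resolution, so the computed gap is approximate.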

k8s-pods-20250129-071018.txt

NAMESPACE            NAME                                                              READY   STATUS    RESTARTS        AGE     IP             NODE                 NOMINATED NODE   READINESS GATES
cilium-test-1        client-645b68dcf7-gmvf8                                           1/1     Running   0               12m     10.244.2.189   kind-worker          <none>           <none>
cilium-test-1        client2-66475877c6-94bp4                                          1/1     Running   0               12m     10.244.2.82    kind-worker          <none>           <none>
cilium-test-1        client3-795488bf5-gx4js                                           1/1     Running   0               12m     10.244.1.146   kind-worker2         <none>           <none>
cilium-test-1        echo-external-node-5c6cd6d5b9-ck7hh                               1/1     Running   0               12m     172.18.0.4     kind-worker3         <none>           <none>
cilium-test-1        echo-other-node-7f546db4f4-k97l2                                  2/2     Running   0               12m     10.244.1.202   kind-worker2         <none>           <none>
cilium-test-1        echo-same-node-6c545975c6-zxfl8                                   2/2     Running   0               12m     10.244.2.57    kind-worker          <none>           <none>
cilium-test-1        host-netns-m2jdt                                                  1/1     Running   0               12m     172.18.0.3     kind-worker          <none>           <none>
cilium-test-1        host-netns-non-cilium-sf6pw                                       1/1     Running   0               12m     172.18.0.4     kind-worker3         <none>           <none>
cilium-test-1        host-netns-psnmg                                                  1/1     Running   0               12m     172.18.0.5     kind-worker2         <none>           <none>
cilium-test-1        host-netns-stf5f                                                  1/1     Running   0               12m     172.18.0.2     kind-control-plane   <none>           <none>
cilium-test-1        test-conn-disrupt-client-74c554bc56-2mtw2                         1/1     Running   0               12m     10.244.2.152   kind-worker          <none>           <none>
cilium-test-1        test-conn-disrupt-client-74c554bc56-6dqbr                         0/1     Running   2 (2m25s ago)   12m     10.244.0.44    kind-control-plane   <none>           <none>
cilium-test-1        test-conn-disrupt-client-74c554bc56-9dz9r                         1/1     Running   0               12m     10.244.1.100   kind-worker2         <none>           <none>
cilium-test-1        test-conn-disrupt-client-74c554bc56-r6khk                         1/1     Running   0               12m     10.244.2.248   kind-worker          <none>           <none>
cilium-test-1        test-conn-disrupt-client-74c554bc56-s9tqx                         1/1     Running   0               12m     10.244.1.42    kind-worker2         <none>           <none>
cilium-test-1        test-conn-disrupt-client-backend-node-ipv4-internalip-f849wql7r   1/1     Running   0               12m     172.18.0.4     kind-worker3         <none>           <none>
cilium-test-1        test-conn-disrupt-client-backend-node-ipv6-internalip-7f5dgccn7   1/1     Running   0               12m     172.18.0.4     kind-worker3         <none>           <none>
cilium-test-1        test-conn-disrupt-client-non-backend-node-ipv4-internalip-msgmf   1/1     Running   0               12m     172.18.0.4     kind-worker3         <none>           <none>
cilium-test-1        test-conn-disrupt-client-non-backend-node-ipv6-internalip-5cpdx   1/1     Running   0               12m     172.18.0.4     kind-worker3         <none>           <none>
cilium-test-1        test-conn-disrupt-server-744dbd5ccc-2p695                         1/1     Running   0               12m     10.244.2.212   kind-worker          <none>           <none>
cilium-test-1        test-conn-disrupt-server-744dbd5ccc-dl5gq                         1/1     Running   0               12m     10.244.1.174   kind-worker2         <none>           <none>
cilium-test-1        test-conn-disrupt-server-744dbd5ccc-p7nhv                         1/1     Running   0               12m     10.244.0.137   kind-control-plane   <none>           <none>
cilium-test-1        test-conn-disrupt-server-ns-traffic-6b974bbb89-ll8ll              1/1     Running   0               12m     10.244.2.238   kind-worker          <none>           <none>
kube-system          cilium-2hgmd                                                      1/1     Running   0               10m     172.18.0.3     kind-worker          <none>           <none>
kube-system          cilium-4wfjp                                                      1/1     Running   0               10m     172.18.0.5     kind-worker2         <none>           <none>
kube-system          cilium-5zvwr                                                      1/1     Running   0               9m20s   172.18.0.2     kind-control-plane   <none>           <none>
kube-system          cilium-envoy-9klph                                                1/1     Running   0               10m     172.18.0.3     kind-worker          <none>           <none>
kube-system          cilium-envoy-bq9f5                                                1/1     Running   0               9m19s   172.18.0.2     kind-control-plane   <none>           <none>
kube-system          cilium-envoy-gbx5s                                                1/1     Running   0               10m     172.18.0.5     kind-worker2         <none>           <none>
kube-system          cilium-operator-7b6ddb7db6-s5nnq                                  1/1     Running   0               10m     172.18.0.5     kind-worker2         <none>           <none>
kube-system          coredns-668d6bf9bc-8jxp8                                          1/1     Running   0               14m     10.244.1.209   kind-worker2         <none>           <none>
kube-system          coredns-668d6bf9bc-gz8wt                                          1/1     Running   0               14m     10.244.1.185   kind-worker2         <none>           <none>
kube-system          etcd-kind-control-plane                                           1/1     Running   0               15m     172.18.0.2     kind-control-plane   <none>           <none>
kube-system          kube-apiserver-kind-control-plane                                 1/1     Running   0               14m     172.18.0.2     kind-control-plane   <none>           <none>
kube-system          kube-controller-manager-kind-control-plane                        1/1     Running   0               14m     172.18.0.2     kind-control-plane   <none>           <none>
kube-system          kube-scheduler-kind-control-plane                                 1/1     Running   0               14m     172.18.0.2     kind-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-59f7658548-2n5hl                           1/1     Running   0               14m     10.244.1.129   kind-worker2         <none>           <none>
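In a listing this long, the single unready pod is easy to miss. A rough sketch of a filter over the text output follows; `not_ready_pods` is a hypothetical helper, and it assumes the `-o wide` column layout shown above, where NODE is the third value from the right. Indexing from the right matters because the RESTARTS column can contain spaces, as in `2 (2m25s ago)`.

```python
def not_ready_pods(table: str) -> list[tuple[str, str, str, str]]:
    """Return (namespace, name, ready, node) for pods whose READY is x/y with x < y."""
    rows = []
    for line in table.strip().splitlines()[1:]:  # skip the header row
        cols = line.split()
        namespace, name, ready = cols[0], cols[1], cols[2]
        have, want = ready.split("/")
        if int(have) < int(want):
            # NODE, NOMINATED NODE, READINESS GATES are the last three values.
            rows.append((namespace, name, ready, cols[-3]))
    return rows


# Three rows copied from the listing in this issue.
sample = """\
NAMESPACE       NAME                                        READY   STATUS    RESTARTS        AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
cilium-test-1   test-conn-disrupt-client-74c554bc56-2mtw2   1/1     Running   0               12m   10.244.2.152   kind-worker          <none>           <none>
cilium-test-1   test-conn-disrupt-client-74c554bc56-6dqbr   0/1     Running   2 (2m25s ago)   12m   10.244.0.44    kind-control-plane   <none>           <none>
cilium-test-1   test-conn-disrupt-client-74c554bc56-9dz9r   1/1     Running   0               12m   10.244.1.100   kind-worker2         <none>           <none>
"""

unready = not_ready_pods(sample)
for namespace, name, ready, node in unready:
    print(f"{namespace}/{name} is {ready} on {node}")
```

The reported node can then be matched against the `cilium-*` agent pod on the same node (here `cilium-5zvwr` on `kind-control-plane`) to pick the right agent log.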

Labels

area/CI (Continuous Integration testing issue or flake), ci/flake (This is a known failure that occurs in the tree. Please investigate me!)
