
CI: Multicluster / Cluster mesh: pods "cilium-xxxxx" not found #23371


Description

@giorio94

CI failure observed in the Multicluster / Cluster mesh workflow:

Run cilium connectivity test --flow-validation=disabled --hubble=false --collect-sysdump-on-failure \
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Creating namespace cilium-test for connectivity check...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-2] Creating namespace cilium-test for connectivity check...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Deploying echo-same-node service...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Deploying echo-other-node service...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Deploying DNS test server configmap...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-2] Deploying DNS test server configmap...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Deploying same-node deployment...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Deploying client deployment...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Deploying client2 deployment...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-2] Deploying echo-other-node service...
✨ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-2] Deploying other-node deployment...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for deployments [client client2 echo-same-node] to become ready...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-2] Waiting for deployments [echo-other-node] to become ready...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for CiliumEndpoint for pod cilium-test/client-755fb678bd-6c9wh to appear...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for CiliumEndpoint for pod cilium-test/client2-5b97d7bc66-25dd2 to appear...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for pod cilium-test/client2-5b97d7bc66-25dd2 to reach DNS server on cilium-test/echo-same-node-64774c64d5-jljqd pod...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for pod cilium-test/client-755fb678bd-6c9wh to reach DNS server on cilium-test/echo-same-node-64774c64d5-jljqd pod...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for pod cilium-test/client-755fb678bd-6c9wh to reach DNS server on cilium-test/echo-other-node-67b74b6685-gstkn pod...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for pod cilium-test/client2-5b97d7bc66-25dd2 to reach DNS server on cilium-test/echo-other-node-67b74b6685-gstkn pod...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for pod cilium-test/client2-5b97d7bc66-25dd2 to reach default/kubernetes service...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for pod cilium-test/client-755fb678bd-6c9wh to reach default/kubernetes service...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-64774c64d5-jljqd to appear...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-2] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-67b74b6685-gstkn to appear...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1] Waiting for Service cilium-test/echo-same-node to become ready...
ℹ️  Skipping IPCache check
ℹ️  Cilium version: 1.13.90
🏃 Running tests...
[=] Test [no-policies]
..........
[=] Test [allow-all-except-world]
..............
[=] Test [client-ingress]
..
[=] Test [all-ingress-deny]
....
[=] Test [all-egress-deny]
................
[=] Test [all-entities-deny]
....
[=] Test [cluster-entity]
..
[=] Test [cluster-entity-multi-cluster]
..
[=] Test [host-entity]
....
[=] Test [echo-ingress]
....
[=] Test [client-ingress-icmp]
..
[=] Test [client-egress]
🔍 Collecting sysdump with cilium-cli version: v0.12.12, args: [connectivity test --flow-validation=disabled --hubble=false --collect-sysdump-on-failure --context gke_***_us-west2-a_cilium-cilium-4012267258-mesh-1 --multi-cluster gke_***_us-west2-a_cilium-cilium-4012267258-mesh-2 --collect-sysdump-on-failure --test !/pod-to-.*-nodeport --test !no-policies/pod-to-service --test !/pod-to-world --test !/pod-to-cidr]
🔮 Detected Cilium installation in namespace "kube-system"
Detected Cilium operator in namespace "kube-system"
🔍 Collecting Kubernetes nodes
🔍 Collect Kubernetes nodes
🔍 Collecting Kubernetes events
🔍 Collect Kubernetes version
🔍 Collecting Kubernetes pods
🔍 Collecting Kubernetes namespaces
🔍 Collecting Kubernetes services
🔍 Collecting Kubernetes pods summary
🔍 Collecting Kubernetes endpoints
🔍 Collecting Kubernetes network policies
🔍 Collecting Cilium network policies
🔍 Collecting Kubernetes leases
🔍 Collecting Cilium egress NAT policies
🔍 Collecting Cilium cluster-wide network policies
🔍 Collecting Cilium local redirect policies
🔍 Collecting Cilium Egress Gateway policies
🔍 Collecting Cilium endpoints
🔍 Collecting Cilium identities
🔍 Collecting Cilium nodes
🔍 Collecting Ingresses
🔍 Collecting CiliumClusterwideEnvoyConfigs
🔍 Collecting CiliumEnvoyConfigs
🔍 Collecting Cilium etcd secret
🔍 Collecting the Cilium configuration
🔍 Collecting the Cilium daemonset(s)
🔍 Collecting the Hubble daemonset
🔍 Collecting the Hubble Relay configuration
🔍 Collecting the Hubble Relay deployment
🔍 Collecting the Hubble UI deployment
🔍 Collecting the 'clustermesh-apiserver' deployment
🔍 Collecting the Cilium operator deployment
🔍 Collecting the CNI configuration files from Cilium pods
🔍 Collecting the CNI configmap
🔍 Collecting gops stats from Cilium pods
🔍 Collecting gops stats from Hubble pods
🔍 Collecting gops stats from Hubble Relay pods
🔍 Collecting bugtool output from Cilium pods
🔍 Collecting logs from Cilium pods
🔍 Collecting logs from Cilium operator pods
🔍 Collecting logs from 'clustermesh-apiserver' pods
🔍 Collecting logs from Hubble pods
🔍 Collecting logs from Hubble Relay pods
🔍 Collecting logs from Hubble UI pods
⚠️ Deployment "hubble-ui" not found in namespace "kube-system" - this is expected if Hubble UI is not enabled
🔍 Collecting bugtool output from Tetragon pods
🔍 Collecting platform-specific data
🔍 Collecting kvstore data
🔍 Collecting Hubble flows from Cilium pods
level=info msg="Waited for 1.109862824s due to client-side throttling, not priority and fairness, request: GET:https://34.94.45.250/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dcilium" subsys=klog
⚠️ The following tasks failed, the sysdump may be incomplete:
⚠️ [12] Collecting Cilium egress NAT policies: failed to collect Cilium egress NAT policies: the server could not find the requested resource (get ciliumegressnatpolicies.cilium.io)
⚠️ [13] Collecting Cilium Egress Gateway policies: failed to collect Cilium Egress Gateway policies: the server could not find the requested resource (get ciliumegressgatewaypolicies.cilium.io)
⚠️ [14] Collecting Cilium local redirect policies: failed to collect Cilium local redirect policies: the server could not find the requested resource (get ciliumlocalredirectpolicies.cilium.io)
⚠️ [19] Collecting CiliumClusterwideEnvoyConfigs: failed to collect CiliumClusterwideEnvoyConfigs: the server could not find the requested resource (get ciliumclusterwideenvoyconfigs.cilium.io)
⚠️ [20] Collecting CiliumEnvoyConfigs: failed to collect CiliumEnvoyConfigs: the server could not find the requested resource (get ciliumenvoyconfigs.cilium.io)
⚠️ Please note that depending on your Cilium version and installation options, this may be expected
🗳 Compiling sysdump
✅ The sysdump has been saved to /home/runner/work/cilium/cilium/cilium-sysdump-20230126-044242.zip

  ℹ️  Cilium agent kube-system/cilium-hn7z4 logs since 2023-01-26 04:42:40.210655819 +0000 UTC m=+300.285404655:

  ℹ️  Cilium agent kube-system/cilium-b48n6 logs since 2023-01-26 04:42:40.210655819 +0000 UTC m=+300.285404655:
2023-01-26T04:42:40.541014224Z level=info msg="Delete endpoint request" containerID=718baccde8 endpointID=384 k8sNamespace=l7-default-backend-6dc845c45d-59887 k8sPodName=l7-default-backend-6dc845c45d-59887 subsys=daemon
2023-01-26T04:42:40.542033983Z level=info msg="Releasing key" key="[k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system k8s:io.cilium.k8s.policy.cluster=cilium-cilium-4012267258-mesh-2 k8s:io.cilium.k8s.policy.serviceaccount=default k8s:io.kubernetes.pod.namespace=kube-system k8s:k8s-app=glbc k8s:name=glbc]" subsys=allocator
2023-01-26T04:42:40.546837369Z level=info msg="Removed endpoint" containerID=718baccde8 datapathPolicyRevision=21 desiredPolicyRevision=1 endpointID=384 identity=162011 ipv4=10.48.1.138 ipv6=10.48.1.138 k8sPodName=kube-system/l7-default-backend-6dc845c45d-59887 subsys=endpoint
2023-01-26T04:42:41.121113671Z level=info msg="Shutting down... " subsys=daemon
2023-01-26T04:42:41.121508376Z level=error msg="Interrupt received" subsys=hive
2023-01-26T04:42:41.121530021Z level=info msg="Stopped serving cilium API at unix:///var/run/cilium/cilium.sock" subsys=daemon
2023-01-26T04:42:41.121539397Z level=info msg="Shutting down... " subsys=health-server
2023-01-26T04:42:41.121547808Z level=info msg="Stopped serving cilium health API at unix:///var/run/cilium/health.sock" subsys=health-server
2023-01-26T04:42:41.121557556Z level=error msg="Failed to start the udp DNS proxy on [::]:34551" error="read udp [::]:34551: use of closed network connection" subsys=fqdn/dnsproxy
2023-01-26T04:42:41.121566451Z level=error msg="Failed to start the tcp DNS proxy on [::]:34551" error="accept tcp [::]:34551: use of closed network connection" subsys=fqdn/dnsproxy
2023-01-26T04:42:41.121574174Z level=info msg="Waiting for all endpoints' goroutines to be stopped." subsys=daemon
2023-01-26T04:42:41.121582091Z level=info msg="All endpoints' goroutines stopped." subsys=daemon
2023-01-26T04:42:41.121589788Z level=info msg="Stopping fswatcher" config=tls-server subsys=hubble
2023-01-26T04:42:41.136661153Z level=info msg="Stop hook executed" duration=17.035238ms function="cmd.newDaemonPromise.func2 (daemon_main.go:1700)" subsys=hive
2023-01-26T04:42:41.136797197Z level=info msg="Stop hook executed" duration="4.736µs" function="endpointmanager.newDefaultEndpointManager.func1 (cell.go:186)" subsys=hive
2023-01-26T04:42:41.136811509Z level=info msg="Stop hook executed" duration="65.955µs" function="*manager.manager.Stop" subsys=hive
2023-01-26T04:42:41.136926322Z level=info msg="Stop hook executed" duration="8.825µs" function="*resource.resource[*github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2alpha1.CiliumLoadBalancerIPPool].Stop" subsys=hive
2023-01-26T04:42:41.137238785Z level=info msg="Stop hook executed" duration="170.614µs" function="*resource.resource[*github.com/cilium/cilium/pkg/k8s/slim/k8s/api/core/v1.Namespace].Stop" subsys=hive
2023-01-26T04:42:41.137258962Z level=info msg="Stop hook executed" duration="6.722µs" function="*resource.resource[*github.com/cilium/cilium/pkg/k8s/apis/cilium.io/v2.CiliumNode].Stop" subsys=hive
2023-01-26T04:42:41.148091780Z level=info msg="Stop hook executed" duration="115.291µs" function="*resource.resource[*k8s.io/api/core/v1.Node].Stop" subsys=hive
2023-01-26T04:42:41.148129652Z level=info msg="Stop hook executed" duration="35.379µs" function="*gobgp.diffStore[*github.com/cilium/cilium/pkg/k8s/slim/k8s/api/core/v1.Service].Stop" subsys=hive
2023-01-26T04:42:41.148140042Z level=info msg="Stop hook executed" duration="46.676µs" function="*resource.resource[*github.com/cilium/cilium/pkg/k8s/slim/k8s/api/core/v1.Service].Stop" subsys=hive
2023-01-26T04:42:41.148149354Z level=info msg="Stop hook executed" duration="21.704µs" function="client.(*compositeClientset).onStop" subsys=hive
2023-01-26T04:42:41.148156478Z level=info msg="Stopped gops server" address="127.0.0.1:9890" subsys=gops
2023-01-26T04:42:41.148161890Z level=info msg="Stop hook executed" duration="15.119µs" function="gops.registerGopsHooks.func2 (cell.go:51)" subsys=hive

  ℹ️  Cilium agent kube-system/cilium-nvvnr logs since 2023-01-26 04:42:40.210655819 +0000 UTC m=+300.285404655:

  ℹ️  Cilium agent kube-system/cilium-8fcm6 logs since 2023-01-26 04:42:40.210655819 +0000 UTC m=+300.285404655:

  🟥 Running test client-egress: setting up test: applying network policies: unable to get policy revisions for Cilium pods: unable to upgrade connection: container not found ("cilium-agent")
[=] Test [client-egress-expression]

  ℹ️  Cilium agent kube-system/cilium-hn7z4 logs since 2023-01-26 04:43:01.382409761 +0000 UTC m=+321.457158597:

  🟥 Error reading Cilium logs: error getting cilium-agent logs for kube-system/cilium-b48n6: pods "cilium-b48n6" not found
[=] Test [client-egress-to-echo-service-account]

  ℹ️  Cilium agent kube-system/cilium-nvvnr logs since 2023-01-26 04:43:02.393724952 +0000 UTC m=+322.468473788:

  ℹ️  Cilium agent kube-system/cilium-8fcm6 logs since 2023-01-26 04:43:02.393724952 +0000 UTC m=+322.468473788:

  ℹ️  Cilium agent kube-system/cilium-hn7z4 logs since 2023-01-26 04:43:02.393724952 +0000 UTC m=+322.468473788:

  🟥 Error reading Cilium logs: error getting cilium-agent logs for kube-system/cilium-b48n6: pods "cilium-b48n6" not found

[=] Skipping Test [to-entities-world]

[=] Skipping Test [to-cidr-1111]
[=] Test [echo-ingress-from-other-client-deny]

  ℹ️  Cilium agent kube-system/cilium-8fcm6 logs since 2023-01-26 04:43:03.507513052 +0000 UTC m=+323.582261788:

  ℹ️  Cilium agent kube-system/cilium-hn7z4 logs since 2023-01-26 04:43:03.507513052 +0000 UTC m=+323.582261788:

  🟥 Error reading Cilium logs: error getting cilium-agent logs for kube-system/cilium-b48n6: pods "cilium-b48n6" not found
[=] Test [client-ingress-from-other-client-icmp-deny]

  ℹ️  Cilium agent kube-system/cilium-8fcm6 logs since 2023-01-26 04:43:04.620588846 +0000 UTC m=+324.695337582:

  ℹ️  Cilium agent kube-system/cilium-hn7z4 logs since 2023-01-26 04:43:04.620588846 +0000 UTC m=+324.695337582:

  🟥 Error reading Cilium logs: error getting cilium-agent logs for kube-system/cilium-b48n6: pods "cilium-b48n6" not found
[=] Test [client-egress-to-echo-deny]

  ℹ️  Cilium agent kube-system/cilium-hn7z4 logs since 2023-01-26 04:43:05.679167238 +0000 UTC m=+325.753916074:

  🟥 Error reading Cilium logs: error getting cilium-agent logs for kube-system/cilium-b48n6: pods "cilium-b48n6" not found
[=] Test [client-ingress-to-echo-named-port-deny]

  ℹ️  Cilium agent kube-system/cilium-8fcm6 logs since 2023-01-26 04:43:06.699984115 +0000 UTC m=+326.774732851:

  ℹ️  Cilium agent kube-system/cilium-hn7z4 logs since 2023-01-26 04:43:06.699984115 +0000 UTC m=+326.774732851:

  🟥 Error reading Cilium logs: error getting cilium-agent logs for kube-system/cilium-b48n6: pods "cilium-b48n6" not found
[=] Test [client-egress-to-echo-expression-deny]

  ℹ️  Cilium agent kube-system/cilium-8fcm6 logs since 2023-01-26 04:43:08.163922403 +0000 UTC m=+328.238671239:

  ℹ️  Cilium agent kube-system/cilium-hn7z4 logs since 2023-01-26 04:43:08.163922403 +0000 UTC m=+328.238671239:

  🟥 Error reading Cilium logs: error getting cilium-agent logs for kube-system/cilium-b48n6: pods "cilium-b48n6" not found
[=] Test [client-egress-to-echo-service-account-deny]

  🟥 Error reading Cilium logs: error getting cilium-agent logs for kube-system/cilium-b48n6: pods "cilium-b48n6" not found

[=] Skipping Test [client-egress-to-cidr-deny]

[=] Skipping Test [client-egress-to-cidr-deny-default]
[=] Test [health]

Sysdumps: cilium-sysdumps.zip
Link: https://github.com/cilium/cilium/actions/runs/4012267258/jobs/6890579269

Related: #19120. As in that case, this looks like a preemptible-node issue: the agent pod cilium-b48n6 goes away mid-run (its shutdown is visible in the logs above at 04:42:41), after which the connectivity test can no longer exec into the cilium-agent container or read its logs.
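If this needs confirming on a future occurrence, a minimal sketch for checking the preemption hypothesis (assuming kubectl/gcloud access to the affected GKE clusters; nothing below is taken from the sysdump itself):

```sh
# List Cilium agent pods and the nodes they run on; a pod that vanished
# mid-test (e.g. cilium-b48n6) should no longer appear here.
kubectl -n kube-system get pods -l k8s-app=cilium -o wide

# Look for node removal/preemption events around the failure timestamp.
kubectl get events -A --field-selector involvedObject.kind=Node \
  --sort-by=.lastTimestamp

# On GKE, preemptions are also recorded as compute operations.
gcloud compute operations list \
  --filter="operationType=compute.instances.preempted"
```

If the node that hosted the missing agent shows up as preempted, this matches the failure mode described in #19120.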

Labels

- area/CI: Continuous Integration testing issue or flake
- ci/flake: This is a known failure that occurs in the tree. Please investigate me!
- stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
