CI: ConformanceAKS: Unable to enable Hubble: timeout while waiting for status to become successful #19766

Description

@jibi

Hit consistently on master branch.

https://github.com/cilium/cilium/runs/6375385888?check_suite_focus=true
cilium-sysdump-out.zip.zip

The Enable Relay target is failing with:

✨ Deploying Relay...
⌛ Waiting for Hubble to be installed...

    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         1 errors, 1 warnings
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

Deployment        cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Deployment        hubble-relay       Desired: 1, Unavailable: 1/1
DaemonSet         cilium             Desired: 3, Ready: 3/3, Available: 3/3
Containers:       cilium             Running: 3
                  cilium-operator    Running: 1
                  hubble-relay       Pending: 1
Cluster Pods:     5/6 managed by Cilium
Image versions    hubble-relay       quay.io/cilium/hubble-relay-ci:0bf18746283fb7337a497f7f61a30dabac4190be: 1
                  cilium             quay.io/cilium/cilium-ci:0bf18746283fb7337a497f7f61a30dabac4190be: 3
                  cilium-operator    quay.io/cilium/operator-azure-ci:0bf18746283fb7337a497f7f61a30dabac4190be: 1
Errors:           hubble-relay       hubble-relay                     1 pods of Deployment hubble-relay are not ready
Warnings:         hubble-relay       hubble-relay-8658f5b4f5-c4h2z    pod is pending

Error: Unable to enable Hubble:  timeout while waiting for status to become successful: context deadline exceeded
Error: Process completed with exit code 1.
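For context: the Enable Relay step presumably wraps cilium-cli's hubble enable command (the ✨/⌛ lines above match its output), so the timeout here is cilium-cli giving up while waiting for the hubble-relay Deployment to become ready. A rough local equivalent, assuming a cluster already running the same CI images:

    # Roughly what the failing CI step runs; deploys Relay and waits for it to come up
    cilium hubble enable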

From a quick look at the sysdump, it seems that hubble-relay could not be scheduled on any node:

➜  cilium-sysdump-out cat k8s-pods-20220510-182331.yaml | grep Pending -B 6
      message: '0/3 nodes are available: 1 node(s) had taint {CriticalAddonsOnly:
        true}, that the pod didn''t tolerate, 2 node(s) had taint {node.cilium.io/agent-not-ready:
        true}, that the pod didn''t tolerate.'
      reason: Unschedulable
      status: "False"
      type: PodScheduled
    phase: Pending
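node.cilium.io/agent-not-ready is the taint that is supposed to be removed once the Cilium agent is ready on a node, so it looks like it was never cleared on two of the three nodes, while the third carries what is presumably the AKS system pool's CriticalAddonsOnly taint. A sketch of how one could confirm which taints are still set, assuming kubectl access to the affected cluster:

    # Print each node name followed by the taint keys still present on it
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'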

Labels

area/CI: Continuous Integration testing issue or flake
ci/flake: This is a known failure that occurs in the tree. Please investigate me!
stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
