Enabling ambient for namespace with EKS and Cilium with bpf.masquerade=true causes Readiness/Liveness timeouts #52208

@wpbeckwith

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

We have an EKS 1.29 cluster with the following:

  • Karpenter 0.37.0 for cluster autoscaling
  • Cilium v1.16.0-rc.2 with KubeProxyReplacement = true (previously used 1.15.7 with same error)
  • Istio Ambient mode 1.22.2

When the following deployment is deployed to the cluster and ambient mode is enabled for the namespace, the readiness and liveness probes begin to fail. Disabling ambient mode for the namespace allows them to function again.

---
apiVersion: v1
kind: Namespace
metadata:
  name: example
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
  namespace: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
          name: http
        readinessProbe:
          httpGet:
            path: /
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1        
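
The manifest above does not show how ambient mode was toggled for the namespace. A minimal sketch, assuming the standard istio.io/dataplane-mode label from the ambient guide was used (the exact method is not stated in this report):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example
  labels:
    # Assumption: the standard Istio label that enrolls a namespace in ambient mode.
    istio.io/dataplane-mode: ambient
```

Under that assumption, removing the label corresponds to the "disabling ambient mode" step, after which the probes pass again.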

Before Ambient: [image]

After Ambient: [image]

This looks like it could be related to the issue described in the platform-specific section of the ambient install guide, #49277. However, this is a fresh cluster with no NetworkPolicies or CiliumNetworkPolicies defined.

Cilium was installed via helm with the following values

bpf:
  hostLegacyRouting: false
  masquerade: true

gatewayAPI:
  enabled: true

hubble:
  enabled: true
  relay:
    enabled: true
  ui:
    enabled: true

ingressController:
  enabled: false
  loadBalancerMode: shared

ipam:
  mode: eni
  
ipv4NativeRoutingCIDR: "10.162.16.0/20"

kubeProxyReplacement: true

loadBalancer:
  l7:
    backend: envoy

nodeinit:
  enabled: true

operator:
  replicas: 2
  unmanagedPodWatcher:
    restart: true

routingMode: native

cni:
  exclusive: false

socketLB:
  hostNamespaceOnly: true

tunnel: disabled
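
Since the title points at bpf.masquerade=true, one way to isolate that setting would be to layer an override on top of the values above that changes only that key. This is a sketch of a possible experiment, not something that was run for this report; the override file itself is hypothetical.

```yaml
# Hypothetical override values for an A/B test.
# Only bpf.masquerade differs from the failing configuration above.
bpf:
  masquerade: false   # was true when the probes timed out
```

If the probes recover with this override and fail again when it is reverted, that would support BPF masquerading as the trigger rather than the other settings (kubeProxyReplacement, cni.exclusive, socketLB.hostNamespaceOnly).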

The Istio install followed the ambient guide to install base, cni, and istiod.

Version

istioctl version
client version: 1.22.2
control plane version: 1.22.2
data plane version: 1.22.2 (5 proxies)

k version
Client Version: v1.29.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.6-eks-db838b0

helm version --short
v3.13.3+gc8b9489

Additional Information

istioctl bug-report

Target cluster context: mercury

Running with the following config:

istio-namespace: istio-system
full-secrets: false
timeout (mins): 30
include: { }
exclude: { Namespaces: kube-node-lease,kube-public,kube-system,local-path-storage }
end-time: 2024-07-20 18:41:52.355343 -0500 CDT

Cluster endpoint: https://D4B...2ED1D2CD.gr7.us-west-2.eks.amazonaws.com
CLI version:
version.BuildInfo{Version:"1.22.2", GitRevision:"Homebrew", GolangVersion:"go1.22.4", BuildStatus:"Homebrew", GitTag:"1.22.2"}

The following Istio control plane revisions/versions were found in the cluster:
Revision default:
&version.MeshInfo{
{
Component: "pilot",
Revision: "default",
Info: version.BuildInfo{Version:"1.22.2", GitRevision:"204da5ba47f295a8dc56936333e692f6a8707649", GolangVersion:"", BuildStatus:"Clean", GitTag:"1.22.2"},
},
}

The following proxy revisions/versions were found in the cluster:
Revision default: Versions {1.22.2}

Fetching logs for the following containers:

example/web/web-6bdf57c77c-rj2p9/nginx
istio-gateways/istio-gateway/istio-gateway-5f7f7c65d8-zpnrn/istio-proxy
istio-system/istio-cni-node/istio-cni-node-2zqf6/install-cni
istio-system/istio-cni-node/istio-cni-node-njg65/install-cni
istio-system/istio-cni-node/istio-cni-node-qk422/install-cni
istio-system/istio-cni-node/istio-cni-node-tdpvd/install-cni
istio-system/istiod/istiod-54f4c45c55-qnfsm/discovery
istio-system/ztunnel/ztunnel-9crqs/istio-proxy
istio-system/ztunnel/ztunnel-dsmnk/istio-proxy
istio-system/ztunnel/ztunnel-gb6vb/istio-proxy
istio-system/ztunnel/ztunnel-j78cb/istio-proxy
karpenter/karpenter/karpenter-55f6548666-bj5kg/controller
karpenter/karpenter/karpenter-55f6548666-bp5gc/controller
karpenter/karpenter/karpenter-55f6548666-ddcqm/controller

Fetching Istio control plane information from cluster.

Fetching CNI logs from cluster.

Running Istio analyze on all namespaces and report as below:
Analysis Report:
Info [IST0102] (Namespace cilium-secrets) The namespace is not enabled for Istio injection. Run 'kubectl label namespace cilium-secrets istio-injection=enabled' to enable it, or 'kubectl label namespace cilium-secrets istio-injection=disabled' to explicitly mark it as not needing injection.
Info [IST0102] (Namespace default) The namespace is not enabled for Istio injection. Run 'kubectl label namespace default istio-injection=enabled' to enable it, or 'kubectl label namespace default istio-injection=disabled' to explicitly mark it as not needing injection.
Info [IST0102] (Namespace karpenter) The namespace is not enabled for Istio injection. Run 'kubectl label namespace karpenter istio-injection=enabled' to enable it, or 'kubectl label namespace karpenter istio-injection=disabled' to explicitly mark it as not needing injection.
Info [IST0102] (Namespace kube-node-lease) The namespace is not enabled for Istio injection. Run 'kubectl label namespace kube-node-lease istio-injection=enabled' to enable it, or 'kubectl label namespace kube-node-lease istio-injection=disabled' to explicitly mark it as not needing injection.
Info [IST0102] (Namespace kube-public) The namespace is not enabled for Istio injection. Run 'kubectl label namespace kube-public istio-injection=enabled' to enable it, or 'kubectl label namespace kube-public istio-injection=disabled' to explicitly mark it as not needing injection.
Info [IST0102] (Namespace kube-system) The namespace is not enabled for Istio injection. Run 'kubectl label namespace kube-system istio-injection=enabled' to enable it, or 'kubectl label namespace kube-system istio-injection=disabled' to explicitly mark it as not needing injection.
Info [IST0118] (Service kube-system/aws-load-balancer-webhook-service) Port name webhook-server (port: 443, targetPort: webhook-server) doesn't follow the naming convention of Istio port.
Info [IST0118] (Service kube-system/cilium-envoy) Port name envoy-metrics (port: 9964, targetPort: envoy-metrics) doesn't follow the naming convention of Istio port.
Info [IST0118] (Service kube-system/hubble-peer) Port name peer-service (port: 443, targetPort: 4244) doesn't follow the naming convention of Istio port.
Info [IST0118] (Service kube-system/hubble-relay) Port name (port: 80, targetPort: grpc) doesn't follow the naming convention of Istio port.
Info [IST0118] (Service kube-system/kube-dns) Port name metrics (port: 9153, targetPort: 9153) doesn't follow the naming convention of Istio port.
Creating an archive at /Users/wendell.beckwith/code/kubernetes/bug-report.tar.gz.
Time used for creating the tar file is 96.614542ms.
Cleaning up temporary files in /var/folders/kw/8_61_vh14cb5dv7bgq93x1lh0000gp/T/bug-report.
Done.

Labels: area/ambient, area/networking, lifecycle/staleproof
