Return traffic between egress and ingress L7 proxy is lost when they are on the same node on GKE. #29864

@jrajahalme

Description

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

The issue was first reproduced by @cpu601 using Cilium Ingress on GKE with an ingress L7 policy on the backend pods.

In testing, this issue was seemingly fixed by adding route table 2005 and the associated routing rule that matches on ingress proxy return traffic, so this may be related to the route table 2005 removal done earlier this year. However, it is not clear whether route table 2005 ever existed in endpoint-routes mode, which is used when Cilium is configured for GKE mode.

To reproduce, create a single node GKE cluster:

% gcloud container clusters create jarno-test --zone europe-north1-a --labels usage=dev-jarno,owner=jarno,expiry=2023-12-31 --image-type COS_CONTAINERD --num-nodes 1 --machine-type e2-custom-2-4096 --disk-type pd-standard --disk-size 10GB --preemptible
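
If kubectl is not already pointing at the new cluster, fetch credentials first (standard gcloud step, using the same cluster name and zone as above):

gcloud container clusters get-credentials jarno-test --zone europe-north1-a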

Then install Cilium 1.13.9 using this cilium-values.yaml:

hubble:
  enabled: true
  relay:
    enabled: true
nodeinit:
  enabled: true
  reconfigureKubelet: true
  removeCbrBridge: true
cni:
  binPath: /home/kubernetes/bin
gke:
  enabled: true
ipam:
  mode: kubernetes
ipv4NativeRoutingCIDR: 10.24.0.0/14
envoy:
  enabled: false
gatewayAPI:
  enabled: false
ingressController:
  enabled: true
  enforceHttps: false
  loadbalancerMode: dedicated
kubeProxyReplacement: strict
k8sServiceHost: 10.166.0.52
k8sServicePort: 443
debug:
  enabled: true
  verbose: envoy

Note that you need to adjust these values first (see the commands below):

  • ipv4NativeRoutingCIDR: Use the prefix you see on pods in kubectl get pods -A -o wide
  • k8sServiceHost & k8sServicePort: Use the values from kubectl get endpoints kubernetes
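
A sketch for pulling both values from the cluster (the jsonpath expressions are mine, not from the original report; with ipam mode kubernetes each node carries its pod CIDR in spec.podCIDR):

# Per-node pod CIDRs; pick a prefix covering them for ipv4NativeRoutingCIDR
kubectl get nodes -o jsonpath='{range .items[*]}{.spec.podCIDR}{"\n"}{end}'
# API server address and port for k8sServiceHost / k8sServicePort
kubectl get endpoints kubernetes -o jsonpath='{.subsets[0].addresses[0].ip}:{.subsets[0].ports[0].port}{"\n"}'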

Install with helm:

helm upgrade -i cilium cilium/cilium --version 1.13.9 --namespace=kube-system -f cilium-values.yaml
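
Before moving on, confirm the agent is up (plain kubectl; k8s-app=cilium is the standard agent label):

kubectl -n kube-system rollout status ds/cilium
kubectl -n kube-system get pods -l k8s-app=cilium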

Then install the app and Ingress from these files:
api-v1.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-v1
  labels:
    app: api
    version: "1"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
      version: "1"
  template:
    metadata:
      labels:
        app: api
        version: "1"
    spec:
      containers:
        - name: api
          image: nginx
          ports:
          - containerPort: 80
            name: http
          volumeMounts:
          - mountPath: /usr/share/nginx/html/
            name: api
      initContainers:
        - name: api-init
          image: everpeace/curl-jq
          volumeMounts:
          - mountPath: /usr/share/nginx/html/
            name: api
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: HOST_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          command:
            - "sh"
            - "-c"
            - | 
              cat <<EOF >/usr/share/nginx/html/index.html
              
              Service-Name     = api
              Service-Version  = v1
              Service-ID       = ${POD_NAME}
              Pod IP address   = ${POD_IP}
              Node             = ${HOST_NAME}
              
              EOF
      volumes:
        - name: api
          emptyDir: {}

---
apiVersion: v1
kind: Service
metadata:
  name: api-v1
  namespace: default
  labels:
    app: api
    version: "1"
spec:
  selector:
    app: api
    version: "1"
  ports:
    - name: api
      port: 80
      targetPort: 80
  # type: LoadBalancer

ingress-test.yaml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
 name: basic-ingress
 namespace: default
spec:
 ingressClassName: cilium
 rules:
 - http:
     paths:
     - backend:
         service:
           name: api-v1
           port:
             number: 80
       path: /
       pathType: Prefix

Apply both:

kubectl apply -f api-v1.yaml
kubectl apply -f ingress-test.yaml

Wait until the Ingress IP gets assigned, then test:

INGRESS_IP=$(kubectl get ingress basic-ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -vvv $INGRESS_IP

This should work (curl gets a 200 response).
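
If INGRESS_IP comes back empty, the address has not been assigned yet; a simple poll (sketch) is:

until [ -n "$(kubectl get ingress basic-ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" ]; do sleep 5; done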

Then install the policies from these files:
api-policy.yaml:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "api"
spec:
  endpointSelector:
    matchLabels:
      app: api
  ingress:
    - fromEntities:
        - ingress
        - world
        - remote-node
        - host
      toPorts:
        - ports:
          - port: "80"
            protocol: TCP
          rules:
            http:
            - method: "GET"
              path: "/"

deny-all.yaml:

apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "deny-all"
spec:
  endpointSelector:
    matchLabels:
      io.kubernetes.pod.namespace: default
  ingress:
    - {}
  egress:
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*"
    - toEndpoints:
        - matchLabels:
            app: api
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP

Apply both policies:

kubectl apply -f api-policy.yaml
kubectl apply -f deny-all.yaml
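
Both policies should now be listed (cnp is the short name of the CiliumNetworkPolicy CRD):

kubectl get cnp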

Test again:

curl -vvv $INGRESS_IP

This time curl should stall for a while and then get a 503 response.

When investigating this with tcpdump it was apparent that the TCP SYN-ACK packets sent from the ingress proxy back to the Cilium Ingress proxy were never delivered to the local stack.
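
For reference, a capture along these lines on the node shows the SYN-ACKs in question (the filter is a sketch; the any pseudo-interface keeps it simple):

tcpdump -enni any 'tcp port 80 and tcp[tcpflags] & (tcp-syn|tcp-ack) == (tcp-syn|tcp-ack)'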

Then add routing table 2005 and the associated routing rule that matches the return traffic from the ingress proxy. Replace the IP with the one you see on the cilium_host interface on the node:

ip route replace table 2005 10.24.0.204/32 dev cilium_host
ip route replace table 2005 default via 10.24.0.204
ip -4 rule add fwmark 0xA00/0xF00 pref 10 lookup 2005

After this the test should work again.
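
To double-check the workaround, the cilium_host address and the new rule and routes can be inspected on the node (standard iproute2 commands; 0xA00/0xF00 is the fwmark the rule above matches on ingress proxy return traffic):

ip -4 addr show dev cilium_host   # the address to substitute in the routes above
ip -4 rule show                   # expect the pref 10 fwmark 0xa00/0xf00 rule
ip route show table 2005          # the restored return-path routes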

Cilium Version

1.13.9, also reported to happen on Cilium 1.14

Kernel Version

5.15.109+

Kubernetes Version

v1.27.3-gke.100

Sysdump

No response

Relevant log output

2023-12-13T12:17:41.043609344Z level=debug msg="[[C2] connect timeout" subsys=envoy-pool threadID=485
2023-12-13T12:17:41.043685617Z level=debug msg="[[C2] closing data_to_write=0 type=1" subsys=envoy-connection threadID=485
2023-12-13T12:17:41.043693542Z level=debug msg="[[C2] closing socket: 1" subsys=envoy-connection threadID=485
2023-12-13T12:17:41.043698980Z level=debug msg="[[C2] raising connection event 1" subsys=envoy-connection threadID=485
2023-12-13T12:17:41.043705156Z level=debug msg="[[C2] disconnect. resetting 0 pending requests" subsys=envoy-client threadID=485
2023-12-13T12:17:41.043710670Z level=debug msg="[[C2] client disconnected, failure reason: " subsys=envoy-pool threadID=485
2023-12-13T12:17:41.043716636Z level=debug msg="[[C1][S14041702317163141994] upstream reset: reset reason: connection failure, transport failure reason: " subsys=envoy-router threadID=485
2023-12-13T12:17:41.044593034Z level=debug msg="[item added to deferred deletion list (size=1)" subsys=envoy-main threadID=485
2023-12-13T12:17:41.044727894Z level=debug msg="[item added to deferred deletion list (size=2)" subsys=envoy-main threadID=485
2023-12-13T12:17:41.044820680Z level=debug msg="[[C1][S14041702317163141994] Sending local reply with details upstream_reset_before_response_started{connection_failure}" subsys=envoy-http threadID=485
2023-12-13T12:17:41.044970128Z level=debug msg="[enableTimer called on 0x2793bf5177a0 for 300000ms, min is 300000ms" subsys=envoy-misc threadID=485
2023-12-13T12:17:41.045066524Z level=debug msg="[[C1][S14041702317163141994] encode headers called: filter=cilium.l7policy status=0" subsys=envoy-http threadID=485
2023-12-13T12:17:41.045186531Z level=debug msg="[[C1][S14041702317163141994] encoding headers via codec (end_stream=false):" subsys=envoy-http threadID=485
2023-12-13T12:17:41.045267072Z level=debug msg="':status', '503'" subsys=envoy-http threadID=485
2023-12-13T12:17:41.045380270Z level=debug msg="'content-length', '91'" subsys=envoy-http threadID=485
2023-12-13T12:17:41.045461934Z level=debug msg="'content-type', 'text/plain'" subsys=envoy-http threadID=485
2023-12-13T12:17:41.045580365Z level=debug msg="'date', 'Wed, 13 Dec 2023 12:17:40 GMT'" subsys=envoy-http threadID=485
2023-12-13T12:17:41.045665479Z level=debug msg="'server', 'envoy'" subsys=envoy-http threadID=485

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Labels

kind/bug: This is a bug in the Cilium logic.
