Broken pod to remote NodePort connectivity when WireGuard is used with L7 ingress policy + native routing + KPR #32899

@jschwinger233

Description

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Steps to reproduce

1. Create a kind Cilium cluster

make kind-down
export IMAGE=kindest/node:v1.29.4@sha256:3abb816a5b1061fb15c6e9e60856ec40d56b7b52bcea5f5f1350bc6e2320b6f8
./contrib/scripts/kind.sh --xdp --secondary-network "" 3 "" "" iptables dual 0.0.0.0 6443
kubectl patch node kind-worker3 --type=json -p='[{"op":"add","path":"/metadata/labels/cilium.io~1no-schedule","value":"true"}]'

git checkout 8cd748de7f4011d4ab9fc04338fa69c194205e6a # https://github.com/cilium/cilium/commit/8cd748de7f4011d4ab9fc04338fa69c194205e6a
make kind-image
kind load --name kind docker-image localhost:5000/cilium/cilium-dev:local
kind load --name kind docker-image localhost:5000/cilium/operator-generic:local
./cilium-cli install --wait \
  --chart-directory=./install/kubernetes/cilium \
  --helm-set=debug.enabled=true \
  --helm-set=debug.verbose=envoy \
  --helm-set=hubble.eventBufferCapacity=65535 \
  --helm-set=bpf.monitorAggregation=none \
  --helm-set=cluster.name=default \
  --helm-set=authentication.mutual.spire.enabled=false \
  --nodes-without-cilium \
  --helm-set-string=kubeProxyReplacement=true \
  --set='' \
  --helm-set-string=routingMode=native \
  --helm-set-string=autoDirectNodeRoutes=true \
  --helm-set-string=ipv4NativeRoutingCIDR=10.244.0.0/16 \
  --helm-set-string=ipv6NativeRoutingCIDR=fd00:10:244::/56 \
  --helm-set=devices='{eth0,eth1}' \
  --helm-set-string=loadBalancer.mode=snat \
  --helm-set=ipv6.enabled=true \
  --helm-set=bpf.masquerade=true \
  --helm-set=egressGateway.enabled=true \
  --helm-set=encryption.enabled=true \
  --helm-set=encryption.type=wireguard \
  --helm-set=encryption.nodeEncryption=true \
  --helm-set=encryption.ipsec.encryptedOverlay=false \
  --helm-set=ingressController.enabled=true \
  --helm-set=ingressController.service.type=NodePort \
  --helm-set=image.repository=localhost:5000/cilium/cilium-dev \
  --helm-set=image.useDigest=false \
  --helm-set=image.tag=local \
  --helm-set=operator.image.repository=localhost:5000/cilium/operator \
  --helm-set=operator.image.tag=local \
  --helm-set operator.image.suffix= \
  --helm-set=operator.image.useDigest=false \
  --helm-set=image.pullPolicy=IfNotPresent \
  --helm-set=operator.image.pullPolicy=IfNotPresent \
  --helm-set=debug.verbose=datapath
./cilium-cli status --wait
./cilium-cli connectivity test --include-unsafe-tests  --flush-ct --test "skipall"  -v  -p
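
Before moving on, it is worth confirming that the install actually came up with the intended datapath combination (WireGuard encryption, native routing, KPR, BPF masquerading). A quick sanity check, using the in-pod cilium CLI the same way as in the later steps; exact status wording may differ between versions:

# Confirm WireGuard, native routing, kube-proxy replacement and BPF masquerading are active
kubectl -nkube-system exec ds/cilium -- cilium status | grep -iE 'encryption|routing|kubeproxyreplacement|masquerading'
kubectl -nkube-system exec ds/cilium -- cilium encrypt status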

2. Check pod to remote NodePort connectivity

# kubectl -ncilium-test get po -owide | grep client2
client2-ccd7b8bdf-dtdng              1/1     Running   0          32m   10.244.3.159   kind-worker2         <none>           <none>

# kubectl -nkube-system get po -owide | grep cilium | grep kind-worker
cilium-2lwb2                                 1/1     Running   0          34m   172.19.0.5     kind-worker2         <none>           <none>
cilium-9bn77                                 1/1     Running   0          34m   172.19.0.4     kind-worker          <none>           <none>
cilium-envoy-6s69m                           1/1     Running   0          34m   172.19.0.5     kind-worker2         <none>           <none>
cilium-envoy-wjhfh                           1/1     Running   0          34m   172.19.0.4     kind-worker          <none>           <none>
cilium-operator-7b87f5b697-w88vj             1/1     Running   0          34m   172.19.0.4     kind-worker   

# client2 pod is on kind-worker2, so we choose a remote nodeport on kind-worker
# kubectl -nkube-system exec cilium-9bn77 -- cilium service list  | grep -i hostport
17   172.19.0.4:4000           HostPort       1 => 10.244.2.175:8080 (active)            
18   [fc00:c111::4]:4000       HostPort       1 => [fd00:10:244:2::fc99]:8080 (active)   
19   0.0.0.0:4000              HostPort       1 => 10.244.2.175:8080 (active)   
20   [::]:4000                 HostPort       1 => [fd00:10:244:2::fc99]:8080 (active) 

# kubectl -ncilium-test exec client2-ccd7b8bdf-dtdng  -- curl 172.19.0.4:4000 -I
HTTP/1.1 200 OK
X-Powered-By: Express
Vary: Origin, Accept-Encoding
Access-Control-Allow-Credentials: true
Accept-Ranges: bytes
Cache-Control: public, max-age=0
Last-Modified: Tue, 09 Jan 2024 12:57:12 GMT
ETag: W/"809-18cee4c6040"
Content-Type: text/html; charset=UTF-8
Content-Length: 2057
Date: Wed, 05 Jun 2024 03:38:14 GMT
Connection: keep-alive
Keep-Alive: timeout=5

3. Apply an L7 ingress policy and check connectivity again

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-8080-ingress
spec:
  endpointSelector:
    matchLabels:
      kind: echo
  ingress:
  - fromEntities:
    - all
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/$"

kubectl -ncilium-test apply -f allow-8080-ingress.yaml
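
Before retesting, it can help to verify that the policy really installed an L7 (Envoy) redirect for the echo endpoint. A possible check, reusing the agent pod on kind-worker from the listing above (output details vary by version):

# Ingress policy enforcement should now show "Enabled" for the echo endpoint,
# and the policy repository should contain the L7 HTTP rule for port 8080
kubectl -nkube-system exec cilium-9bn77 -- cilium endpoint list
kubectl -nkube-system exec cilium-9bn77 -- cilium policy get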

Then the pod can no longer reach the remote NodePort:

$ kubectl -ncilium-test exec client2-ccd7b8bdf-dtdng  -- curl 172.19.0.4:4000 -I -v
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 172.19.0.4:4000...
  0     0    0     0    0     0      0      0 --:--:--  0:00:09 --:--:--     0^C
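
The misrouted reply can be watched from the client's node while the curl is retried. One way to do this, reusing the agent pod cilium-2lwb2 on kind-worker2 from the listing above (bpf.monitorAggregation=none is already set by the install, so per-packet trace events are emitted); per the analysis under "Anything else?" below, the SYN-ACK should reach client2 with a source address other than 172.19.0.4:4000, presumably the un-rev-DNATed backend address:

# Terminal 1: watch datapath events involving the client2 pod IP
kubectl -nkube-system exec cilium-2lwb2 -- cilium monitor | grep 10.244.3.159

# Terminal 2: retry the request (it will time out)
kubectl -ncilium-test exec client2-ccd7b8bdf-dtdng -- curl 172.19.0.4:4000 -I --max-time 5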

Cilium Version

Client: 1.16.0-dev 8cd748de7f 2024-05-20T11:00:02+09:00 go version go1.22.3 linux/amd64
Daemon: 1.16.0-dev 8cd748de7f 2024-05-20T11:00:02+09:00 go version go1.22.3 linux/amd64

Kernel Version

Linux liangzc-l-PF4RDLEQ 6.5.0-1023-oem #24-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 14:26:31 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: v1.30.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4

Regression

No response

Sysdump

No response

Relevant log output

No response

Anything else?

This bug is similar to #32897; it is caused by a missing revDNAT for the proxy's reply.

TL;DR: the proxy's reply (a TCP SYN-ACK sent on behalf of the destination pod) is WireGuard-encrypted ahead of revDNAT at to-netdev@eth0, so the revDNAT code cannot process the already-encrypted packet. The reply is eventually routed back to the source pod with the wrong source IP and is dropped with SKB_DROP_REASON_NO_SOCKET.
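
One way to see this ordering is to capture on kind-worker's cilium_wg0 device (packets on this interface are the inner plaintext) while retrying the request from client2: with the policy applied, the SYN-ACK towards 10.244.3.159 should still carry the backend's 10.244.2.175:8080 as source, i.e. it entered the WireGuard tunnel before rev-DNAT could rewrite it to 172.19.0.4:4000, whereas in the working case from step 2 the source is already 172.19.0.4:4000 at this point. A possible capture, run from the host against the kind node's network namespace (assumes docker, nsenter and tcpdump are available on the host):

# Show the inner (unencrypted) reply as it enters kind-worker's WireGuard device
sudo nsenter -t "$(docker inspect -f '{{.State.Pid}}' kind-worker)" -n \
  tcpdump -ni cilium_wg0 'tcp port 4000 or tcp port 8080'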

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • affects/v1.14: This issue affects v1.14 branch
  • affects/v1.15: This issue affects v1.15 branch
  • area/encryption: Impacts encryption support such as IPSec, WireGuard, or kTLS.
  • area/kpr: Anything related to our kube-proxy replacement.
  • area/loadbalancing: Impacts load-balancing and Kubernetes service implementations
  • area/proxy: Impacts proxy components, including DNS, Kafka, Envoy and/or XDS servers.
  • feature/wireguard: Relates to Cilium's Wireguard feature
  • kind/bug: This is a bug in the Cilium logic.
  • sig/policy: Impacts whether traffic is allowed or denied based on user-defined policies.
