Security identity is overridden and packets are dropped when using Cilium with ENI mode + AWS VPC CNI + ClusterID 128-255 #21330

@YutaroHayakawa

Description

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

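While investigating #20797, we found that bit 7 of skb->mark (mask 0x80) is used both by Cilium and by AWS VPC CNI. Cilium uses this bit to carry part of the security identity (the top bit of the ClusterID, so it is set for ClusterIDs 128-255), and AWS VPC CNI uses it for policy-based routing (PBR). As a result, in some cases iptables rules overwrite the identity that Cilium stored in the mark, and network policy then drops the packet.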
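To illustrate why only ClusterIDs 128-255 collide with the 0x80 bit, here is a minimal Go sketch of how an identity can be packed into the 32-bit mark. It assumes the layout used by Cilium's datapath helpers, with the default split of an 8-bit ClusterID and a 16-bit cluster-local identity. Under this layout, identity bits 0-15 go into the upper half of the mark, and identity bits 16-23 (the ClusterID) go into mark bits 0-7. The function and constant names are hypothetical. The magic/type bits that Cilium also stores in the mark are left out.

```go
package main

import "fmt"

// awsConnmarkBit is the mark bit (0x80) that AWS VPC CNI also uses for PBR.
const awsConnmarkBit uint32 = 0x80

// identityFor builds a security identity from an 8-bit ClusterID and a
// 16-bit cluster-local identity (assumed default split).
func identityFor(clusterID uint8, localID uint16) uint32 {
	return uint32(clusterID)<<16 | uint32(localID)
}

// identityToMark packs an identity into a mark, assuming
// mark[31:16] = identity[15:0] and mark[7:0] = identity[23:16].
func identityToMark(identity uint32) uint32 {
	return (identity&0xFFFF)<<16 | (identity&0xFF0000)>>16
}

// markToIdentity is the inverse of identityToMark under the same assumption.
func markToIdentity(mark uint32) uint32 {
	return (mark&0xFF)<<16 | mark>>16
}

func main() {
	for _, cid := range []uint8{1, 127, 128, 255} {
		mark := identityToMark(identityFor(cid, 0x1234))
		fmt.Printf("ClusterID %3d -> mark 0x%08x, bit 0x80 set: %v\n",
			cid, mark, mark&awsConnmarkBit != 0)
	}
}
```

Under this assumed layout, the ClusterID lands in the lowest byte of the mark. Bit 0x80 is therefore exactly the ClusterID's most significant bit, which is set only for ClusterIDs 128-255.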

Two iptables rules can overwrite the identity:

  1. The rule in the CILIUM_PRE_mangle chain of the mangle table, installed by Cilium in ENI mode (introduced in c7f9997):
     -A CILIUM_PRE_mangle -i lxc+ -m comment --comment "cilium: primary ENI" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
  2. The rule in the PREROUTING chain of the nat table, installed by AWS VPC CNI:
     -A PREROUTING -m comment --comment "AWS, CONNMARK" -j CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80
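The effect of `CONNMARK --restore-mark --nfmask 0x80 --ctmask 0x80` on a Cilium identity can be sketched in Go. The sketch assumes the same identity-to-mark layout as Cilium's datapath, with the ClusterID in mark bits 0-7 and magic bits omitted. `connmarkRestore` models the restore without a shift, following our understanding of the xt_CONNMARK logic: the nfmask bits are cleared, and the ctmask bits of the conntrack mark are XORed in. Because both masks are 0x80 here, the result behaves like a plain copy of conntrack-mark bit 0x80 into the packet mark. The identity value and the conntrack mark of 0 are hypothetical.

```go
package main

import "fmt"

// connmarkRestore models `-j CONNMARK --restore-mark --nfmask NF --ctmask CT`
// with no shift: clear the nfmask bits of the packet mark, then XOR in the
// ctmask bits of the conntrack mark.
func connmarkRestore(mark, ctmark, nfmask, ctmask uint32) uint32 {
	return (mark &^ nfmask) ^ (ctmark & ctmask)
}

// Assumed identity<->mark layout (ClusterID in mark[7:0], magic bits omitted).
func identityToMark(identity uint32) uint32 {
	return (identity&0xFFFF)<<16 | (identity&0xFF0000)>>16
}

func markToIdentity(mark uint32) uint32 {
	return (mark&0xFF)<<16 | mark>>16
}

func main() {
	const mask = 0x80
	identity := uint32(128)<<16 | 0x1234 // ClusterID 128, local identity 0x1234 (hypothetical)
	mark := identityToMark(identity)

	// Hypothetical connection whose conntrack mark has bit 0x80 clear.
	restored := connmarkRestore(mark, 0x0, mask, mask)

	fmt.Printf("before: mark=0x%08x identity=0x%06x\n", mark, markToIdentity(mark))
	fmt.Printf("after:  mark=0x%08x identity=0x%06x\n", restored, markToIdentity(restored))
}
```

After the restore, bit 0x80 of the packet mark reflects the conntrack mark rather than the value Cilium wrote. In this run, the ClusterID 128 identity 0x801234 is decoded as 0x001234, an identity from ClusterID 0.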

This issue affects three setup scenarios:

  1. Cilium runs in ENI mode after AWS VPC CNI has been uninstalled (e.g., an EKS cluster is created and Cilium is installed afterwards)
  2. Cilium runs in chaining mode on top of AWS VPC CNI
  3. Cilium runs in ENI mode and AWS VPC CNI was never installed (e.g., a self-hosted Kubernetes cluster on EC2 hosts)

Reported by: @carloscastrojumo @EricMountain @hemanthmalla

Cilium Version

All Cilium versions after 1.9-rc1 are affected

Kernel Version

Kernel version doesn't matter

Kubernetes Version

Kubernetes version doesn't matter

Sysdump

No response

Relevant log output

Investigation logs

Connectivity test logs with ClusterID = 128

📋 Test Report
❌ 12/23 tests failed (26/114 actions), 0 tests skipped, 1 scenarios skipped:
Test [allow-all-except-world]:
  ❌ allow-all-except-world/pod-to-pod/curl-0: cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31) -> cilium-test/echo-same-node-7894f8ffcd-vws46 (192.168.110.77:8080)
  ❌ allow-all-except-world/pod-to-pod/curl-1: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/echo-same-node-7894f8ffcd-vws46 (192.168.110.77:8080)
  ❌ allow-all-except-world/client-to-client/ping-0: cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31) -> cilium-test/client2-74f4559c78-xzg5n (192.168.101.165:0)
  ❌ allow-all-except-world/client-to-client/ping-1: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31:0)
  ❌ allow-all-except-world/pod-to-service/curl-0: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/echo-same-node (echo-same-node:8080)
  ❌ allow-all-except-world/pod-to-service/curl-1: cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31) -> cilium-test/echo-same-node (echo-same-node:8080)
Test [client-ingress]:
  ❌ client-ingress/client-to-client/ping-1: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31:0)
Test [echo-ingress]:
  ❌ echo-ingress/pod-to-pod/curl-0: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/echo-same-node-7894f8ffcd-vws46 (192.168.110.77:8080)
Test [client-ingress-icmp]:
  ❌ client-ingress-icmp/client-to-client/ping-1: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31:0)
Test [echo-ingress-l7]:
  ❌ echo-ingress-l7/pod-to-pod-with-endpoints/curl-0-public: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> curl-0-public (192.168.110.77:8080)
  ❌ echo-ingress-l7/pod-to-pod-with-endpoints/curl-0-private: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> curl-0-private (192.168.110.77:8080)
  ❌ echo-ingress-l7/pod-to-pod-with-endpoints/curl-0-privatewith-header: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> curl-0-privatewith-header (192.168.110.77:8080)
Test [echo-ingress-l7-named-port]:
  ❌ echo-ingress-l7-named-port/pod-to-pod-with-endpoints/curl-1-public: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> curl-1-public (192.168.110.77:8080)
  ❌ echo-ingress-l7-named-port/pod-to-pod-with-endpoints/curl-1-private: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> curl-1-private (192.168.110.77:8080)
  ❌ echo-ingress-l7-named-port/pod-to-pod-with-endpoints/curl-1-privatewith-header: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> curl-1-privatewith-header (192.168.110.77:8080)
Test [echo-ingress-from-other-client-deny]:
  ❌ echo-ingress-from-other-client-deny/pod-to-pod/curl-0: cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31) -> cilium-test/echo-same-node-7894f8ffcd-vws46 (192.168.110.77:8080)
  ❌ echo-ingress-from-other-client-deny/client-to-client/ping-0: cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31) -> cilium-test/client2-74f4559c78-xzg5n (192.168.101.165:0)
  ❌ echo-ingress-from-other-client-deny/client-to-client/ping-1: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31:0)
Test [client-ingress-from-other-client-icmp-deny]:
  ❌ client-ingress-from-other-client-icmp-deny/pod-to-pod/curl-0: cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31) -> cilium-test/echo-same-node-7894f8ffcd-vws46 (192.168.110.77:8080)
  ❌ client-ingress-from-other-client-icmp-deny/pod-to-pod/curl-1: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/echo-same-node-7894f8ffcd-vws46 (192.168.110.77:8080)
  ❌ client-ingress-from-other-client-icmp-deny/client-to-client/ping-0: cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31) -> cilium-test/client2-74f4559c78-xzg5n (192.168.101.165:0)
Test [client-egress-to-echo-deny]:
  ❌ client-egress-to-echo-deny/client-to-client/ping-0: cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31) -> cilium-test/client2-74f4559c78-xzg5n (192.168.101.165:0)
  ❌ client-egress-to-echo-deny/client-to-client/ping-1: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/client-7bdbddd7b-7rrzq (192.168.22.31:0)
Test [client-ingress-to-echo-named-port-deny]:
  ❌ client-ingress-to-echo-named-port-deny/pod-to-pod/curl-0: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/echo-same-node-7894f8ffcd-vws46 (192.168.110.77:8080)
Test [client-egress-to-echo-expression-deny]:
  ❌ client-egress-to-echo-expression-deny/pod-to-pod/curl-0: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/echo-same-node-7894f8ffcd-vws46 (192.168.110.77:8080)
Test [client-egress-to-echo-service-account-deny]:
  ❌ client-egress-to-echo-service-account-deny/pod-to-pod/curl-0: cilium-test/client2-74f4559c78-xzg5n (192.168.101.165) -> cilium-test/echo-same-node-7894f8ffcd-vws46 (192.168.110.77:8080)
connectivity test failed: 12 tests failed

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

      • affects/v1.14: This issue affects v1.14 branch
      • affects/v1.15: This issue affects v1.15 branch
      • affects/v1.16: This issue affects v1.16 branch
      • area/clustermesh: Relates to multi-cluster routing functionality in Cilium.
      • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
      • area/eni: Impacts ENI based IPAM.
      • kind/bug: This is a bug in the Cilium logic.
      • pinned: These issues are not marked stale by our issue bot.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests