Description
I tracked down this issue to a change that happened between v1.14.0-snapshot.1 and v1.14.0-snapshot.2
On an EKS cluster, packets that are supposed to go to the EKS kube-apiserver get silently dropped when this Helm config is applied:
```yaml
bpf:
  masquerade: true
devices: eth0
eni:
  enabled: true
ipam:
  mode: eni
kubeProxyReplacement: strict
tunnel: disabled
```
Disabling BPF masquerading, using `eth+` for the devices, or setting the tunnel to `vxlan` all restore communication with the kube-apiserver. KRP is unrelated, but it needs to be set to some value when using the 1.14.0 chart because its default changed to `false`.
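For reference, here is one of those workarounds expressed as values. This is just the config from above with the single change applied (the `eth+` variant); the other fields are unchanged:

```yaml
bpf:
  masquerade: true
devices: eth+   # workaround: match all ENI interfaces instead of pinning eth0
eni:
  enabled: true
ipam:
  mode: eni
kubeProxyReplacement: strict
tunnel: disabled
```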
Same Helm configuration, but as a command (for easier testing):
```shell
helm upgrade -n kube-system cilium cilium/cilium --version 1.14.0 \
  --set image.tag=v1.14.0-snapshot.2 --set image.useDigest=false \
  --set bpf.masquerade=true \
  --set devices=eth0 \
  --set tunnel=disabled \
  --set eni.enabled=true \
  --set ipam.mode=eni \
  --set kubeProxyReplacement=strict
```
When setting `devices: eth+`, I can see with `cilium monitor` that the packet is now being routed through `eth1`.

With 1.13.5, the packet goes out and comes back over ifindex 0:
```
Policy verdict log: flow 0x93b86448 local EP ID 3678, remote ID 16777218, proto 6, egress, action allow, match L3-L4, 10.4.201.117:40748 -> 10.4.1.206:443 tcp SYN
-> stack flow 0x93b86448 , identity 35566->16777218 state new ifindex 0 orig-ip 0.0.0.0: 10.4.201.117:40748 -> 10.4.1.206:443 tcp SYN
-> endpoint 3678 flow 0xe857e857 , identity 16777218->35566 state reply ifindex 0 orig-ip 10.4.1.206: 10.4.1.206:443 -> 10.4.201.117:40748 tcp SYN, ACK
-> stack flow 0x93b86448 , identity 35566->16777218 state established ifindex 0 orig-ip 0.0.0.0: 10.4.201.117:40748 -> 10.4.1.206:443 tcp ACK
```
With 1.14.0-snapshot.2 & `eth0`, it sends the packet on eth0 but the SYN,ACK never comes back:
```
Policy verdict log: flow 0xcf0da62e local EP ID 3678, remote ID 16777268, proto 6, egress, action allow, match L4-Only, 10.4.201.117:51344 -> 10.4.1.206:443 tcp SYN
-> network flow 0xcf0da62e , identity 35566->16777268 state new ifindex eth0 orig-ip 0.0.0.0: 10.4.201.117:51344 -> 10.4.1.206:443 tcp SYN
```
With 1.14.0-snapshot.2 & `eth+`, the packet gets sent on eth1 and we get the SYN,ACK back:
```
Policy verdict log: flow 0x2d05b7a4 local EP ID 3678, remote ID 16777217, proto 6, egress, action allow, match L4-Only, 10.4.201.117:35980 -> 10.4.1.206:443 tcp SYN
-> network flow 0x2d05b7a4 , identity 35566->16777217 state new ifindex eth1 orig-ip 0.0.0.0: 10.4.201.117:35980 -> 10.4.1.206:443 tcp SYN
-> endpoint 3678 flow 0xa6c8a6c8 , identity 16777217->35566 state reply ifindex lxc6bedd6da59c0 orig-ip 10.4.1.206: 10.4.1.206:443 -> 10.4.201.117:35980 tcp SYN, ACK
-> network flow 0x2d05b7a4 , identity 35566->16777217 state established ifindex eth1 orig-ip 0.0.0.0: 10.4.201.117:35980 -> 10.4.1.206:443 tcp ACK
```
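Not strictly part of the report, but a small helper I used while comparing these traces. It pulls the destination/state/ifindex out of monitor lines and checks whether a reply ever shows up; the line format is assumed from the output pasted above, not from any documented interface:

```python
import re

# Parse "cilium monitor" trace lines of the shape seen in this issue,
# e.g. "-> network flow 0x... , identity A->B state new ifindex eth0 ..."
TRACE = re.compile(
    r"-> (?P<to>\S+)(?: \d+)? flow (?P<flow>0x[0-9a-f]+) , identity \S+ "
    r"state (?P<state>\S+) ifindex (?P<ifindex>\S+)"
)

def summarize(lines):
    """Extract (destination, state, ifindex) from each matching trace line."""
    out = []
    for line in lines:
        m = TRACE.search(line)
        if m:
            out.append((m["to"], m["state"], m["ifindex"]))
    return out

def reply_seen(lines):
    """True if any packet in the trace is in the 'reply' state,
    i.e. the SYN,ACK made it back to the endpoint."""
    return any(state == "reply" for _, state, _ in summarize(lines))
```

Running this over the eth0 trace reports no reply, while the eth+ trace shows the reply arriving on the `lxc*` interface.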
When discussing this with Julian, he made me realize that #22006 (merged during the mentioned window) made changes around BPF host routing. So I tried switching to legacy host routing instead, and connectivity was restored.
With 1.14.0-snapshot.2, `devices=eth0` and `bpf.hostLegacyRouting=true`, the SYN and SYN,ACK again travel over ifindex 0:
```
Policy verdict log: flow 0x2050dec6 local EP ID 3678, remote ID 16777268, proto 6, egress, action allow, match L4-Only, 10.4.201.117:34694 -> 10.4.1.206:443 tcp SYN
-> stack flow 0x2050dec6 , identity 35566->16777268 state new ifindex 0 orig-ip 0.0.0.0: 10.4.201.117:34694 -> 10.4.1.206:443 tcp SYN
-> endpoint 3678 flow 0xb495b495 , identity 16777268->35566 state reply ifindex 0 orig-ip 10.4.1.206: 10.4.1.206:443 -> 10.4.201.117:34694 tcp SYN, ACK
-> stack flow 0x2050dec6 , identity 35566->16777268 state established ifindex 0 orig-ip 0.0.0.0: 10.4.201.117:34694 -> 10.4.1.206:443 tcp ACK
```
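For completeness, the legacy host-routing workaround as a command, assuming the same base install as the helm command above with just the one flag added:

```shell
helm upgrade -n kube-system cilium cilium/cilium --version 1.14.0 \
  --set image.tag=v1.14.0-snapshot.2 --set image.useDigest=false \
  --set bpf.masquerade=true \
  --set bpf.hostLegacyRouting=true \
  --set devices=eth0 \
  --set tunnel=disabled \
  --set eni.enabled=true \
  --set ipam.mode=eni \
  --set kubeProxyReplacement=strict
```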
I'm not sure if it means anything, but it's interesting that the Cilium identities for the kube-apiserver change depending on the configuration. With legacy host routing OR `devices: eth0`, it uses these two identities:
```
16777268   cidr:10.4.2.114/32
           reserved:kube-apiserver
           reserved:world
16777269   cidr:10.4.1.206/32
           reserved:kube-apiserver
           reserved:world
```
With BPF host routing AND `devices: eth+`, it uses these other two:
```
16777217   cidr:10.4.2.114/32
           reserved:kube-apiserver
           reserved:world
16777218   cidr:10.4.1.206/32
           reserved:kube-apiserver
           reserved:world
```
So, this particular combination of features doesn't work:
- EKS environment (`eni.enabled: true` + `ipam.mode: eni`)
- BPF Masquerade (`bpf.masquerade: true`)
- Direct Routing (`tunnel: disabled`)
- Only one interface (`devices: eth0`)
- BPF Host Routing (selected by default)
This is a change in behavior. The same configuration that worked correctly up until 1.14.0-snapshot.1 stopped working after 1.14.0-snapshot.2.
If this is now expected behavior and people need to manually switch to legacy host routing in this case, we would need to at least document it. But ideally Cilium should do the right thing rather than silently drop packets.
CC: @julianwiedmann, @aspsk