Broken connectivity when using BPF masquerade, no tunnel and BPF Host Routing on EKS with 1 device #27343

@margamanterola

Description

I tracked down this issue to a change that happened between v1.14.0-snapshot.1 and v1.14.0-snapshot.2.

On an EKS cluster, packets that are supposed to go to the EKS kube-apiserver get silently dropped when this Helm config is applied:

bpf:
  masquerade: true
devices: eth0
eni:
  enabled: true
ipam:
  mode: eni
kubeProxyReplacement: strict
tunnel: disabled

Disabling BPF masquerading, using eth+ for devices, or setting tunnel: vxlan each restores communication with the kube-apiserver. kubeProxyReplacement (KRP) is unrelated, but it needs to be set to some value when using the 1.14.0 chart, because its default changed to false.

Same Helm configuration, but as a command (for easier testing):

helm upgrade -n kube-system cilium cilium/cilium --version 1.14.0  \
  --set image.tag=v1.14.0-snapshot.2 --set image.useDigest=false \
  --set bpf.masquerade=true \
  --set devices=eth0 \
  --set tunnel=disabled \
  --set eni.enabled=true \
  --set ipam.mode=eni \
  --set kubeProxyReplacement=strict
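
For reference, applying any one of the following overrides on top of the command above restores connectivity (these mirror the workarounds listed earlier):

  --set bpf.masquerade=false   # disable BPF masquerading
  --set devices=eth+           # match all ENI interfaces instead of only eth0
  --set tunnel=vxlan           # tunnel instead of direct routing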

When setting devices: eth+, I can see with cilium monitor that the packet is now being routed through eth1.
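
The traces below were captured from inside the agent pod, roughly like this (the exec target and endpoint filter are illustrative; 3678 is the local endpoint ID seen in the verdict lines):

kubectl -n kube-system exec ds/cilium -- cilium monitor --related-to 3678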

With 1.13.5, the packet goes and comes back over ifindex 0.

Policy verdict log: flow 0x93b86448 local EP ID 3678, remote ID 16777218, proto 6, egress, action allow, match L3-L4, 10.4.201.117:40748 -> 10.4.1.206:443 tcp SYN
-> stack flow 0x93b86448 , identity 35566->16777218 state new ifindex 0 orig-ip 0.0.0.0: 10.4.201.117:40748 -> 10.4.1.206:443 tcp SYN
-> endpoint 3678 flow 0xe857e857 , identity 16777218->35566 state reply ifindex 0 orig-ip 10.4.1.206: 10.4.1.206:443 -> 10.4.201.117:40748 tcp SYN, ACK
-> stack flow 0x93b86448 , identity 35566->16777218 state established ifindex 0 orig-ip 0.0.0.0: 10.4.201.117:40748 -> 10.4.1.206:443 tcp ACK

With 1.14.0-snapshot.2 & devices=eth0, the packet is sent out on eth0 but the SYN,ACK never comes back:

Policy verdict log: flow 0xcf0da62e local EP ID 3678, remote ID 16777268, proto 6, egress, action allow, match L4-Only, 10.4.201.117:51344 -> 10.4.1.206:443 tcp SYN
-> network flow 0xcf0da62e , identity 35566->16777268 state new ifindex eth0 orig-ip 0.0.0.0: 10.4.201.117:51344 -> 10.4.1.206:443 tcp SYN
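
To double-check whether the datapath at least reports a drop for this flow, the drop events can be watched separately (sketch, same illustrative exec target as above):

kubectl -n kube-system exec ds/cilium -- cilium monitor --type drop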

With 1.14.0-snapshot.2 & devices=eth+, the packet is sent out on eth1 and the SYN,ACK makes it back:

Policy verdict log: flow 0x2d05b7a4 local EP ID 3678, remote ID 16777217, proto 6, egress, action allow, match L4-Only, 10.4.201.117:35980 -> 10.4.1.206:443 tcp SYN
-> network flow 0x2d05b7a4 , identity 35566->16777217 state new ifindex eth1 orig-ip 0.0.0.0: 10.4.201.117:35980 -> 10.4.1.206:443 tcp SYN
-> endpoint 3678 flow 0xa6c8a6c8 , identity 16777217->35566 state reply ifindex lxc6bedd6da59c0 orig-ip 10.4.1.206: 10.4.1.206:443 -> 10.4.201.117:35980 tcp SYN, ACK
-> network flow 0x2d05b7a4 , identity 35566->16777217 state established ifindex eth1 orig-ip 0.0.0.0: 10.4.201.117:35980 -> 10.4.1.206:443 tcp ACK

While discussing this with Julian, I realized that #22006 (merged during the mentioned window) made changes around BPF host routing. So I tried switching from BPF Host Routing to legacy host routing, and connectivity was restored.
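
As a sketch, that workaround is a single extra override on top of the repro command above:

  --set bpf.hostLegacyRouting=true   # fall back to legacy host routing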

With 1.14.0-snapshot.2, devices=eth0 and bpf.hostLegacyRouting=true, the SYN and SYN,ACK again travel over ifindex 0:

Policy verdict log: flow 0x2050dec6 local EP ID 3678, remote ID 16777268, proto 6, egress, action allow, match L4-Only, 10.4.201.117:34694 -> 10.4.1.206:443 tcp SYN
-> stack flow 0x2050dec6 , identity 35566->16777268 state new ifindex 0 orig-ip 0.0.0.0: 10.4.201.117:34694 -> 10.4.1.206:443 tcp SYN
-> endpoint 3678 flow 0xb495b495 , identity 16777268->35566 state reply ifindex 0 orig-ip 10.4.1.206: 10.4.1.206:443 -> 10.4.201.117:34694 tcp SYN, ACK
-> stack flow 0x2050dec6 , identity 35566->16777268 state established ifindex 0 orig-ip 0.0.0.0: 10.4.201.117:34694 -> 10.4.1.206:443 tcp ACK

I'm not sure if it means anything, but it's interesting that the Cilium identities for the kube-apiserver change depending on the configuration. With legacy host routing OR devices: eth0, it uses these two identities:

16777268   cidr:10.4.2.114/32
           reserved:kube-apiserver
           reserved:world
16777269   cidr:10.4.1.206/32
           reserved:kube-apiserver
           reserved:world

With BPF Host Routing AND devices: eth+, it uses these other two:

16777217   cidr:10.4.2.114/32
           reserved:kube-apiserver
           reserved:world
16777218   cidr:10.4.1.206/32
           reserved:kube-apiserver
           reserved:world
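
For completeness, the identity allocations above come from the agent and can be dumped with something like this (illustrative exec target):

kubectl -n kube-system exec ds/cilium -- cilium identity list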

So, this particular combination of features doesn't work:

  • EKS environment (eni.enabled: true + ipam.mode: eni)
  • BPF Masquerade
  • Direct Routing (tunnel: disabled)
  • Only one interface (devices: eth0)
  • BPF Host Routing (selected by default)

This is a change in behavior. The same configuration that worked correctly up until 1.14.0-snapshot.1 stopped working after 1.14.0-snapshot.2.

If this is now expected behavior and people need to manually switch to legacy host routing in this scenario, we should at least document it. But ideally Cilium should do the right thing rather than silently drop packets.

CC: @julianwiedmann, @aspsk

Metadata

Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/regression: This functionality worked fine before, but was broken in a newer release of Cilium.
