Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
Following these instructions as-is, including creating the EKS cluster with eksctl but not including SGs for pods: https://docs.cilium.io/en/v1.12/gettingstarted/cni-chaining-aws-cni/#chaining-aws-cni, the resulting EKS cluster is functional and the connectivity tests pass. However, if I create the cluster and install Cilium in exactly the same way, but on the Bottlerocket AMI, then all probes fail with a timeout and other connectivity is broken too. I can see in conntrack and tcpdump that kubelet sends a SYN, but no reply from the pod makes it back. There are no packet drops/rejects in iptables or in the cilium monitor logs.
I was comparing the two clusters side by side and found a diff in the cilium status --verbose output: on Amazon Linux 2 it is Host Routing: Legacy, while on Bottlerocket it is Host Routing: BPF. I then created a third cluster (cilium-test-br-lr), in the same way as the second but with bpf.hostLegacyRouting=true, and the probes worked and the connectivity tests passed.
For the first two clusters I did not set this flag explicitly, and it is false by default. I assume that Cilium determines at runtime whether the system supports BPF and sets the mode accordingly, but it seems that this decision was wrong here and something that BPF host routing needs is not actually supported. That's why I am opening this issue, and I am happy to continue troubleshooting it to get to the real root cause, I just don't know how to proceed. Or is there additional config that I should apply in order to make a Bottlerocket cluster work with BPF host routing?
Other issues that are related, but are not quite the same case or didn't solve this issue:
- bottlerocket-os/bottlerocket#1405 - not relevant, because node-init is disabled by default and this issue is about a minimal repro. The reference cluster (AL2) also ran without node-init.
- #15393 - clone of the same issue. The proposed solution bottlerocket-os/bottlerocket#1405 (comment) is exactly what I am doing (was node-init enabled at that time?).
- bottlerocket-os/bottlerocket#1367 - there is indeed a difference in the rp_filter config between Amazon Linux 2 EKS and Bottlerocket; however, setting all rp_filter values to 0 and re-applying the sysctl settings did not solve the issue, the probes were still failing (I validated that sysctl -a | grep -w rp_filter showed 0 for all settings).
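For completeness, this is roughly what I ran on each Bottlerocket node (via the admin container, which shares the host kernel) to rule rp_filter out; the loop is just a convenience over every rp_filter key that sysctl reports:
# set every rp_filter knob to 0, then read them all back
for key in $(sysctl -a 2>/dev/null | grep -w rp_filter | awk -F' = ' '{print $1}'); do
  sysctl -w "${key}=0"
done
sysctl -a | grep -w rp_filter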
Reference cluster cilium-test:
$ cat install-eks-al2.sh
export NAME="cilium-test"
cat <<EOF >eks-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${NAME}
  region: ap-southeast-2
managedNodeGroups:
  - name: ng-1
    desiredCapacity: 2
    privateNetworking: true
    # taint nodes so that application pods are
    # not scheduled/executed until Cilium is deployed.
    # Alternatively, see the note below.
    taints:
      - key: "node.cilium.io/agent-not-ready"
        value: "true"
        effect: "NoExecute"
EOF
eksctl create cluster -f ./eks-config.yaml
Then install Cilium:
helm install cilium cilium/cilium --version 1.12.0 \
--namespace kube-system \
--set cni.chainingMode=aws-cni \
--set enableIPv4Masquerade=false \
--set tunnel=disabled
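After the install I verified the agent and ran the test suite with cilium-cli (version listed under "Cilium Version" below); roughly:
$ cilium status --wait
$ cilium connectivity test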
Broken cluster, Bottlerocket, cilium-test-br:
$ diff ../test-eks/install-eks-al2.sh install-eks-bottlerocket.sh
1c1
< export NAME="cilium-test"
---
> export NAME="cilium-test-br"
13a14,15
> amiFamily: "Bottlerocket"
> instanceType: r5.large
The cilium install is the same.
The coredns pods are still Pending from cluster creation time and start Running when the cilium agent removes the taints. I have also rotated the nodes and restarted the pods a few times, and nothing worked. This is not a coredns issue; it just happens to be the only non-hostNetwork workload on the test cluster (see the checks right after this paragraph).
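To confirm it is the probes failing rather than scheduling, I checked along these lines (plain kubectl; the label/column names are the usual EKS defaults):
# the agent-not-ready taint is gone once Cilium is up
$ kubectl get nodes -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'
# the coredns restarts are driven by probe timeouts
$ kubectl -n kube-system get events --field-selector reason=Unhealthy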
Third cluster, Bottlerocket, with legacy host routing, cilium-test-br-lr: EKS is deployed in the same way as for cilium-test-br; for the cilium install I added bpf.hostLegacyRouting=true:
helm install cilium cilium/cilium --version 1.12.0 \
--namespace kube-system \
--set cni.chainingMode=aws-cni \
--set enableIPv4Masquerade=false \
--set bpf.hostLegacyRouting=true \
--set tunnel=disabled
On the first and third clusters all pods are healthy, readiness/liveness probes work, and the connectivity tests pass (two tests fail, but pass when run manually; presumably a cilium-cli bug, there is already a similar issue about exactly these 2 tests). On the second cluster the probes don't work, and curling from one pod to another pod IP (on the same or another node) does not work either; see the manual repro after the pod listing below.
On the second cluster:
$ k get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system aws-node-47fdz 1/1 Running 0 26h
kube-system aws-node-8xjbm 1/1 Running 0 26h
kube-system cilium-bcnnh 1/1 Running 0 26h
kube-system cilium-gl2jf 1/1 Running 0 26h
kube-system cilium-operator-598c495f5f-665d5 1/1 Running 0 26h
kube-system cilium-operator-598c495f5f-wlpvh 1/1 Running 0 26h
kube-system coredns-964b95965-969rf 0/1 Running 359 (79s ago) 26h
kube-system coredns-964b95965-xnsck 0/1 CrashLoopBackOff 107 (35s ago) 7h36m
kube-system kube-proxy-7mpv8 1/1 Running 0 26h
kube-system kube-proxy-bpptv 1/1 Running 0 26h
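The manual pod-to-pod repro, using a throwaway client pod (the image and target IP are just the ones I happened to use; the port is the pod's probe port):
# times out on the broken cluster, connects on the other two
$ kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl -- \
    curl -sv --max-time 5 http://192.168.149.81:8080/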
Cilium Version
$ cilium version
cilium-cli: 0.11.11 compiled with go1.18.3 on darwin/amd64
cilium image (default): v1.11.6
cilium image (stable): v1.12.0
cilium image (running): v1.12.0
It happens with 1.11.6 too.
Kernel Version
Bottlerocket cluster (amazon/bottlerocket-aws-k8s-1.22-x86_64-v1.8.0-a6233c22):
$ uname -a
Linux ip-192-168-106-102.ap-southeast-2.compute.internal 5.10.118 #1 SMP Thu Jun 9 01:24:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Reference cluster (amazon/amazon-eks-node-1.22-v20220629)
$ uname -a
Linux ip-192-168-112-230.ap-southeast-2.compute.internal 5.4.196-108.356.amzn2.x86_64 #1 SMP Thu May 26 12:49:47 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
$ k version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.24.1
Kustomize Version: v4.5.4
Server Version: v1.22.10-eks-84b4fe6
WARNING: version difference between client (1.24) and server (1.22) exceeds the supported minor version skew of +/-1
Sysdump
cilium-sysdump-20220728-175804.zip
Relevant log output
# Note that these IPs can be different from the ones in the sysdump because I rotated pods/nodes a few times.
# Probes from kubelet to a pod remain in UNREPLIED
bash-5.1# conntrack -E -d 192.168.149.81
[NEW] tcp 6 120 SYN_SENT src=192.168.154.54 dst=192.168.149.81 sport=44354 dport=8080 [UNREPLIED] src=192.168.149.81 dst=192.168.154.54 sport=8080 dport=44354
[NEW] tcp 6 120 SYN_SENT src=192.168.154.54 dst=192.168.149.81 sport=44356 dport=8080 [UNREPLIED] src=192.168.149.81 dst=192.168.154.54 sport=8080 dport=44356
[NEW] tcp 6 120 SYN_SENT src=192.168.154.54 dst=192.168.149.81 sport=57922 dport=8080 [UNREPLIED] src=192.168.149.81 dst=192.168.154.54 sport=8080 dport=57922
## tcpdump example (host: `192.168.106.102`, pod: `192.168.108.32`): the pod does reply with SYN-ACKs (and keeps retransmitting them), but the handshake never completes and the conntrack entries above stay UNREPLIED:
03:42:43.607849 IP 192.168.106.102.39394 > 192.168.108.32.8080: Flags [S], seq 1650401920, win 62727, options [mss 8961,sackOK,TS val 2146721601 ecr 0,nop,wscale 7], length 0
03:42:43.607849 IP 192.168.106.102.39396 > 192.168.108.32.8080: Flags [S], seq 3000395580, win 62727, options [mss 8961,sackOK,TS val 2146721601 ecr 0,nop,wscale 7], length 0
03:42:43.607878 IP 192.168.108.32.8080 > 192.168.106.102.39396: Flags [S.], seq 183656348, ack 3000395581, win 62643, options [mss 8961,sackOK,TS val 2382368077 ecr 2146721601,nop,wscale 7], length 0
03:42:43.607878 IP 192.168.108.32.8080 > 192.168.106.102.39394: Flags [S.], seq 1269762911, ack 1650401921, win 62643, options [mss 8961,sackOK,TS val 2382368077 ecr 2146721601,nop,wscale 7], length 0
03:42:43.607890 IP 192.168.108.32.8080 > 192.168.106.102.39396: Flags [S.], seq 183656348, ack 3000395581, win 62643, options [mss 8961,sackOK,TS val 2382368077 ecr 2146721601,nop,wscale 7], length 0
03:42:43.607890 IP 192.168.108.32.8080 > 192.168.106.102.39394: Flags [S.], seq 1269762911, ack 1650401921, win 62643, options [mss 8961,sackOK,TS val 2382368077 ecr 2146721601,nop,wscale 7], length 0
03:42:44.616940 IP 192.168.108.32.8080 > 192.168.106.102.39394: Flags [S.], seq 1269762911, ack 1650401921, win 62643, options [mss 8961,sackOK,TS val 2382369086 ecr 2146721601,nop,wscale 7], length 0
03:42:44.616951 IP 192.168.106.102.39396 > 192.168.108.32.8080: Flags [S], seq 3000395580, win 62727, options [mss 8961,sackOK,TS val 2146722610 ecr 0,nop,wscale 7], length 0
03:42:44.616959 IP 192.168.108.32.8080 > 192.168.106.102.39394: Flags [S.], seq 1269762911, ack 1650401921, win 62643, options [mss 8961,sackOK,TS val 2382369086 ecr 2146721601,nop,wscale 7], length 0
03:42:44.616983 IP 192.168.108.32.8080 > 192.168.106.102.39396: Flags [S.], seq 183656348, ack 3000395581, win 62643, options [mss 8961,sackOK,TS val 2382369086 ecr 2146721601,nop,wscale 7], length 0
03:42:44.616986 IP 192.168.108.32.8080 > 192.168.106.102.39396: Flags [S.], seq 183656348, ack 3000395581, win 62643, options [mss 8961,sackOK,TS val 2382369086 ecr 2146721601,nop,wscale 7], length 0
03:42:44.616988 IP 192.168.108.32.8080 > 192.168.106.102.39396: Flags [S.], seq 183656348, ack 3000395581, win 62643, options [mss 8961,sackOK,TS val 2382369086 ecr 2146721601,nop,wscale 7], length 0
03:42:44.616991 IP 192.168.108.32.8080 > 192.168.106.102.39396: Flags [S.], seq 183656348, ack 3000395581, win 62643, options [mss 8961,sackOK,TS val 2382369086 ecr 2146721601,nop,wscale 7], length 0
Anything else?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct