
Cilium 1.16 identifies reply to outgoing traffic that leaves the cluster as new connection #35535

@Cajga

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.16.0 and lower than v1.17.0

What happened?

Cilium 1.16.3 (host firewall enabled, encapsulation mode (Geneve), with kube-proxy) identifies the reply to an outgoing connection that leaves the cluster as a new connection. This creates an extra (stale) entry in the CT map for every single connection leaving the cluster, and the reply would be dropped by the host firewall (we use default-deny; in the kind reproduction below policy audit mode is enabled, which is why the flows show AUDITED instead of DROPPED). We reproduced the problem in a kind cluster to make it simpler to debug.
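
For reference, a minimal sketch of the kind of default-deny host policy we use (the policy name and the allowed entity are illustrative, not our exact production policy):

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: host-default-deny          # illustrative name
spec:
  description: "Default-deny ingress on all nodes (sketch)"
  nodeSelector: {}                 # empty selector: applies to the host endpoint on every node
  ingress:
  - fromEntities:
    - cluster                      # only in-cluster peers allowed; replies misclassified as new world traffic get denied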

Here are the Hubble flow logs for a single curl request made from a pod (note the third line, where Cilium identifies the SYN, ACK reply as a new ingress connection to the host):

# hubble observe flows -f |grep '79.172.255.103'
Oct 25 07:39:55.531: default/curly:36642 (ID:60372) -> 79.172.255.103:80 (world) policy-verdict:none EGRESS AUDITED (TCP Flags: SYN)
Oct 25 07:39:55.531: default/curly:36642 (ID:60372) -> 79.172.255.103:80 (world) to-stack FORWARDED (TCP Flags: SYN)
Oct 25 07:39:55.571: 79.172.255.103:80 (world) -> 172.18.0.3:36642 (host) policy-verdict:none INGRESS AUDITED (TCP Flags: SYN, ACK)
Oct 25 07:39:55.571: default/curly:36642 (ID:60372) <- 79.172.255.103:80 (world) to-endpoint FORWARDED (TCP Flags: SYN, ACK)
Oct 25 07:39:55.571: default/curly:36642 (ID:60372) -> 79.172.255.103:80 (world) to-stack FORWARDED (TCP Flags: ACK)
Oct 25 07:39:55.571: default/curly:36642 (ID:60372) -> 79.172.255.103:80 (world) to-stack FORWARDED (TCP Flags: ACK, PSH)
Oct 25 07:39:55.612: default/curly:36642 (ID:60372) <- 79.172.255.103:80 (world) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Oct 25 07:39:55.612: default/curly:36642 (ID:60372) -> 79.172.255.103:80 (world) to-stack FORWARDED (TCP Flags: ACK, FIN)
Oct 25 07:39:55.652: default/curly:36642 (ID:60372) <- 79.172.255.103:80 (world) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Oct 25 07:39:55.652: default/curly:36642 (ID:60372) -> 79.172.255.103:80 (world) to-stack FORWARDED (TCP Flags: ACK)

Relevant CT map entries after a few tests (note the two "TCP IN" entries, which are actually replies to our outgoing traffic):

# cilium-dbg bpf ct list global|grep 79.172.255.103
TCP IN 79.172.255.103:80 -> 172.18.0.3:36642 expires=9143 Packets=0 Bytes=0 RxFlagsSeen=0x1b LastRxReport=1143 TxFlagsSeen=0x00 LastTxReport=0 Flags=0x0011 [ RxClosing SeenNonSyn ] RevNAT=0 SourceSecurityID=2 IfIndex=0 BackendID=0 
TCP IN 79.172.255.103:80 -> 172.18.0.3:55530 expires=9223 Packets=0 Bytes=0 RxFlagsSeen=0x1b LastRxReport=1223 TxFlagsSeen=0x00 LastTxReport=0 Flags=0x0011 [ RxClosing SeenNonSyn ] RevNAT=0 SourceSecurityID=2 IfIndex=0 BackendID=0 
TCP OUT 10.244.1.156:50708 -> 79.172.255.103:80 expires=1948 Packets=0 Bytes=0 RxFlagsSeen=0x1b LastRxReport=1938 TxFlagsSeen=0x1b LastTxReport=1938 Flags=0x0013 [ RxClosing TxClosing SeenNonSyn ] RevNAT=0 SourceSecurityID=60372 IfIndex=0 BackendID=0 
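
SourceSecurityID=2 on the stale "TCP IN" entries is the reserved world identity. A quick sanity check from the same node, using the same debug CLI as above (sketch):

# cilium-dbg identity get 2        # should list the reserved:world label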

Cilium config:

apiVersion: v1
data:
  agent-not-ready-taint-key: node.cilium.io/agent-not-ready
  arping-refresh-period: 30s
  auto-direct-node-routes: "false"
  bpf-events-drop-enabled: "true"
  bpf-events-policy-verdict-enabled: "true"
  bpf-events-trace-enabled: "true"
  bpf-lb-acceleration: disabled
  bpf-lb-external-clusterip: "false"
  bpf-lb-map-max: "65536"
  bpf-lb-sock: "false"
  bpf-lb-sock-terminate-pod-connections: "false"
  bpf-map-dynamic-size-ratio: "0.0025"
  bpf-policy-map-max: "16384"
  bpf-root: /sys/fs/bpf
  cgroup-root: /run/cilium/cgroupv2
  cilium-endpoint-gc-interval: 5m0s
  cluster-id: "0"
  cluster-name: kind-kind
  clustermesh-enable-endpoint-sync: "false"
  clustermesh-enable-mcs-api: "false"
  cni-chaining-mode: portmap
  cni-exclusive: "false"
  cni-log-file: /var/run/cilium/cilium-cni.log
  custom-cni-conf: "false"
  datapath-mode: veth
  debug: "false"
  debug-verbose: ""
  direct-routing-skip-unreachable: "false"
  dnsproxy-socket-linger-timeout: "10"
  egress-gateway-reconciliation-trigger-interval: 1s
  enable-auto-protect-node-port-range: "true"
  enable-bpf-clock-probe: "false"
  enable-endpoint-health-checking: "true"
  enable-external-ips: "false"
  enable-health-check-loadbalancer-ip: "false"
  enable-health-check-nodeport: "true"
  enable-health-checking: "true"
  enable-host-firewall: "true"
  enable-host-legacy-routing: "true"
  enable-host-port: "false"
  enable-hubble: "true"
  enable-ipv4: "true"
  enable-ipv4-big-tcp: "false"
  enable-ipv4-masquerade: "true"
  enable-ipv6: "false"
  enable-ipv6-big-tcp: "false"
  enable-ipv6-masquerade: "true"
  enable-k8s-networkpolicy: "true"
  enable-k8s-terminating-endpoint: "true"
  enable-l2-neigh-discovery: "true"
  enable-l7-proxy: "true"
  enable-local-redirect-policy: "false"
  enable-masquerade-to-route-source: "false"
  enable-metrics: "true"
  enable-node-port: "false"
  enable-node-selector-labels: "false"
  enable-policy: always
  enable-runtime-device-detection: "true"
  enable-sctp: "false"
  enable-svc-source-range-check: "true"
  enable-tcx: "true"
  enable-vtep: "false"
  enable-well-known-identities: "false"
  enable-xt-socket-fallback: "true"
  envoy-base-id: "0"
  envoy-keep-cap-netbindservice: "false"
  external-envoy-proxy: "true"
  hubble-disable-tls: "false"
  hubble-export-file-max-backups: "5"
  hubble-export-file-max-size-mb: "10"
  hubble-listen-address: :4244
  hubble-socket-path: /var/run/cilium/hubble.sock
  hubble-tls-cert-file: /var/lib/cilium/tls/hubble/server.crt
  hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
  hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
  identity-allocation-mode: crd
  identity-gc-interval: 15m0s
  identity-heartbeat-timeout: 30m0s
  install-no-conntrack-iptables-rules: "false"
  ipam: kubernetes
  ipam-cilium-node-update-rate: 15s
  k8s-client-burst: "20"
  k8s-client-qps: "10"
  k8s-require-ipv4-pod-cidr: "false"
  k8s-require-ipv6-pod-cidr: "false"
  kube-proxy-replacement: "false"
  kube-proxy-replacement-healthz-bind-address: ""
  max-connected-clusters: "255"
  mesh-auth-enabled: "true"
  mesh-auth-gc-interval: 5m0s
  mesh-auth-queue-size: "1024"
  mesh-auth-rotated-identities-queue-size: "1024"
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  monitor-aggregation-interval: 5s
  nat-map-stats-entries: "32"
  nat-map-stats-interval: 30s
  node-port-bind-protection: "true"
  nodeport-addresses: ""
  nodes-gc-interval: 5m0s
  operator-api-serve-addr: 127.0.0.1:9234
  operator-prometheus-serve-addr: :9963
  policy-audit-mode: "true"
  policy-cidr-match-mode: ""
  preallocate-bpf-maps: "false"
  procfs: /host/proc
  proxy-connect-timeout: "2"
  proxy-idle-timeout-seconds: "60"
  proxy-max-connection-duration-seconds: "0"
  proxy-max-requests-per-connection: "0"
  proxy-xff-num-trusted-hops-egress: "0"
  proxy-xff-num-trusted-hops-ingress: "0"
  remove-cilium-node-taints: "true"
  routing-mode: tunnel
  service-no-backend-response: reject
  set-cilium-is-up-condition: "true"
  set-cilium-node-taints: "true"
  synchronize-k8s-nodes: "true"
  tofqdns-dns-reject-response-code: refused
  tofqdns-enable-dns-compression: "true"
  tofqdns-endpoint-max-ip-per-hostname: "50"
  tofqdns-idle-connection-grace-period: 0s
  tofqdns-max-deferred-connection-deletes: "10000"
  tofqdns-proxy-response-max-delay: 100ms
  tunnel-protocol: geneve
  unmanaged-pod-watcher-interval: "15"
  vtep-cidr: ""
  vtep-endpoint: ""
  vtep-mac: ""
  vtep-mask: ""
  write-cni-conf-when-ready: /host/etc/cni/net.d/05-cilium.conflist
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: cilium
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2024-10-25T07:34:45Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: cilium-config
  namespace: kube-system
  resourceVersion: "804"
  uid: d4585746-8c01-4c35-ab17-12b212ccb332

How can we reproduce the issue?

  1. create kind cluster:
cat <<EOT >> kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true        # do not install kindnet
nodes:
- role: control-plane
- role: worker
EOT
kind create cluster --config ./kind-config.yaml --image kindest/node:v1.30.4
  2. install cilium
cilium install --version=v1.16.3 --helm-set cni.exclusive=false --helm-set ipam.mode=kubernetes --helm-set identityAllocationMode=crd --helm-set tunnelProtocol=geneve --helm-set cni.chainingMode=portmap --helm-set hostFirewall.enabled=true --helm-set operator.replicas=2 --helm-set policyAuditMode=true --helm-set policyEnforcementMode=always --helm-set hubble.enabled=true
cilium status --wait
  3. enable hubble
cilium hubble enable
cilium status --wait
  4. run a pod
kubectl run -it --rm --image=curlimages/curl curly -- /bin/sh
  5. call a curl command and observe the hubble flow logs from another terminal (a concrete example follows this list)
hubble observe flows -f |grep 'IPWHERETHECONNECTIONGOES'
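
For example, assuming the pod can reach some external web server (the IP below is a placeholder from the TEST-NET-3 range, not the address from the logs above):

# inside the curly pod
curl -v http://203.0.113.10/

# in another terminal, e.g. after `cilium hubble port-forward &` or from inside a cilium agent pod
hubble observe flows -f | grep '203.0.113.10'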

Cilium Version

# cilium version
cilium-cli: v0.16.19 compiled with go1.23.1 on linux/amd64
cilium image (default): v1.16.2
cilium image (stable): v1.16.3
cilium image (running): 1.16.3

Kernel Version

# uname -a
Linux ip-10-100-0-143 6.8.0-1017-aws #18~22.04.1-Ubuntu SMP Thu Oct  3 19:57:42 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

# kubectl version
Client Version: v1.31.2
Kustomize Version: v5.4.2
Server Version: v1.30.4

Regression

Yes, the issue cannot be reproduced with 1.15.10
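
One way to confirm the regression on the same kind cluster is to reinstall Cilium with 1.15.10 (a sketch; the --helm-set flags mirror step 2 of the reproduction above):

cilium uninstall
cilium install --version=v1.15.10 --helm-set cni.exclusive=false --helm-set ipam.mode=kubernetes --helm-set identityAllocationMode=crd --helm-set tunnelProtocol=geneve --helm-set cni.chainingMode=portmap --helm-set hostFirewall.enabled=true --helm-set operator.replicas=2 --helm-set policyAuditMode=true --helm-set policyEnforcementMode=always --helm-set hubble.enabled=true
cilium status --wait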

Sysdump

cilium-sysdump-20241025-075558.zip

Relevant log output

No response

Anything else?

This ticket obsoletes the old one that I opened, as it contains all the information in one place, reproduced on a kind cluster.

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/host-firewall: Impacts the host firewall or the host endpoint.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • kind/regression: This functionality worked fine before, but was broken in a newer release of Cilium.
