All traffic inbound to the cluster fails after upgrading from 1.15 to 1.16 (GKE-only) #35977

@thejosephstevens

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.16.0 and lower than v1.17.0

What happened?

After upgrading Cilium from 1.15 to 1.16 on GKE, all externally exposed services became inaccessible. Traffic inside the cluster was not impacted, and Cilium reported fully healthy status. This affected plain LoadBalancer Services, Ingresses, and Gateway API routes.
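
For reference, agent health can be confirmed along these lines (a minimal sketch; it assumes the Cilium CLI is installed and that Cilium runs in kube-system):

```bash
# Overall cluster-wide health as reported by the Cilium CLI
cilium status --wait

# Per-agent status straight from a cilium pod
# (cilium-dbg is the in-pod agent binary name as of 1.16)
kubectl -n kube-system exec ds/cilium -- cilium-dbg status --brief
```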

How can we reproduce the issue?

  1. Install Cilium with Helm on GKE (we're on 1.30). This should be a Legacy Datapath cluster (not Dataplane V2).
  2. Use the following Helm values:

```yaml
agentNotReadyTaintKey: ignore-taint.cluster-autoscaler.kubernetes.io/cilium-agent-not-ready
aksbyocni:
  enabled: false
authentication:
  mutual:
    spire:
      enabled: true
      install:
        enabled: true
        existingNamespace: true
        namespace: kube-system
bpf:
  masquerade: true
cni:
  binPath: /home/kubernetes/bin
devices: eth+
encryption:
  enabled: true
  nodeEncryption: true
  type: wireguard
envoy:
  enabled: true
gatewayAPI:
  enableAlpn: true
  enableAppProtocol: true
  enabled: true
  secretsNamespace:
    create: false
    name: kube-system
hubble:
  listenAddress: :4244
  metrics:
    enabled:
    - dns
    - drop
    - tcp
    - flow
    - icmp
    - http
  relay:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: pool
              operator: In
              values:
              - default
              - control
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              k8s-app: cilium
          topologyKey: kubernetes.io/hostname
    enabled: true
  tls:
    auto:
      method: cronJob
  ui:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: pool
              operator: In
              values:
              - default
              - control
    enabled: true
ingressController:
  enabled: false
ipam:
  mode: kubernetes
k8sServiceHost: REDACTED
k8sServicePort: 443
kubeProxyReplacement: true
l7Proxy: true
loadBalancer:
  serviceTopology: true
localRedirectPolicy: true
nodeinit:
  enabled: true
  reconfigureKubelet: true
  removeCbrBridge: true
operator:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: pool
            operator: In
            values:
            - default
            - control
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            io.cilium/app: operator
        topologyKey: kubernetes.io/hostname
  prometheus:
    enabled: true
prometheus:
  enabled: true
upgradeCompatibility: "1.9"
wellKnownIdentities:
  enabled: true
```

  3. Create an nginx Deployment and a LoadBalancer Service in front of it, then attempt to curl the LoadBalancer IP (see the sketch below this list).
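
A minimal sketch of step 3; the names, namespace, and image are illustrative assumptions, not taken from the original report:

```yaml
# Minimal nginx Deployment plus a LoadBalancer Service in front of it.
# The name "nginx-test" and the nginx:stable image are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-test
  template:
    metadata:
      labels:
        app: nginx-test
    spec:
      containers:
      - name: nginx
        image: nginx:stable
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-test
spec:
  type: LoadBalancer
  selector:
    app: nginx-test
  ports:
  - port: 80
    targetPort: 80
```

Once the Service is assigned an external IP (`kubectl get svc nginx-test`), `curl http://<EXTERNAL-IP>/` should return the nginx welcome page; on the affected 1.16 clusters it did not.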

Cilium Version

This was reproduced with 1.16.1 and 1.16.3. Once I downgraded to 1.15.9, the problem immediately went away.
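
For reference, a downgrade of this kind can be done with Helm roughly as follows (a sketch; the release name `cilium`, the kube-system namespace, and the values file path are assumptions):

```bash
# Pin the chart back to 1.15.9, reusing the same values file
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --version 1.15.9 \
  -f values.yaml

# Restart the agents so every node picks up the downgraded image
kubectl -n kube-system rollout restart daemonset/cilium
```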

Kernel Version

Linux 6.1.100+ #1 SMP PREEMPT_DYNAMIC Sat Aug 24 16:19:44 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

v1.30.5-gke.1014003

This was confirmed on two separate GKE clusters with different external services.

Regression

This is a regression. The exact same config worked on 1.15.9.

Sysdump

I'll grab this later; I had to fix the cluster because of a deadline.

Relevant log output

I was not able to find any logs that appeared relevant.

Anything else?

I did find that the external service ports were open (tested with nc -zv), but curl failed with a refused connection.
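
Concretely, the two checks behaved along these lines (a sketch; the address and port are placeholders for the redacted LoadBalancer):

```bash
# TCP connect scan: reports the LoadBalancer port as open
nc -zv <LB_IP> 80

# ...yet an HTTP request to the same endpoint fails with a refused connection
curl -v http://<LB_IP>/
```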

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct


Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • info-completed: The GH issue has received a reply from the author.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
  • needs/triage: This issue requires triaging to establish severity and next steps.
