
Cilium 1.16.5 makes link-local address unreachable from pod's network #36761

@pddg

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.16.4 and lower than v1.17.0

What happened?

After upgrading Cilium to 1.16.5 with bpf.masquerade=true, all DNS queries to node-local-dns timed out.

❯ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached


command terminated with exit code 1

The node-local-dns is configured by kubespray. It listens on the link-local address 169.254.25.10, and that address is written into each Pod's resolv.conf as the nameserver.

❯ kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local flets-east.jp iptvf.jp
nameserver 169.254.25.10
options ndots:5

❯ ssh ubuntu@$WORKER_IPADDR ip a show dev nodelocaldns
3: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 6e:a5:52:ac:8a:b2 brd ff:ff:ff:ff:ff:ff
    inet 169.254.25.10/32 scope global nodelocaldns
       valid_lft forever preferred_lft forever

This looks similar to #35153, but as far as I have tried in my reproduction environment, everything worked up to 1.16.4 even with bpf.masquerade=true.
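For anyone triaging, a quick way to confirm that BPF masquerading is actually in effect on the agent (the Masquerading line in cilium status should read BPF):

kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium status | grep -i masquerading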

How can we reproduce the issue?

Set up VMs for a Kubernetes cluster deployed by kubespray. Multipass is used here, but any other tool will do.

# Useful for ssh: set your GitHub user name to import your public keys
GITHUB_USER=<your-github-user-name>

cat << EOF > cloud-config.yaml
allow_public_ssh_keys: true
ssh_import_id: ["gh:${GITHUB_USER}"]
EOF

multipass launch \
  --name k8s-master \
  --memory 2G \
  --disk 30G \
  --cloud-init cloud-config.yaml \
  24.04

multipass launch \
  --name k8s-worker \
  --memory 2G \
  --disk 30G \
  --cloud-init cloud-config.yaml \
  24.04
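
A quick sanity check that both VMs are up:

multipass list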

Set up the cluster with kubespray, enabling node-local-dns.

mkdir -p inventory/multipass
MASTER_IPADDR=$(multipass info --format json| jq -r '.info | to_entries[] | select(.key == "k8s-master") | .value.ipv4[]')
WORKER_IPADDR=$(multipass info --format json| jq -r '.info | to_entries[] | select(.key == "k8s-worker") | .value.ipv4[]')

cat << EOF > inventory/multipass/hosts.yaml
all:
  hosts:
    k8s-master:
      ansible_user: ubuntu
      ansible_host: ${MASTER_IPADDR}
      ip: ${MASTER_IPADDR}
      access_ip: ${MASTER_IPADDR}
    k8s-worker:
      ansible_user: ubuntu
      ansible_host: ${WORKER_IPADDR}
      ip: ${WORKER_IPADDR}
      access_ip: ${WORKER_IPADDR}
  children:
    kube_control_plane:
      hosts:
        k8s-master:
    kube_node:
      hosts:
        k8s-master:
        k8s-worker:
    etcd:
      hosts:
        k8s-master:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
EOF

# Check connectivity
docker run --rm  -it \
  --mount type=tmpfs,dst=/.ansible \
  --mount type=bind,source=$(pwd)/inventory,dst=/kubeproxy/inventory \
  --mount type=bind,source=${HOME}/.ssh/id_ed25519,dst=/root/.ssh/id_rsa \
  quay.io/kubespray/kubespray:v2.26.0 \
  ansible -i /kubeproxy/inventory/multipass/hosts.yaml -m ping all


mkdir -p inventory/multipass/group_vars
cat << EOF > inventory/multipass/group_vars/all.yaml
---
kube_version: v1.30.4
container_manager: containerd

# cilium is installed by helm later.
kube_network_plugin: cni
kube_proxy_remove: true

kubeconfig_localhost: true

dns_mode: coredns

# Enable node-local-dns
enable_nodelocaldns: true
# Default address of node-local-dns
nodelocaldns_ip: 169.254.25.10
EOF

# Set up the cluster. This may take several minutes.
docker run --rm \
  --mount type=tmpfs,dst=/.ansible \
  --mount type=bind,source=$(pwd)/inventory,dst=/kubeproxy/inventory \
  --mount type=bind,source=${HOME}/.ssh/id_ed25519,dst=/root/.ssh/id_rsa \
  quay.io/kubespray/kubespray:v2.26.0 \
  ansible-playbook -i /kubeproxy/inventory/multipass/hosts.yaml --become cluster.yml

# Setup KUBECONFIG
export KUBECONFIG=$KUBECONFIG:$(pwd)/inventory/multipass/artifacts/admin.conf
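
Verify that kubectl can reach the new cluster:

kubectl get nodes -o wide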

Install cilium with helm.

cat << EOF > cilium-values.yaml
k8sServiceHost: ${MASTER_IPADDR}
k8sServicePort: 6443
kubeProxyReplacement: true
ipam:
  mode: kubernetes
securityContext:
  privileged: true
# Required to reproduce this issue
bpf:
  masquerade: true
EOF

helm install cilium cilium/cilium \
  --version 1.16.5 \
  --values cilium-values.yaml \
  --namespace kube-system

kubectl wait --timeout=90s --for=condition=Ready -n kube-system \
  pods -l k8s-app=cilium
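
Optionally, confirm that the agent reports healthy; cilium status --brief prints OK once everything is up:

kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium status --brief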

# Recreate all pods that do not use hostNetwork so Cilium manages their networking
kubectl get pods \
  --all-namespaces \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork \
  --no-headers=true \
  | grep '<none>' \
  | awk '{print "-n "$1" "$2}' \
  | xargs -L 1 -r kubectl delete pod
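
All recreated pods should come back Running:

kubectl get pods --all-namespaces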

Deploy a pod that contains DNS utilities.

cat << EOF > dnsutils.yaml
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: registry.k8s.io/e2e-test-images/agnhost:2.39
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
kubectl apply -f dnsutils.yaml
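
Wait until the pod is ready before testing:

kubectl wait --timeout=60s --for=condition=Ready pod/dnsutils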

nslookup will time out.

kubectl exec -i -t dnsutils -- nslookup kubernetes.default

nslookup using the CoreDNS ClusterIP should succeed.

kubectl exec -i -t dnsutils -- nslookup kubernetes.default $(kubectl get svc coredns -n kube-system -o jsonpath={.spec.clusterIP})
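
As a hypothetical extra debugging step, capture DNS traffic on the worker node to see whether the queries reach 169.254.25.10 at all, and with which source address (assumes tcpdump is installed on the node):

ssh ubuntu@$WORKER_IPADDR sudo tcpdump -ni any host 169.254.25.10 and port 53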

Cilium Version

❯ kubectl -n kube-system exec ds/cilium -- cilium version
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), clean-cilium-state (init), install-cni-binaries (init)
Client: 1.16.5 ad688277 2024-12-12T20:18:31+00:00 go version go1.22.10 linux/arm64
Daemon: 1.16.5 ad688277 2024-12-12T20:18:31+00:00 go version go1.22.10 linux/arm64

It also occurred on amd64.

Kernel Version

Linux k8s-worker 6.8.0-49-generic #49-Ubuntu SMP PREEMPT_DYNAMIC Sun Nov  3 21:21:58 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Kubernetes Version

Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.30.4

Regression

o: works
x: does not work

1.15.11: o
1.15.12: o
1.16.0: o
1.16.4: o
1.16.5: x
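
Each version was tried in the same reproduction environment; a sketch of switching versions by upgrading the chart in place:

helm upgrade cilium cilium/cilium \
  --version 1.16.4 \
  --values cilium-values.yaml \
  --namespace kube-system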

Sysdump

No response

Relevant log output

Anything else?

Enabling ipMasqAgent with masqLinkLocal: false did not help.
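
For reference, the ipMasqAgent settings were of this shape (a sketch; the exact configuration that was tried may have differed):

cat << EOF >> cilium-values.yaml
# Assumption: shape of the ipMasqAgent Helm values that were tried
ipMasqAgent:
  enabled: true
  config:
    masqLinkLocal: false
EOF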

The output of hubble observe is as follows.
When it works:

❯ kubectl exec -ti pod/$(./k8s-get-cilium-pod.sh dnsutils default) -n kube-system -c cilium-agent \
  -- hubble observe --since 1m --pod default/dnsutils
Dec 22 07:51:41.852: 127.0.0.1:39065 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)
Dec 22 07:51:41.852: default/dnsutils:50642 (ID:32060) -> 169.254.25.10:53 (world) to-stack FORWARDED (UDP)
Dec 22 07:51:41.853: default/dnsutils:50642 (ID:32060) <- 169.254.25.10:53 (world) to-endpoint FORWARDED (UDP)
Dec 22 07:51:41.853: 169.254.25.10:53 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)
Dec 22 07:51:41.853: default/dnsutils:60221 (ID:32060) -> 169.254.25.10:53 (world) to-stack FORWARDED (UDP)
Dec 22 07:51:41.853: default/dnsutils:60221 (ID:32060) <- 169.254.25.10:53 (world) to-endpoint FORWARDED (UDP)
Dec 22 07:51:41.853: 169.254.25.10:53 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)
Dec 22 07:51:41.855: default/dnsutils:51283 (ID:32060) -> 169.254.25.10:53 (world) to-stack FORWARDED (UDP)
Dec 22 07:51:41.856: default/dnsutils:51283 (ID:32060) <- 169.254.25.10:53 (world) to-endpoint FORWARDED (UDP)
Dec 22 07:51:41.856: 169.254.25.10:53 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)

When it does not work (note the verdict changes from to-stack to to-network, and no reply ever comes back):

❯ kubectl exec -ti pod/$(./k8s-get-cilium-pod.sh dnsutils default) -n kube-system -c cilium-agent \
  -- hubble observe --since 1m --pod default/dnsutils
Dec 22 07:45:18.830: 127.0.0.1:45439 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)
Dec 22 07:45:18.830: default/dnsutils:55914 (ID:32060) -> 169.254.25.10:53 (world) to-network FORWARDED (UDP)
Dec 22 07:45:28.849: default/dnsutils:55914 (ID:32060) -> 169.254.25.10:53 (world) to-network FORWARDED (UDP)

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • affects/v1.16: This issue affects v1.16 branch
  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/kpr: Anything related to our kube-proxy replacement.
  • feature/bpf-masquerading
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • kind/regression: This functionality worked fine before, but was broken in a newer release of Cilium.
  • needs/triage: This issue requires triaging to establish severity and next steps.
