
Cilium 1.16.5 makes link-local address unreachable from pod's network #36761

@pddg

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.16.4 and lower than v1.17.0

What happened?

After upgrading Cilium to 1.16.5 with bpf.masquerade=true, all DNS queries to node-local-dns timed out.

❯ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached


command terminated with exit code 1

The node-local-dns is configured by kubespray. It listens on the link-local address 169.254.25.10, and that address is written into each Pod's resolv.conf as the nameserver.

❯ kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local flets-east.jp iptvf.jp
nameserver 169.254.25.10
options ndots:5

❯ ssh ubuntu@$WORKER_IPADDR ip a show dev nodelocaldns
3: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 6e:a5:52:ac:8a:b2 brd ff:ff:ff:ff:ff:ff
    inet 169.254.25.10/32 scope global nodelocaldns
       valid_lft forever preferred_lft forever

This looks similar to #35153, but as far as I have tried in my reproduction environment, everything worked up to 1.16.4 even with bpf.masquerade=true.
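For anyone triaging, a quick way to confirm that BPF masquerading is actually in effect on the agent (the Masquerading line in cilium status should read BPF):

kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium status | grep -i masquerading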

How can we reproduce the issue?

Set up VMs for a Kubernetes cluster deployed by kubespray. Multipass is used here, but any other tool will do.

# Useful for ssh: set your GitHub user name to import your public keys
GITHUB_USER=<your-github-user-name>

cat << EOF > cloud-config.yaml
allow_public_ssh_keys: true
ssh_import_id: ["gh:${GITHUB_USER}"]
EOF

multipass launch \
  --name k8s-master \
  --memory 2G \
  --disk 30G \
  --cloud-init cloud-config.yaml \
  24.04

multipass launch \
  --name k8s-worker \
  --memory 2G \
  --disk 30G \
  --cloud-init cloud-config.yaml \
  24.04
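
A quick sanity check that both VMs are up:

multipass list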

Set up the cluster with kubespray, enabling node-local-dns.

mkdir -p inventory/multipass
MASTER_IPADDR=$(multipass info --format json| jq -r '.info | to_entries[] | select(.key == "k8s-master") | .value.ipv4[]')
WORKER_IPADDR=$(multipass info --format json| jq -r '.info | to_entries[] | select(.key == "k8s-worker") | .value.ipv4[]')

cat << EOF > inventory/multipass/hosts.yaml
all:
  hosts:
    k8s-master:
      ansible_user: ubuntu
      ansible_host: ${MASTER_IPADDR}
      ip: ${MASTER_IPADDR}
      access_ip: ${MASTER_IPADDR}
    k8s-worker:
      ansible_user: ubuntu
      ansible_host: ${WORKER_IPADDR}
      ip: ${WORKER_IPADDR}
      access_ip: ${WORKER_IPADDR}
  children:
    kube_control_plane:
      hosts:
        k8s-master:
    kube_node:
      hosts:
        k8s-master:
        k8s-worker:
    etcd:
      hosts:
        k8s-master:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
EOF

# Check connectivity
docker run --rm  -it \
  --mount type=tmpfs,dst=/.ansible \
  --mount type=bind,source=$(pwd)/inventory,dst=/kubeproxy/inventory \
  --mount type=bind,source=${HOME}/.ssh/id_ed25519,dst=/root/.ssh/id_rsa \
  quay.io/kubespray/kubespray:v2.26.0 \
  ansible -i /kubeproxy/inventory/multipass/hosts.yaml -m ping all


mkdir -p inventory/multipass/group_vars
cat << EOF > inventory/multipass/group_vars/all.yaml
---
kube_version: v1.30.4
container_manager: containerd

# cilium is installed by helm later.
kube_network_plugin: cni
kube_proxy_remove: true

kubeconfig_localhost: true

dns_mode: coredns

# Enable node-local-dns
enable_nodelocaldns: true
# Default address of node-local-dns
nodelocaldns_ip: 169.254.25.10
EOF

# Set up the cluster. This may take several minutes.
docker run --rm \
  --mount type=tmpfs,dst=/.ansible \
  --mount type=bind,source=$(pwd)/inventory,dst=/kubeproxy/inventory \
  --mount type=bind,source=${HOME}/.ssh/id_ed25519,dst=/root/.ssh/id_rsa \
  quay.io/kubespray/kubespray:v2.26.0 \
  ansible-playbook -i /kubeproxy/inventory/multipass/hosts.yaml --become cluster.yml

# Setup KUBECONFIG
export KUBECONFIG=$KUBECONFIG:$(pwd)/inventory/multipass/artifacts/admin.conf
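
Verify that kubectl can reach the new cluster:

kubectl get nodes -o wide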

Install cilium with helm.

cat << EOF > cilium-values.yaml
k8sServiceHost: ${MASTER_IPADDR}
k8sServicePort: 6443
kubeProxyReplacement: true
ipam:
  mode: kubernetes
securityContext:
  privileged: true
# Required to reproduce this issue
bpf:
  masquerade: true
EOF

helm install cilium cilium/cilium \
  --version 1.16.5 \
  --values cilium-values.yaml \
  --namespace kube-system

kubectl wait --timeout=90s --for=condition=Ready -n kube-system \
  pods -l k8s-app=cilium
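
Optionally, confirm that the agent reports healthy; cilium status --brief prints OK once everything is up:

kubectl -n kube-system exec ds/cilium -c cilium-agent -- cilium status --brief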

# Recreate all pods that do not use hostNetwork so Cilium manages their networking
kubectl get pods \
  --all-namespaces \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork \
  --no-headers=true \
  | grep '<none>' \
  | awk '{print "-n "$1" "$2}' \
  | xargs -L 1 -r kubectl delete pod
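
All recreated pods should come back Running:

kubectl get pods --all-namespaces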

Deploy a pod that contains DNS utilities.

cat << EOF > dnsutils.yaml
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: registry.k8s.io/e2e-test-images/agnhost:2.39
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
kubectl apply -f dnsutils.yaml
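
Wait until the pod is ready before testing:

kubectl wait --timeout=60s --for=condition=Ready pod/dnsutils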

nslookup will time out.

kubectl exec -i -t dnsutils -- nslookup kubernetes.default

nslookup using the CoreDNS ClusterIP should succeed.

kubectl exec -i -t dnsutils -- nslookup kubernetes.default $(kubectl get svc coredns -n kube-system -o jsonpath={.spec.clusterIP})
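
As a hypothetical extra debugging step, capture DNS traffic on the worker node to see whether the queries reach 169.254.25.10 at all, and with which source address (assumes tcpdump is installed on the node):

ssh ubuntu@$WORKER_IPADDR sudo tcpdump -ni any host 169.254.25.10 and port 53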

Cilium Version

❯ kubectl -n kube-system exec ds/cilium -- cilium version
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-cgroup (init), apply-sysctl-overwrites (init), clean-cilium-state (init), install-cni-binaries (init)
Client: 1.16.5 ad688277 2024-12-12T20:18:31+00:00 go version go1.22.10 linux/arm64
Daemon: 1.16.5 ad688277 2024-12-12T20:18:31+00:00 go version go1.22.10 linux/arm64

It also occurred on amd64.

Kernel Version

Linux k8s-worker 6.8.0-49-generic #49-Ubuntu SMP PREEMPT_DYNAMIC Sun Nov  3 21:21:58 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Kubernetes Version

Client Version: v1.32.0
Kustomize Version: v5.5.0
Server Version: v1.30.4

Regression

o: works
x: does not work

1.15.11: o
1.15.12: o
1.16.0: o
1.16.4: o
1.16.5: x
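
Each version was tried in the same reproduction environment; a sketch of switching versions by upgrading the chart in place:

helm upgrade cilium cilium/cilium \
  --version 1.16.4 \
  --values cilium-values.yaml \
  --namespace kube-system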

Sysdump

No response

Relevant log output

Anything else?

Enabling ipMasqAgent with masqLinkLocal: false did not help.
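
For reference, the ipMasqAgent settings were of this shape (a sketch; the exact configuration that was tried may have differed):

cat << EOF >> cilium-values.yaml
# Assumption: shape of the ipMasqAgent Helm values that were tried
ipMasqAgent:
  enabled: true
  config:
    masqLinkLocal: false
EOF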

The output of hubble observe is as follows.
When it works:

❯ kubectl exec -ti pod/$(./k8s-get-cilium-pod.sh dnsutils default) -n kube-system -c cilium-agent \
  -- hubble observe --since 1m --pod default/dnsutils
Dec 22 07:51:41.852: 127.0.0.1:39065 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)
Dec 22 07:51:41.852: default/dnsutils:50642 (ID:32060) -> 169.254.25.10:53 (world) to-stack FORWARDED (UDP)
Dec 22 07:51:41.853: default/dnsutils:50642 (ID:32060) <- 169.254.25.10:53 (world) to-endpoint FORWARDED (UDP)
Dec 22 07:51:41.853: 169.254.25.10:53 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)
Dec 22 07:51:41.853: default/dnsutils:60221 (ID:32060) -> 169.254.25.10:53 (world) to-stack FORWARDED (UDP)
Dec 22 07:51:41.853: default/dnsutils:60221 (ID:32060) <- 169.254.25.10:53 (world) to-endpoint FORWARDED (UDP)
Dec 22 07:51:41.853: 169.254.25.10:53 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)
Dec 22 07:51:41.855: default/dnsutils:51283 (ID:32060) -> 169.254.25.10:53 (world) to-stack FORWARDED (UDP)
Dec 22 07:51:41.856: default/dnsutils:51283 (ID:32060) <- 169.254.25.10:53 (world) to-endpoint FORWARDED (UDP)
Dec 22 07:51:41.856: 169.254.25.10:53 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)

When it does not work (note the verdict changes from to-stack to to-network, and no reply ever comes back):

❯ kubectl exec -ti pod/$(./k8s-get-cilium-pod.sh dnsutils default) -n kube-system -c cilium-agent \
  -- hubble observe --since 1m --pod default/dnsutils
Dec 22 07:45:18.830: 127.0.0.1:45439 (world) <> default/dnsutils (ID:32060) pre-xlate-rev TRACED (UDP)
Dec 22 07:45:18.830: default/dnsutils:55914 (ID:32060) -> 169.254.25.10:53 (world) to-network FORWARDED (UDP)
Dec 22 07:45:28.849: default/dnsutils:55914 (ID:32060) -> 169.254.25.10:53 (world) to-network FORWARDED (UDP)

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • affects/v1.16: This issue affects v1.16 branch
  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/kpr: Anything related to our kube-proxy replacement.
  • feature/bpf-masquerading
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • kind/regression: This functionality worked fine before, but was broken in a newer release of Cilium.
  • needs/triage: This issue requires triaging to establish severity and next steps.
