Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
- Running an IPv6-only cluster with Cilium in native routing mode. Every pod gets a globally routable IP, and masquerading is off.
- Enabled the Ingress controller with an Ingress pointing to an nginx pod. When testing, I expected to see the nginx test page; instead, Envoy returns "upstream connect error or disconnect/reset before headers. reset reason: connection timeout".
I started debugging:
- Examined envoy logs. Confirmed that request from envoy to the upstream timed out
- Examined hubble observe output. Saw packets in both directions between the reserved:ingress IP and the upstream IP
That's weird, where is the packet getting lost?
- Ran tcpdump on the upstream host. Saw packets in both directions
- Ran tcpdump on the envoy host. Only saw outgoing packets, no returning packets from upstream
- Ran ip -6 neigh and noticed that the reserved:ingress IP was missing from the list, although other endpoints were there
- Ran ndisc6 on the reserved:ingress IP; it times out with no response
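The neighbor-table check from the steps above can be scripted. This is a minimal sketch that assumes the usual text layout of ip -6 neigh output (address first, state last). The sample table and its addresses are made up for illustration and are not taken from the affected cluster.

```python
import ipaddress

def neighbor_states(ip_neigh_output: str) -> dict:
    """Map each IPv6 neighbor address to its state (the last token on the line)."""
    table = {}
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            addr = ipaddress.IPv6Address(fields[0])
        except ipaddress.AddressValueError:
            continue  # skip anything that does not start with an IPv6 address
        table[addr] = fields[-1]
    return table

def is_resolved(table: dict, ip: str) -> bool:
    """True if the address has an entry that is neither FAILED nor INCOMPLETE."""
    state = table.get(ipaddress.IPv6Address(ip))
    return state is not None and state not in ("FAILED", "INCOMPLETE")

# Hypothetical sample output; real output comes from running: ip -6 neigh
SAMPLE = """\
2001:db8:1::10 dev eth0 lladdr 02:00:00:00:00:10 REACHABLE
2001:db8:1::20 dev eth0 lladdr 02:00:00:00:00:20 STALE
2001:db8:1::30 dev eth0 FAILED
fe80::1 dev eth0 lladdr 02:00:00:00:00:01 router REACHABLE
"""

table = neighbor_states(SAMPLE)
for ip in ("2001:db8:1::10", "2001:db8:1::30", "2001:db8:1::99"):
    print(ip, "resolved" if is_resolved(table, ip) else "missing or unresolved")
```

An address that is absent from the table, as the reserved:ingress IP was here, is reported the same way as one whose resolution failed.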
I did some digging in the code to see where this was happening. It looks like bpf/lib/icmp6.h is where Cilium handles responding to NS (Neighbor Solicitation) requests with NA (Neighbor Advertisement). Specifically, at line 311 in that file, the program tries to look up an endpoint. I suspected that the reserved:ingress endpoint wasn't being found, and I confirmed this with some debug prints.
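For context on what such a request looks like on the wire, here is a minimal sketch of the message a host sends when it resolves an IPv6 address. It follows the generic layout from RFC 4861 and the solicited-node multicast rule from RFC 4291; it is not derived from Cilium's code. The target address and MAC are documentation values chosen for illustration, and the checksum is left at zero because it depends on the IPv6 pseudo-header.

```python
import ipaddress
import struct

def solicited_node_multicast(target: str) -> ipaddress.IPv6Address:
    """ff02::1:ff00:0/104 combined with the low 24 bits of the target address."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)

def neighbor_solicitation(target: str, src_mac: bytes) -> bytes:
    """Build an ICMPv6 Neighbor Solicitation body with the checksum left as 0."""
    # type 135 (NS), code 0, checksum placeholder, 4 reserved bytes
    header = struct.pack("!BBHI", 135, 0, 0, 0)
    # Source Link-Layer Address option: type 1, length 1 (in units of 8 bytes)
    option = struct.pack("!BB", 1, 1) + src_mac
    return header + ipaddress.IPv6Address(target).packed + option

target = "2001:db8::abcd:1234"               # illustrative target address
mac = bytes.fromhex("020000000001")          # illustrative source MAC
print(solicited_node_multicast(target))
print(neighbor_solicitation(target, mac).hex())
```

Whoever owns the target address is expected to answer with a Neighbor Advertisement (ICMPv6 type 136); if nothing answers, the sender never learns a link-layer address for the target.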
I then looked into the implementation of the reserved:ingress endpoint, mainly stumbling across #28126. It looks like the reserved:ingress endpoint exists only to enforce policies, and no BPF work is done for it? I don't have a great understanding of how exactly this works, but perhaps that's why icmp6.h can't find the reserved:ingress endpoint?
Cilium should respond to these NS requests for reserved:ingress as well.
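The suspected failure and the requested behavior can be illustrated with a toy model. This is only a sketch of the lookup logic as described above, not Cilium's datapath code; handle_ns, endpoints, and extra_ips are hypothetical names, and the addresses are made up.

```python
import ipaddress

NS, NA = 135, 136  # ICMPv6 Neighbor Solicitation / Advertisement types (RFC 4861)

def handle_ns(target, endpoints, extra_ips=frozenset()):
    """Return NA if the target is known, else None (the solicitation goes unanswered)."""
    addr = ipaddress.IPv6Address(target)
    if addr in endpoints or addr in extra_ips:
        return NA
    return None

# Hypothetical addresses: one regular pod endpoint and the reserved:ingress IP.
pod_ip = ipaddress.IPv6Address("2001:db8:1::10")
ingress_ip = ipaddress.IPv6Address("2001:db8:1::ff")
endpoints = {pod_ip}  # assumption: the ingress IP is absent from the lookup table

print(handle_ns(pod_ip, endpoints))                            # pod IP is answered
print(handle_ns(ingress_ip, endpoints))                        # observed: no reply
print(handle_ns(ingress_ip, endpoints, extra_ips={ingress_ip}))  # requested behavior
```

In this model, the fix amounts to making the reserved:ingress IP visible to whatever lookup decides whether to answer; how that should be done in the real datapath is left open here.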
Cilium Version
1.15.5
This issue exists on 1.14 and 1.16 as well, and likely even earlier.
Kernel Version
6.6.31
Kubernetes Version
1.30.1
Regression
No response
Sysdump
No response
Relevant log output
No response
Anything else?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct