endpoint: Place IngressIPs to endpoints map #35143
LGTM, thanks.
FWIW, failing "Build commits" workflow should be fixed on rebase now that #35141 was merged.
Could you explain a bit more what the Ingress IPs are? From my understanding these are somehow used by Envoy, and allocated from the PodCIDR? And there is a special identity ( And from a BPF-routing perspective, I'd expect that these IPs reside in the host network namespace, and not in a normal pod (== veth pair)?
The simplest way to describe Ingress IPs: they are IPs (IPv4 and IPv6) allocated from the PodCIDR range of the node and used as the source addresses of traffic forwarded by Cilium Ingress (and GW API) on that node. There is no real/full Cilium endpoint with these addresses, so there is no BPF datapath (bpf_lxc) and there are no bpf policy maps for them. The Cilium agent does compute a policy for the special entity, though, and sends it to Envoy so that Envoy can enforce policy on the traffic forwarded by Cilium Ingress (or GW API). The ARP table on the destination node would normally be updated by the traffic sent from Cilium Ingress, but if it is flushed for any reason, or in the case of IPv6 ND, the node with these IPs may need to respond to ARP/ND so that the destination node can recover the flushed or missing ARP/ND table entry. No process on the node is listening on these addresses, so ARP/ND is only relevant for getting reply packets back to the node.
/test
So if a local pod receives a connection from such an IngressIP and attempts to send a reply, where should that reply get forwarded to? With the IngressIP now visible in the local endpoint map to
Are you saying that since the IngressIPs are now in the endpoints map, this lookup now succeeds and finds a non-null
The desired action would be to route to host. Envoy uses an
e1f3b0d to db5ce1b
/test
db5ce1b to 55171aa
Then I think we'll need a bit more work, so that the programs that handle delivery to a local endpoint (e.g. from-container, from-overlay) understand that they should let the packet pass through into hostns, and not attempt a delivery via policy tailcall. Right now that works by setting the
/test
Right, the current
You mean this spot, correct? Makes sense. Note how it currently special-cases the
My suggestion would be
55171aa to ef9d451
ef9d451 to eaef6e4
Thank you Jarno, looks good! Some minor remarks below.
I'd suggest splitting the change into two patches:
- introduce ENDPOINT_F_ATHOSTNS, ENDPOINT_MASK_HOST_DELIVERY etc. and motivate why it's needed.
- switch the Ingress endpoint to set ENDPOINT_F_ATHOSTNS.
Besides that, could you please update the PR / patch description to reflect the latest changes? Right now it's still very much focused on the IPv6 ND problem ...
Add a new endpoint flag ENDPOINT_F_ATHOSTNS that informs the datapath to deliver endpoint's traffic to the host stack, but for which ARP and IPv6 ND are still served by the bpf datapath due to the endpoint's IP being from the node's IP allocation CIDR range. This can be used for enabling IPv6 ND for Ingress IPs in future. Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
Even endpoints without bpf programs or policy need ARP and IPv6 ND to work so that traffic can reach them. Place such endpoints in the bpf endpoints map (aka lxcmap). This fixes missing IPv6 ND responses for the Ingress IPs. Fixes: cilium#32980 Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
1b18e82 to fca8c95
/test
The Cilium agent manages an Ingress endpoint for the purpose of generating policy for Cilium Ingress. The Ingress endpoint has IPs from the node's CIDR range like any other endpoint, but it has no representation in the bpf datapath, as it is only used for policy enforcement and for IP addressing at the Envoy proxy. Because of this missing datapath representation, IPv6 ND does not currently work for the Ingress IPs, which breaks IPv6 communication with backends.
Fix this by introducing a minimal bpf datapath representation for the Ingress endpoint in the form of Ingress endpoint entries in the lxcmap. The IPv6 ND implementation in the Cilium bpf datapath consults the lxcmap, so ND advertisements can now be sent when an ND solicitation for the Ingress IPv6 address is received.
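To illustrate the idea, here is a minimal user-space sketch of "answer ND only for addresses found in the endpoints map". It is not Cilium's code: the struct layout, the linear array standing in for the BPF map, and every sketch_* name are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an endpoints-map entry; not Cilium's real layout. */
struct sketch_endpoint {
    uint8_t  ip6[16]; /* endpoint IPv6 address, network byte order */
    uint32_t flags;   /* endpoint flags, unused in this sketch */
};

/* Toy "endpoints map": a plain array instead of a BPF hash map. */
struct sketch_lxcmap {
    const struct sketch_endpoint *entries;
    size_t len;
};

/* Return the entry whose address equals ip6, or NULL if there is none. */
const struct sketch_endpoint *
sketch_lookup_ip6(const struct sketch_lxcmap *map, const uint8_t ip6[16])
{
    for (size_t i = 0; i < map->len; i++) {
        if (memcmp(map->entries[i].ip6, ip6, 16) == 0)
            return &map->entries[i];
    }
    return NULL;
}

/* An ND solicitation is answered only if its target address is in the map.
 * Before the fix described above, the Ingress IPv6 address had no entry,
 * so a lookup like this one would fail and no advertisement would be sent. */
bool sketch_should_answer_nd(const struct sketch_lxcmap *map,
                             const uint8_t target[16])
{
    return sketch_lookup_ip6(map, target) != NULL;
}
```

With the Ingress IPv6 address added as an entry, sketch_should_answer_nd returns true for it and false for any address that is absent from the map.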
A new endpoint flag ENDPOINT_F_ATHOSTNS is added. It tells the datapath to deliver the endpoint's traffic to the host stack, while ARP and IPv6 ND are still served by the bpf datapath because the endpoint's IP is from the node's IP allocation CIDR range.
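The delivery decision implied by such a flag could look roughly like the sketch below. The names ENDPOINT_F_ATHOSTNS and ENDPOINT_MASK_HOST_DELIVERY come from the review discussion; the bit values, the composition of the mask, SKETCH_F_HOST, and the sketch_* verdicts are assumptions made for illustration and may differ from the actual patch.

```c
#include <stdint.h>

/* Assumed bit values; only the two ENDPOINT_* names appear in the thread. */
#define SKETCH_F_HOST        1u /* hypothetical: entry is the host itself */
#define ENDPOINT_F_ATHOSTNS  2u /* assumed value: endpoint lives in the host netns */

/* Assumed composition: any flag that means "hand the packet to the host stack". */
#define ENDPOINT_MASK_HOST_DELIVERY (SKETCH_F_HOST | ENDPOINT_F_ATHOSTNS)

/* Hypothetical verdicts for a packet destined to a local endpoint. */
enum sketch_verdict {
    SKETCH_TO_HOST_STACK,      /* pass the packet into the host namespace */
    SKETCH_TO_ENDPOINT_POLICY  /* regular delivery via the endpoint's policy program */
};

/* Choose the delivery path from the endpoint's flags. */
enum sketch_verdict sketch_deliver(uint32_t flags)
{
    if (flags & ENDPOINT_MASK_HOST_DELIVERY)
        return SKETCH_TO_HOST_STACK;
    return SKETCH_TO_ENDPOINT_POLICY;
}
```

Testing against a mask rather than a single flag keeps the call sites unchanged if another host-delivered endpoint type is added later.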
Fixes: #32980