egress gw: Non-tunnel mode is broken #17386

@brb

The egress GW datapath is broken when running in the direct routing mode.

To illustrate this, let's imagine a client pod sending a request to the internet via the egress GW:

client pod @ node A (runs cilium) -- vxlan tunnel --> egress GW @ node B (runs cilium) ----> the internet

The request gets SNAT-ed to the egress GW IP address, and is then sent to the internet. When the reply hits the egress GW node, it gets rev-SNAT-ed in bpf_host by:

"from-netdev" -> handle_netdev() -> do_netdev() -> .... handle_ipv4() -> nodeport_lb4() -> ... -> snat_v4_process()

The reply is then passed to the stack by handle_ipv4(), which is called after the rev-SNAT-ed packet is recirculated via tail calls.

In the upper stack we have the iptables filter rules, whose default policy is DROP. As none of Cilium's FORWARD rules matches the reply, the reply is dropped by the default policy. In the case of tunneling, we have a rule (cilium: any->cluster on cilium_host forward accept) which matches the reply, and thus the reply is successfully sent to the client.

During the initial testing of the feature, the direct routing mode was not broken, as all traffic was flowing through the tunnel (see #16328 for the context).

One way to fix the issue is to send the reply over the tunnel instead of passing it to the stack (this also removes the asymmetry of the return path). For this we would need to identify that a reply is part of the egress GW traffic. As @pchaigno suggested, this can be done with the following check:

if rev-SNATed IP ∈ native CIDR && rev-SNATed IP !∈ node pod CIDR => send to tunnel
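
The condition above can be sketched in plain C. This is only an illustration, not Cilium code: `struct cidr4`, `cidr4_contains()` and `reply_needs_tunnel()` are hypothetical names, addresses are assumed to be in host byte order, and the node pod CIDR test is reduced to a prefix comparison, whereas in the datapath it would need an IPCache lookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IPv4 prefix in host byte order; not a Cilium type. */
struct cidr4 {
	uint32_t addr;
	uint8_t prefix_len;
};

/* Build the netmask for a prefix length; /0 and /32 are handled
 * explicitly to avoid an undefined 32-bit shift. */
static uint32_t cidr4_mask(uint8_t prefix_len)
{
	if (prefix_len == 0)
		return 0;
	if (prefix_len >= 32)
		return 0xffffffffu;
	return ~(0xffffffffu >> prefix_len);
}

/* True if ip falls inside the prefix c. */
static bool cidr4_contains(const struct cidr4 *c, uint32_t ip)
{
	uint32_t mask = cidr4_mask(c->prefix_len);

	return (ip & mask) == (c->addr & mask);
}

/* rev-SNATed IP in native CIDR && not in node pod CIDR => send to tunnel. */
static bool reply_needs_tunnel(uint32_t rev_snat_ip,
			       const struct cidr4 *native,
			       const struct cidr4 *node_pod)
{
	return cidr4_contains(native, rev_snat_ip) &&
	       !cidr4_contains(node_pod, rev_snat_ip);
}
```

For example, with a native CIDR of 10.0.0.0/8 and a node pod CIDR of 10.0.1.0/24, a reply rev-SNATed to 10.0.2.5 (a pod on another node) would be sent to the tunnel, while one rev-SNATed to 10.0.1.7 (a local pod) would not.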

An alternative is to add a flag to the struct nat_entry denoting that the traffic belongs to the egress GW. This would allow us to avoid the IPCache lookup (required by rev-SNATed IP !∈ node pod CIDR). However, I am not sure whether it's easy to propagate that flag after the rev-SNAT to handle_ipv4().
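
A rough sketch of the flag idea, under loud assumptions: `struct example_nat_entry`, its fields and `EXAMPLE_NAT_F_EGRESS_GW` are made up for illustration and do not reflect the real layout of struct nat_entry in Cilium.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit marking egress GW traffic; not a Cilium constant. */
#define EXAMPLE_NAT_F_EGRESS_GW (1u << 0)

/* Hypothetical, simplified NAT entry; the real struct nat_entry differs. */
struct example_nat_entry {
	uint32_t to_addr;  /* translated address */
	uint16_t to_port;  /* translated port */
	uint16_t flags;    /* per-entry flags, including the egress GW bit */
};

/* Set when the entry is created for a connection SNAT-ed by the egress GW. */
static void example_mark_egress_gw(struct example_nat_entry *e)
{
	e->flags |= EXAMPLE_NAT_F_EGRESS_GW;
}

/* Checked on the reply path after rev-SNAT instead of doing a CIDR lookup. */
static bool example_is_egress_gw(const struct example_nat_entry *e)
{
	return (e->flags & EXAMPLE_NAT_F_EGRESS_GW) != 0;
}
```

The sketch only covers setting and testing the flag on the entry itself; how the result would reach handle_ipv4() after the tail call is the open question raised above.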

Labels: area/datapath, feature/egress-gateway, kind/bug
