-
Notifications
You must be signed in to change notification settings - Fork 3.4k
bpf: Enforce symmetric routing for endpoints with parent interfaces #35298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpf: Enforce symmetric routing for endpoints with parent interfaces #35298
Conversation
4eeed88
to
8757292
Compare
/test |
1 similar comment
/test |
1f45d3d
to
940ddb2
Compare
4852be3
to
7f17b08
Compare
/test |
7f17b08
to
9fa8e21
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super clean implementation! Looks good to me from an IPAM/ENI perspective.
I wonder if we could actually use the parent if index to get rid of the ip rules
in certain KPR modes too.
The ENI "ip rule" lifecycle management have been a repeated source of bugs. I think having only a field in the endpoint map rather than a set or rules and routes would simplify state management quite a bit. But that's of course work for later, not something we should do in this PR.
As discussed in [0], this feature should currently only be used when BPF masquerading is enabled. [0] cilium#35298 (comment) Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Explain why we're doing a plain redirect, without rewriting the DMAC. See cilium#35298 (comment). Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
As discussed in [0], this feature should currently only be used when BPF masquerading is enabled. [0] #35298 (comment) Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Explain why we're doing a plain redirect, without rewriting the DMAC. See #35298 (comment). Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
As discussed in [0], this feature should currently only be used when BPF masquerading is enabled. [0] cilium#35298 (comment) Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Explain why we're doing a plain redirect, without rewriting the DMAC. See cilium#35298 (comment). Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
[ upstream commit 94432bf ] As discussed in [0], this feature should currently only be used when BPF masquerading is enabled. [0] #35298 (comment) Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit f1ab7eb ] Explain why we're doing a plain redirect, without rewriting the DMAC. See #35298 (comment). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 94432bf ] As discussed in [0], this feature should currently only be used when BPF masquerading is enabled. [0] #35298 (comment) Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit f1ab7eb ] Explain why we're doing a plain redirect, without rewriting the DMAC. See #35298 (comment). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 94432bf ] As discussed in [0], this feature should currently only be used when BPF masquerading is enabled. [0] #35298 (comment) Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit f1ab7eb ] Explain why we're doing a plain redirect, without rewriting the DMAC. See #35298 (comment). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 94432bf ] As discussed in [0], this feature should currently only be used when BPF masquerading is enabled. [0] #35298 (comment) Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit f1ab7eb ] Explain why we're doing a plain redirect, without rewriting the DMAC. See #35298 (comment). Signed-off-by: Julian Wiedmann <jwi@isovalent.com> Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
In cilium#35298 endpoints got the parentIfIndex field to store the parent interface index, which is used to ensure endpoint traffic is routed out of the parent interface when set. This interface index is given to the agent by the CNI plugin and stored in the endpoint object. However, when the agent is restarted, it has to restore endpoint state from disk. The parentIfIndex field was not being stored or restored, so after a restart the parentIfIndex would be set to 0, which would cause the agent to not route traffic out of the parent interface. This commit adds the parentIfIndex field to the endpoint state serialization and deserialization process, so that it is stored and restored correctly. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
In cilium#35298 endpoints got the parentIfIndex field to store the parent interface index, which is used to ensure endpoint traffic is routed out of the parent interface when set. This interface index is given to the agent by the CNI plugin and stored in the endpoint object. However, when the agent is restarted, it has to restore endpoint state from disk. The parentIfIndex field was not being stored or restored, so after a restart the parentIfIndex would be set to 0, which would cause the agent to not route traffic out of the parent interface. This commit adds the parentIfIndex field to the endpoint state serialization and deserialization process, so that it is stored and restored correctly. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
In cilium#35298 endpoints got the parentIfIndex field to store the parent interface index, which is used to ensure endpoint traffic is routed out of the parent interface when set. This interface index is given to the agent by the CNI plugin and stored in the endpoint object. However, when the agent is restarted, it has to restore endpoint state from disk. The parentIfIndex field was not being stored or restored, so after a restart the parentIfIndex would be set to 0, which would cause the agent to not route traffic out of the parent interface. This commit adds the parentIfIndex field to the endpoint state serialization and deserialization process, so that it is stored and restored correctly. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
In cilium#35298 endpoints got the parentIfIndex field to store the parent interface index, which is used to ensure endpoint traffic is routed out of the parent interface when set. This interface index is given to the agent by the CNI plugin and stored in the endpoint object. However, when the agent is restarted, it has to restore endpoint state from disk. The parentIfIndex field was not being stored or restored, so after a restart the parentIfIndex would be set to 0, which would cause the agent to not route traffic out of the parent interface. This commit adds the parentIfIndex field to the endpoint state serialization and deserialization process, so that it is stored and restored correctly. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
In #35298 endpoints got the parentIfIndex field to store the parent interface index, which is used to ensure endpoint traffic is routed out of the parent interface when set. This interface index is given to the agent by the CNI plugin and stored in the endpoint object. However, when the agent is restarted, it has to restore endpoint state from disk. The parentIfIndex field was not being stored or restored, so after a restart the parentIfIndex would be set to 0, which would cause the agent to not route traffic out of the parent interface. This commit adds the parentIfIndex field to the endpoint state serialization and deserialization process, so that it is stored and restored correctly. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
[ upstream commit 0b6adf6 ] In #35298 endpoints got the parentIfIndex field to store the parent interface index, which is used to ensure endpoint traffic is routed out of the parent interface when set. This interface index is given to the agent by the CNI plugin and stored in the endpoint object. However, when the agent is restarted, it has to restore endpoint state from disk. The parentIfIndex field was not being stored or restored, so after a restart the parentIfIndex would be set to 0, which would cause the agent to not route traffic out of the parent interface. This commit adds the parentIfIndex field to the endpoint state serialization and deserialization process, so that it is stored and restored correctly. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
[ upstream commit 0b6adf6 ] In #35298 endpoints got the parentIfIndex field to store the parent interface index, which is used to ensure endpoint traffic is routed out of the parent interface when set. This interface index is given to the agent by the CNI plugin and stored in the endpoint object. However, when the agent is restarted, it has to restore endpoint state from disk. The parentIfIndex field was not being stored or restored, so after a restart the parentIfIndex would be set to 0, which would cause the agent to not route traffic out of the parent interface. This commit adds the parentIfIndex field to the endpoint state serialization and deserialization process, so that it is stored and restored correctly. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com> Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
In cilium#35298 endpoints got the parentIfIndex field to store the parent interface index, which is used to ensure endpoint traffic is routed out of the parent interface when set. This interface index is given to the agent by the CNI plugin and stored in the endpoint object. However, when the agent is restarted, it has to restore endpoint state from disk. The parentIfIndex field was not being stored or restored, so after a restart the parentIfIndex would be set to 0, which would cause the agent to not route traffic out of the parent interface. This commit adds the parentIfIndex field to the endpoint state serialization and deserialization process, so that it is stored and restored correctly. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
In certain setups, such as AWS ENI, it is expected that non-masqueraded traffic from an endpoint always egresses via a designated interface. We call this interface its parent interface.
Normally, this is taken care of by routes and assumptions. We have routes that direct traffic to the parent interface, for pod-to-pod and CIDRs in the non-masquerade list.
However, there are cases such as when the AWS NLB is configured to run in "target type: ip" (balance traffic directly to pod IPs) and with the
preserve_client_ip
option enabled (do not SNAT the source IP) that the route based approach breaks down. In this case, Cilium sees world traffic directly to pod IPs, which normally can't happen, exceptfor the NLB transparent traffic modifications. For ingress traffic this is not an issue, but return traffic does not hit any of the routes that would direct it to the parent interface. Nor can we make a route since every possible public IP can be a source.
As a fix, this PR adds logic to the snat/fwd path to check if we see reply traffic from an endpoint with a parent interface egressing out of any interface other than the parent interface. If we do we redirect the traffic to the parent interface. The location of this logic is chosen so that its compatible with proxied traffic as well.
This is a pwru log from a manual test in EKS: