-
Notifications
You must be signed in to change notification settings - Fork 3.4k
bpf: egw: fix over-eager matching of EGW policy against reply traffic #36929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
julianwiedmann
merged 2 commits into
cilium:main
from
julianwiedmann:1.18-egw-reply-fix
Jan 14, 2025
Merged
bpf: egw: fix over-eager matching of EGW policy against reply traffic #36929
julianwiedmann
merged 2 commits into
cilium:main
from
julianwiedmann:1.18-egw-reply-fix
Jan 14, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
8bffa19
to
9a1a2e2
Compare
/test |
(I'll also want to cook a BPF-level test for this problem) |
9a1a2e2
to
d243f65
Compare
The code to handle EGW reply traffic lives in the nodeport ingress path, after a packet has been RevSNATed. In contrast to the outbound path, it doesn't consider whether the packet's source is an in-cluster entity. This becomes problematic when the EGW policy specifies a broad-range `destinationCIDR`, such as 0.0.0.0/0, and accidentally matches in-cluster service traffic. More precisely: * a client pod matches an EGW policy of the form "from $client_pod to 0.0.0.0/0, exit via gateway nodeB" * client_pod accesses a nodeport service on gateway nodeB, where the request is DNATed to a remote backend, and SNATed to nodeB's IP. * once the backend's reply reaches nodeB, it is revSNATed to client_pod. At this point the packet matches the EGW policy entry, and is thus redirected into the overlay network and forwarded to client_pod. * As the RevDNAT step has been skipped, client_pod rejects this packet from an unknown source. Fix this by applying the RevDNAT logic *prior* to looking for EGW policy matches. Double-checking whether the packet's origin is an in-cluster entity might still make sense, but no longer strictly necessary. From a tcpdump on a kind cluster (with EGW, KPR without SocketLB and native routing): ``` # client accesses nodeport 31936 on remote LB node IP 10.244.1.236.35556 > 172.18.0.2.31936: Flags [S] # request is DNATed + SNATed, and forwarded to remote backend IP 172.18.0.2.35556 > 10.244.3.121.80: Flags [S] # backend's reply reaches the LB node IP 10.244.3.121.80 > 172.18.0.2.35556: Flags [S.] # LB node applies revSNAT, adds VXLAN encapsulation and forwards to client IP 172.18.0.2.54624 > 172.18.0.4.8472: OTV, flags [I] (0x08), overlay 0, instance 6 IP 10.244.3.121.80 > 10.244.1.236.35556: Flags [S.] # client's RST to the backend IP 10.244.1.236.35556 > 10.244.3.121.80: Flags [R] ``` Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Reply traffic for an SNATed in-cluster nodeport connection should not be captured by a catch-all EGW policy. Otherwise the packet reaches the client without the source being RevDNATed back to the nodeport, and thus resulting in a RST by the client. This adds testing for the scenario that was fixed in the preceding commit. Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
d243f65
to
97d8d85
Compare
/test |
YutaroHayakawa
approved these changes
Jan 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
pippolo84
approved these changes
Jan 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
45 tasks
19 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
affects/v1.14
This issue affects v1.14 branch
affects/v1.15
This issue affects v1.15 branch
area/datapath
Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
area/loadbalancing
Impacts load-balancing and Kubernetes service implementations
backport-done/1.16
The backport for Cilium 1.16.x for this PR is done.
backport-done/1.17
The backport for Cilium 1.17.x for this PR is done.
feature/egress-gateway
Impacts the egress IP gateway feature.
kind/bug
This is a bug in the Cilium logic.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
release-note/bug
This PR fixes an issue in a previous release of Cilium.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The code to handle EGW reply traffic lives in the nodeport ingress path, after a packet has been RevSNATed. In contrast to the outbound path, it doesn't consider whether the packet's source is an in-cluster entity.
This becomes problematic when the EGW policy specifies a broad-range
destinationCIDR
, such as 0.0.0.0/0, and accidentally matches in-cluster service traffic. More precisely:Fix this by applying the RevDNAT logic prior to looking for EGW policy matches. Double-checking whether the packet's origin is an in-cluster entity might still make sense, but no longer strictly necessary.
From a tcpdump on a kind cluster (with EGW, KPR without SocketLB and native routing):