Skip to content

Conversation

yushoyamaguchi
Copy link
Member

@yushoyamaguchi yushoyamaguchi commented Mar 1, 2025

This commit fixes an inconsistency in how Geneve-DSR packets are processed when host-routing is disabled.
In the current behavior, only DSR reply packets are seen by the host-stack, while incoming packets are redirected by BPF and skip the host-stack.
This causes conntrack entries to be incomplete.

As a result, when BPF-masquerade is disabled, reply packets are unintentionally masqueraded by iptables, which break DSR functionality.
See: #32189.
Note: When BPF-masquerade is disabled, host-routing is also automatically disabled.

* eBPF-based masquerading

To resolve this, cil_from_overlay() now delivers incoming Geneve-DSR packets to the host-stack, if host-routing is disabled.
This ensures conntrack entries are created properly and prevents incorrect masquerading.

@yushoyamaguchi yushoyamaguchi requested review from a team as code owners March 1, 2025 14:10
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 1, 2025
@yushoyamaguchi yushoyamaguchi requested a review from aspsk March 1, 2025 14:10
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Mar 1, 2025
@yushoyamaguchi yushoyamaguchi force-pushed the fix_dsr_no_bpf_masq branch 4 times, most recently from 57c1759 to 46a3f4a Compare March 1, 2025 14:33
Copy link
Member

@ysksuzuki ysksuzuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR.

The issue occurs because bpf_overlay delivers packets to the local endpoint without passing through the kernel stack, even in legacy routing mode. This results in asymmetric communication. I believe we can address this by passing DSR GENEVE packets to the kernel from bpf_overlay. However, I'm not certain if this approach covers all scenarios, so further discussion is needed. Additionally, we need to add tests for this.

@ysksuzuki ysksuzuki added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. feature/dsr Relates to Cilium's Direct-Server-Return feature for KPR. labels Mar 1, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Mar 1, 2025
@maintainer-s-little-helper
Copy link

Commit fd20c47 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 2, 2025
@maintainer-s-little-helper
Copy link

Commit fd20c47 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Mar 2, 2025
@ysksuzuki ysksuzuki marked this pull request as draft March 2, 2025 06:31
@ysksuzuki
Copy link
Member

Let me convert this PR to draft since the more discussion is needed

@yushoyamaguchi
Copy link
Member Author

yushoyamaguchi commented May 24, 2025

@ysksuzuki

As you said in the APAC dev mtg, I've add the logic of checking nat_table entry which is created in nodeport_dsr_ingress_ipvX().

I think remaining fix point for merging is only BPF-unit-test refactoring .
Please review.

cc @julianwiedmann

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me! Some nitpicks below, but happy to approve once the PR is no longer in draft state :).

Please take a moment to refresh the PR & patch description, to reflect the current state.

@yushoyamaguchi yushoyamaguchi changed the title bpf: cil_from_overlay() pass Geneve-DSR packet to host-stack to insert conntrack-entry for preventing the packets from being masqueraded by an iptables rule. bpf: Fix Geneve-DSR inconsistency with host-routing off May 27, 2025
@yushoyamaguchi yushoyamaguchi force-pushed the fix_dsr_no_bpf_masq branch 2 times, most recently from 1d5f77c to 22c5224 Compare May 27, 2025 18:47
@yushoyamaguchi
Copy link
Member Author

yushoyamaguchi commented May 28, 2025

@ysksuzuki

Could you reopen this PR?

@julianwiedmann julianwiedmann marked this pull request as ready for review May 28, 2025 11:19
@julianwiedmann julianwiedmann requested a review from a team as a code owner May 28, 2025 11:19
@julianwiedmann julianwiedmann requested a review from smagnani96 May 28, 2025 11:19
This commit fixes an inconsistency in how Geneve-DSR packets are processed when host-routing is disabled.
In the current behavior, only DSR reply packets are seen by the host-stack, while incoming packets are redirected by BPF and skip the host-stack.
This causes conntrack entries to be incomplete.

As a result, when BPF-masquerade is disabled, reply packets are unintentionally masqueraded by iptables, which break DSR functionality.
See: cilium#32189.
Note: When BPF-masquerade is disabled, host-routing is also automatically disabled.
https://github.com/cilium/cilium/blob/55aecc0f4706b5f8a70fc38ed2817cc2e15b0f60/Documentation/operations/performance/tuning.rst?plain=1#L168

To resolve this, cil_from_overlay() now delivers incoming Geneve-DSR packets to the host-stack, if host-routing is disabled.
This ensures conntrack entries are created properly and prevents incorrect masquerading.

Signed-off-by: yushoyamaguchi <ysh.824@outlook.jp>
@julianwiedmann
Copy link
Member

/test

@julianwiedmann julianwiedmann added the area/loadbalancing Impacts load-balancing and Kubernetes service implementations label May 29, 2025
@julianwiedmann julianwiedmann enabled auto-merge May 30, 2025 04:23
@julianwiedmann
Copy link
Member

@ysksuzuki just needs your approval now :)

Copy link
Member

@ysksuzuki ysksuzuki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution!

@julianwiedmann julianwiedmann added this pull request to the merge queue May 30, 2025
Merged via the queue into cilium:main with commit d37c062 May 30, 2025
69 of 70 checks passed
@julianwiedmann julianwiedmann linked an issue May 30, 2025 that may be closed by this pull request
@yushoyamaguchi yushoyamaguchi deleted the fix_dsr_no_bpf_masq branch September 1, 2025 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/loadbalancing Impacts load-balancing and Kubernetes service implementations feature/dsr Relates to Cilium's Direct-Server-Return feature for KPR. kind/community-contribution This was a contribution made by a community member. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Geneve DSR is broken when running w/o BPF masquerade
3 participants