Skip to content

[v1.16] ipam: Fix IPv4 native routing CIDR auto detection #35611

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

pippolo84
Copy link
Member

@pippolo84 pippolo84 commented Oct 29, 2024

BPF Masquerading relies on IPV4_SNAT_EXCLUSION_DST_CIDR datapath config value to know which destination CIDRs should not be SNATed for pod to pod traffic.

In case of IPAM modes "eni", "azure" or "alibabacloud", if "ipv4-native-routing-cidr" is not explicitly set, the exclusion destination CIDR is autodetected at runtime, specifically from the relevant cloud provider Status section in the CiliumNode CRD.

In commit #50e42f1659 the way that autodetection logic propagates that value was changed and it now relies on the LocalNodeStore to emit an update for the new IPv4 native routing CIDR value. But the datapath still reads the value stored in the global config option variable, thus it ignores the update from the autodetection logic.

To let the datapath configure the correct SNAT exclusion CIDR, restore the previous logic to update the global config option variable alongside the propagation of the updated value with LocalNodeStore.

Related: 50e42f1 ("node: Add ip{v4,v6} native routing CIDR")
Fixes: #34615

Fixes BPF Masquerading exclusion CIDR for IPAM modes "eni", "azure" and "alibabacloud".

BPF Masquerading relies on IPV4_SNAT_EXCLUSION_DST_CIDR datapath config
value to know which destination CIDRs should not be SNATed for pod to
pod traffic.

In case of IPAM modes "eni", "azure" or "alibabacloud", if
"ipv4-native-routing-cidr" is not explicitly set, the exclusion
destination CIDR is autodetected at runtime, specifically from the
relevant cloud provider Status section in the CiliumNode CRD.

In commit #50e42f1659 the way that autodetection logic propagates that
value was changed and it now relies on the LocalNodeStore to emit an
update for the new IPv4 native routing CIDR value. But the datapath
still reads the value stored in the global config option variable, thus
it ignores the update from the autodetection logic.

To let the datapath configure the correct SNAT exclusion CIDR, restore
the previous logic to update the global config option variable alongside
the propagation of the updated value with LocalNodeStore.

Related: 50e42f1 ("node: Add ip{v4,v6} native routing CIDR")
Fixes: cilium#34615

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
@pippolo84 pippolo84 added kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/ipam labels Oct 29, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.16 This PR represents a backport for Cilium 1.16.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Oct 29, 2024
@pippolo84
Copy link
Member Author

/test-backport-1.16

@mathpl mathpl added the release-blocker/1.16 This issue will prevent the release of the next version of Cilium. label Oct 29, 2024
@pippolo84 pippolo84 marked this pull request as ready for review October 29, 2024 18:10
@pippolo84 pippolo84 requested a review from a team as a code owner October 29, 2024 18:10
@pippolo84 pippolo84 requested a review from gandro October 29, 2024 18:10
@joestringer
Copy link
Member

Is there a main version of this or is it not necessary due to other changes in the mean time?

@pippolo84
Copy link
Member Author

Is there a main version of this or is it not necessary due to other changes in the mean time?

Here it is: #35624
Due to the changes to the orchestrator, the approach for main is different, that's why I opened a PR specific for v1.16 here.

@gandro
Copy link
Member

gandro commented Oct 30, 2024

To let the datapath configure the correct SNAT exclusion CIDR, restore the previous logic to update the global config option variable alongside the propagation of the updated value with LocalNodeStore.

Have you investigated how much work it would be to change the datapath instead?

I do feel a bit uneasy on exposing the detected native routing CIDR that way. But I think the PR as is seems safe, i.e. we mutate the DaemonConfig here from newDaemon as the call site, which I believe is allowed. Later mutations however would be unsafe.

I guess the main version of the fix will take a different approach, so hopefully this here will only be relevant for the lifetime of v1.16.

@pippolo84
Copy link
Member Author

Have you investigated how much work it would be to change the datapath instead?

My understanding is that we would need the "full-reconciling" orchestrator to react to a runtime update of the ipv4 native routing CIDR and reinitialize the datapath. Alternatively, it is surely possible to add a custom piece of logic to do that, but I don't think it is the best choice to do that just for v1.16.

I guess the main version of the fix will take a different approach, so hopefully this here will only be relevant for the lifetime of v1.16.

Yep, this approach is safe for v1.16 because it is basically restoring what was there before #31372. And luckily, the three steps:

  1. IPAM autodetect the CIDR
  2. datapath initializes with the ipv4 native routing CIDR value
  3. daemon config promise is resolved

happen in that order, so the change is safe.

The approach for main takes advantage of the new orchestrator implementation to propagate the new autodetected native routing CIDR, without the risk of modifying the daemon config after the promise has been resolved.

@pippolo84 pippolo84 added affects/v1.16 This issue affects v1.16 branch and removed affects/v1.16 This issue affects v1.16 branch labels Oct 30, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 30, 2024
@julianwiedmann julianwiedmann added this pull request to the merge queue Oct 30, 2024
Merged via the queue into cilium:v1.16 with commit feaafb1 Oct 30, 2024
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.16 This PR represents a backport for Cilium 1.16.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.16 This issue will prevent the release of the next version of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants