Skip to content

aws: change IPv4Masquerade to false in ENI mode #38663

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 24, 2025

Conversation

liyihuang
Copy link
Contributor

@liyihuang liyihuang commented Apr 1, 2025

In the ENI mode, all the pods IP address are routable in the VPC.
By default we still have IPv4Masquerade enabled, but we have the following iptable rules
to skip the SNAT for the VPC traffic.

-A CILIUM_POST_nat ! -d 192.168.0.0/16 -o eth+ -m comment --comment "cilium masquerade non-cluster" -j MASQUERADE

This commit will just disable the IPv4Masquerade in the ENI mode by default, we expect that
AWS NAT gateway will perform SNAT when the traffic accessing the Internet.

  • Change the helm values of IPv4Masquerade from true to false in ENI mode
  • Update the upgradeCompatibility so users could have smooth upgrade
  • Remove the IPv4Masquerade in the doc and remove egressMasqueradeInterfaces config.
  • Remove the IPv4Masquerade in the cilium-cli in ENI mode and remove
    egressMasqueradeInterfaces config.
  • Update the upgrade doc

Fixes: #38661

New clusters created in ENI mode will no longer masquerade pod traffic to the external world.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 1, 2025
@github-actions github-actions bot added the cilium-cli This PR contains changes related with cilium-cli label Apr 1, 2025
@liyihuang liyihuang added the release-note/misc This PR makes changes that have no direct user impact. label Apr 1, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 1, 2025
@liyihuang liyihuang added area/eni Impacts ENI based IPAM. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Apr 1, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 1, 2025
@liyihuang liyihuang added area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Apr 1, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 1, 2025
@liyihuang
Copy link
Contributor Author

/test

@liyihuang liyihuang force-pushed the pr/liyihuang/change_eni_masquerade branch from a3a514f to 17ef921 Compare April 2, 2025 03:16
@liyihuang
Copy link
Contributor Author

/test

@liyihuang liyihuang force-pushed the pr/liyihuang/change_eni_masquerade branch 2 times, most recently from 8faeb33 to 045622a Compare April 2, 2025 17:03
@liyihuang
Copy link
Contributor Author

/test

@liyihuang liyihuang marked this pull request as ready for review April 2, 2025 19:46
@liyihuang liyihuang requested review from a team as code owners April 2, 2025 19:46
@liyihuang liyihuang requested review from qmonnet, derailed and squeed April 2, 2025 19:46
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change necessary?

@liyihuang
Copy link
Contributor Author

Why is this change necessary?

If we don't need to enable masquerading, then I assume we don't need to discuss which interfaces you want to apply masquerading to.

@qmonnet
Copy link
Member

qmonnet commented Apr 4, 2025

If we don't need to enable masquerading [...]

Sorry I was not clear, this is precisely my question: Why do we disable masquerading? You submit a PR to turn it off, but there's no explanation why. What's the objective, here?

@qmonnet
Copy link
Member

qmonnet commented Apr 4, 2025

Ah sorry I hadn't paid attention to the linked issue, now I see.

Could you please summarise the reason for the change in the commit description? So that we understand the motivation when looking at the Git history in the future.

@liyihuang
Copy link
Contributor Author

Ah sorry I hadn't paid attention to the linked issue, now I see.

Could you please summarise the reason for the change in the commit description? So that we understand the motivation when looking at the Git history in the future.

Sure. I thought I should put more why in the issue to discuss and put what change this is in the PR commits. I will change that and I'm waiting for more feedback to push my local commits

@qmonnet
Copy link
Member

qmonnet commented Apr 4, 2025

more why in the issue to discuss

It makes sense to have the full context in the issue. My recommendation would be to still provide at least a brief in summary in the PR and commit description, because that's where we'll be looking at when trying to understand the reason for the change.

and put what change this is in the PR commits

Ideally, it should be easy to understand what changes by looking at the code. The part that is not present in the diff is why it changes - that's why it's also important to motivate the change in the description 🙂

I will change that and I'm waiting for more feedback to push my local commits

Sure, thanks!

Copy link
Contributor

@squeed squeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor detail around defaulting, but looks good otherwise!

@aditighag aditighag added the upgrade-impact This PR has potential upgrade or downgrade impact. label Apr 14, 2025
@liyihuang liyihuang requested review from squeed and qmonnet April 15, 2025 14:53
Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you! Some formatting suggestions.

@liyihuang
Copy link
Contributor Author

Looks good, thank you! Some formatting suggestions.

Thanks and will push those in my next commit

Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liyihuang Nice work!

@liyihuang liyihuang force-pushed the pr/liyihuang/change_eni_masquerade branch 4 times, most recently from beeef19 to fca7198 Compare April 17, 2025 18:44
@liyihuang
Copy link
Contributor Author

/test

@qmonnet
Copy link
Member

qmonnet commented Apr 22, 2025

Please note that you have a merge conflict to address, on the upgrade notes.

@liyihuang liyihuang force-pushed the pr/liyihuang/change_eni_masquerade branch from fca7198 to aa6d0a7 Compare April 23, 2025 13:29
In the ENI mode, all the pods IP address are routable in the VPC.
By default we still have IPv4Masquerade enabled, but we have the following iptable rules
to skip the SNAT for the VPC traffic.

-A CILIUM_POST_nat ! -d 192.168.0.0/16 -o eth+ -m comment --comment "cilium masquerade non-cluster" -j MASQUERADE

This commit will just disable the IPv4Masquerade in the ENI mode by default, we expect that
AWS NAT gateway will perform SNAT when the traffic accessing the Internet.

- Change the helm values of IPv4Masquerade from true to false in ENI mode
- Update the upgradeCompatibility so users could have smooth upgrade
- Remove the IPv4Masquerade in the doc and remove egressMasqueradeInterfaces config.
- Remove the IPv4Masquerade in the cilium-cli in ENI mode and remove
  egressMasqueradeInterfaces config.
- Update the IPv4Masquerade value type to be bool or null and use null in the
  helm value as the default
- Update the upgrade doc

Signed-off-by: Liyi Huang <liyi.huang@isovalent.com>
@liyihuang liyihuang force-pushed the pr/liyihuang/change_eni_masquerade branch from aa6d0a7 to e769ea7 Compare April 23, 2025 13:43
@liyihuang
Copy link
Contributor Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 23, 2025
@squeed squeed added this pull request to the merge queue Apr 24, 2025
Merged via the queue into cilium:main with commit 10f87e6 Apr 24, 2025
66 checks passed
@kyleli666
Copy link

kyleli666 commented Aug 5, 2025

I updated the release note.
Should we suggest that existing clusters be migrated? (Is this safe to migrate on a running cluster)?
What about changing the helm charts themselves?

I was bit hesitant to put in the helm values or exsiting clusters for now.

If we have some people deployed their cluster nodes in the AWS public subnets(where it's not a good practice), their worker nodes could have the public EIP, where this could be a breaking change to them since pods IP will be SNAT to their worker nodes public IP when they access the Internet.

I think maybe it's better to keep this way for a while to see if we will receive the complain. Then we could move to something like

if eni=true and enableIPv4Masquerade is empty, then enableIPv4Masquerade = false.

I'm still new to make changes to helm values so I'm more conservative to introduce the changes to exsiting users

@squeed WDYT

We met some problem on public nodes.
We have to use some public ip nodes, and I enabled bpf.masquerade=true before 1.18.

When I upgrade to cilium 1.18, I see errors in cilium console that enableIPv4Masquerade was not set, and I see this issue, so I just removed bpf.masquerade=true, then pods on public nodes cannot access Internet anymore while pods on private nodes work well.
I had to add bpf.masquerade back and enableIPv4Masquerade finally.

May I ask what could be the problem?

@liyihuang
Copy link
Contributor Author

I dont think there is any problem and it's the default behaviour change just for enableIPv4Masquerade.

Before the change I checked Cilium and AWS doc, I can see they suggest or assume that the worker nodes are in the private subnets.

Before this change, I can see the users having the confusion since they have enabled the Masqueraded but found some traffic got SNAT(VPC to external) but some not(traffic within the VPC). I think it's more confusing and that's why I proposed this change.

@liyihuang liyihuang deleted the pr/liyihuang/change_eni_masquerade branch August 13, 2025 19:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. area/eni Impacts ENI based IPAM. cilium-cli This PR contains changes related with cilium-cli ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. upgrade-impact This PR has potential upgrade or downgrade impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changing ENI mode IPv4Masquerade from true to false in doc and cilium-cli
7 participants