-
Notifications
You must be signed in to change notification settings - Fork 3.4k
ip-masq-agent: Ensure ip rules match the BPF ip-masq-agent configuration in AWS ENI mode #40141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of minor nits and a question left inline, but overall LGTM from an IPAM point of view.
I second Marco's suggestions about the ip-masq-agent "cellification" and the improvement of NonMasqCIDRsFromConfig
cache usage.
…mode Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>
67a551f
to
842410a
Compare
Alright, my |
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sig-agent
looks good to me. Thanks for extracting the IP Masq Agent into its own Hive Cell! It looks way cleaner this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! 💯
Description
See detailed description of the issue in #35501
This PR ensures that when BPF Masquerading is enabled together with the BPF
ip-masq-agent
, the Cilium Agent managed datapathip rule
s are actually consistent with theip-masq-agent
configuration. The idea is that allnonMasqueradeCIDRs
fromip-masq-agent
configuration actually need to be propagated down to the datapath to ensure that these CIDRs are not masqueraded.As mentioned in the issue, the actual datapath
ip rule
setup is here:cilium/pkg/datapath/linux/routing/routing.go
Lines 88 to 103 in e3a93b1
The
To
CIDRs are coming from the response to the CNI IP Allocation request. The response is populated here:cilium/pkg/ipam/crd.go
Lines 686 to 691 in cad67ba
The newly added logic in my PR simply reads the
ip-masq-agent
config file and returns the non-masqueradable CIDRs in the CNI IP allocation response.It's somewhat of a hotpath so I made sure that it fails gracefully if something goes wrong.
It also makes the best effort to read the CIDRs from memory and not from disk. I also moved the-> This part of the description is obsolete since I ended up doing a refactoring of thefsnotify
Watcher instantiation to theStart()
function to avoid its overhead in the codepath. Finally, in order to avoid race conditions with theip-masq-agent
startup and the IP allocation requests (the IPAM Allocator is started before theip-masq-agent
), this code doesn't rely on theip-masq-agent
being in running state.ip-masq-agent
in #40347 to extract it into a separate Cell.One small limitation is that the
ip rule
s will not be updated dynamically if theip-masq-agent
configuration changes. The new CIDRs will apply to all newly started pods however existing pods will need to be restarted in order for the new CIDRs to be applied. IMO it's not a big deal since the situation is already much better than what we had previously.Testing
Before the PR
Taking the same AWS setup as described in my original issue, when we enable masquerading for a given pod, its
ip rule
s look something like this:The
to
CIDRs are only coming from the VPC CIDRs auto-detected by the Operator. If we enable theip-masq-agent
, the rules stay the same.After the PR
Once we deploy the PR and then enable
ip-masq-agent
with the following config:The CIDRs are now properly propagated down to the
ip rule
level:We can also confirm that cross-VPC traffic is no longer dropped.
Fixes: #35501