-
Notifications
You must be signed in to change notification settings - Fork 3.4k
bpf: always track egress gateway connections #21499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f45d28a
to
1dd9bca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High-level comments:
Currently the SNAT logic assumes that all connections that needs to be translated are already tracked. The only exception for this is connections originating from the local host, which are tracked by snat_v4_track_local().
The expectation is that from-container
would create the CT entry for any other connection, correct?
In practice egress gateway is another case that requires special attention: in fact, all connections tunneled from a different node to a gateway node are currently not tracked. This means that whenever the CT garbage collector kicks in, all NAT entries for egress gateway will be deleted, potentially leading to connections breakage.
Any chance of adding a test for this scenario? If we have a mechanism to trigger GC, I suppose we could even just check the NAT table before/after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one @jibi, thank you! Reasoning is sound, just some fine-tuning aspects I'd like to clarify.
That's my understanding as well
I thought about tests but I didn't come up with anything I was 100% happy with so I decided to get these changes reviewed first and defer tests:
|
1dd9bca
to
afbad5d
Compare
Ack. Alternative might be to adapt the "established connection" test from #19880, and kick GC inbetween. |
afbad5d
to
e29670d
Compare
e29670d
to
3cd59da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice explanation and the test improvement!
45e1723
to
ad22fd2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Nice commit history btw 🎉
In order to reclaim orphan entries in the NAT table the agent uses the connection tracking state: if a NAT entry is not backed by a connection, it should be considered orphaned and can be reclaimed. Currently the SNAT logic assumes that all connections that needs to be translated are already tracked. The only exception for this is connections originating from the local host, which are tracked by snat_v4_track_local(). In practice egress gateway is another case that requires special attention: in fact, all connections tunneled from a different node to a gateway node are currently not tracked. This means that whenever the CT garbage collector kicks in, all NAT entries for egress gateway will be deleted, potentially leading to connections breakage. This commit fix this by making sure we always track these connections. While at it, also rename snat_v4_track_local() to snat_v4_track_connection(). Towards: #21346 Signed-off-by: Gilberto Bertin <jibi@cilium.io>
For consistency with its v4 counterpart. Signed-off-by: Gilberto Bertin <jibi@cilium.io>
ad22fd2
to
57448a3
Compare
/test |
During the backport of #21499 for some reason the tabs on a couple of functions got replaced with 7 spaces. This commit fixes this (no functional changes). Signed-off-by: Gilberto Bertin <jibi@cilium.io>
During the backport of #21499 for some reason the tabs on a couple of functions got replaced with 7 spaces. This commit fixes this (no functional changes). Signed-off-by: Gilberto Bertin <jibi@cilium.io>
In order to reclaim orphan entries in the NAT table the agent uses the connection tracking state: if a NAT entry is not backed by a connection, it should be considered orphaned and can be reclaimed.
Currently the SNAT logic assumes that all connections that needs to be translated are already tracked. The only exception for this is connections originating from the local host, which are tracked by snat_v4_track_local().
In practice egress gateway is another case that requires special attention: in fact, all connections tunneled from a different node to a gateway node are currently not tracked. This means that whenever the CT garbage collector kicks in, all NAT entries for egress gateway will be deleted, potentially leading to connections breakage.
This commit fix this by making sure we always track these connections.
While at it, also rename snat_v4_track_local to snat_v4_track_connection.
Towards: #21346