Skip to content

Conversation

jibi
Copy link
Member

@jibi jibi commented Sep 29, 2022

In order to reclaim orphan entries in the NAT table the agent uses the connection tracking state: if a NAT entry is not backed by a connection, it should be considered orphaned and can be reclaimed.

Currently the SNAT logic assumes that all connections that needs to be translated are already tracked. The only exception for this is connections originating from the local host, which are tracked by snat_v4_track_local().

In practice egress gateway is another case that requires special attention: in fact, all connections tunneled from a different node to a gateway node are currently not tracked. This means that whenever the CT garbage collector kicks in, all NAT entries for egress gateway will be deleted, potentially leading to connections breakage.

This commit fix this by making sure we always track these connections.

While at it, also rename snat_v4_track_local to snat_v4_track_connection.

Towards: #21346

@jibi jibi added kind/bug This is a bug in the Cilium logic. area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/bug This PR fixes an issue in a previous release of Cilium. feature/egress-gateway Impacts the egress IP gateway feature. labels Sep 29, 2022
@jibi jibi requested review from brb and julianwiedmann September 29, 2022 10:41
@jibi jibi force-pushed the pr/jibi/track-egressgw-connections branch 2 times, most recently from f45d28a to 1dd9bca Compare September 29, 2022 10:50
@jibi jibi marked this pull request as ready for review September 29, 2022 15:34
@jibi jibi requested a review from a team as a code owner September 29, 2022 15:34
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level comments:

Currently the SNAT logic assumes that all connections that needs to be translated are already tracked. The only exception for this is connections originating from the local host, which are tracked by snat_v4_track_local().

The expectation is that from-container would create the CT entry for any other connection, correct?

In practice egress gateway is another case that requires special attention: in fact, all connections tunneled from a different node to a gateway node are currently not tracked. This means that whenever the CT garbage collector kicks in, all NAT entries for egress gateway will be deleted, potentially leading to connections breakage.

Any chance of adding a test for this scenario? If we have a mechanism to trigger GC, I suppose we could even just check the NAT table before/after.

Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one @jibi, thank you! Reasoning is sound, just some fine-tuning aspects I'd like to clarify.

@jibi
Copy link
Member Author

jibi commented Sep 30, 2022

The expectation is that from-container would create the CT entry for any other connection, correct?

That's my understanding as well

Any chance of adding a test for this scenario? If we have a mechanism to trigger GC, I suppose we could even just check the NAT table before/after.

I thought about tests but I didn't come up with anything I was 100% happy with so I decided to get these changes reviewed first and defer tests:

  • I looked into bpf/tests/nat_test.c to add a new test case, but there's some work to do on mocking the NAT and endpoint maps, which I wasn't sure was worth to test just snat_v4_track_connection()
  • for e2e tests the best approach we have would be (I think) diffing sudo cilium bpf nat list | grep $IP:$PORT | wc -l before and after each connection, which again doesn't satisfies me 100%, so I wanted to spend some more time thinking about that

@jibi jibi force-pushed the pr/jibi/track-egressgw-connections branch from 1dd9bca to afbad5d Compare September 30, 2022 08:26
@julianwiedmann
Copy link
Member

  • for e2e tests the best approach we have would be (I think) diffing sudo cilium bpf nat list | grep $IP:$PORT | wc -l before and after each connection, which again doesn't satisfies me 100%, so I wanted to spend some more time thinking about that

Ack. Alternative might be to adapt the "established connection" test from #19880, and kick GC inbetween.

@jibi jibi force-pushed the pr/jibi/track-egressgw-connections branch from afbad5d to e29670d Compare October 2, 2022 18:21
@jibi jibi requested a review from a team as a code owner October 2, 2022 18:21
@jibi jibi requested a review from christarazi October 2, 2022 18:21
@ldelossa ldelossa self-requested a review October 3, 2022 14:02
@jibi jibi force-pushed the pr/jibi/track-egressgw-connections branch from e29670d to 3cd59da Compare October 3, 2022 19:20
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice explanation and the test improvement!

@jibi jibi force-pushed the pr/jibi/track-egressgw-connections branch 2 times, most recently from 45e1723 to ad22fd2 Compare October 4, 2022 08:48
@jibi jibi marked this pull request as draft October 4, 2022 08:58
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Nice commit history btw 🎉

jibi added 2 commits October 4, 2022 11:36
In order to reclaim orphan entries in the NAT table the agent uses the
connection tracking state: if a NAT entry is not backed by a connection,
it should be considered orphaned and can be reclaimed.

Currently the SNAT logic assumes that all connections that needs to be
translated are already tracked. The only exception for this is
connections originating from the local host, which are tracked by
snat_v4_track_local().

In practice egress gateway is another case that requires special
attention: in fact, all connections tunneled from a different node to a
gateway node are currently not tracked. This means that whenever the CT
garbage collector kicks in, all NAT entries for egress gateway will be
deleted, potentially leading to connections breakage.

This commit fix this by making sure we always track these connections.

While at it, also rename snat_v4_track_local() to
snat_v4_track_connection().

Towards: #21346
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
For consistency with its v4 counterpart.

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
@jibi jibi force-pushed the pr/jibi/track-egressgw-connections branch from ad22fd2 to 57448a3 Compare October 4, 2022 09:38
@jibi
Copy link
Member Author

jibi commented Oct 4, 2022

/test

@jibi jibi marked this pull request as ready for review October 4, 2022 10:34
@jibi jibi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 5, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 6, 2022
@ldelossa ldelossa merged commit f9697d2 into master Oct 6, 2022
@ldelossa ldelossa deleted the pr/jibi/track-egressgw-connections branch October 6, 2022 00:16
@jibi jibi mentioned this pull request Oct 10, 2022
1 task
@jibi jibi added backport-pending/1.12 backport/author The backport will be carried out by the author of the PR. and removed needs-backport/1.12 labels Oct 10, 2022
@qmonnet qmonnet added backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. and removed backport-pending/1.12 labels Oct 18, 2022
jibi added a commit that referenced this pull request Oct 19, 2022
During the backport of #21499 for some reason the tabs on a couple of
functions got replaced with 7 spaces. This commit fixes this (no
functional changes).

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
@jibi jibi mentioned this pull request Oct 19, 2022
aanm pushed a commit that referenced this pull request Oct 21, 2022
During the backport of #21499 for some reason the tabs on a couple of
functions got replaced with 7 spaces. This commit fixes this (no
functional changes).

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. backport/author The backport will be carried out by the author of the PR. backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. feature/egress-gateway Impacts the egress IP gateway feature. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants