-
Notifications
You must be signed in to change notification settings - Fork 3.4k
bpf:nat: Restore ORG NAT entry if it's not found #37747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I am contributing the IPv4 version first. After discussing the logic, I plan to extend it to IPv6. |
Looks like there was an infrastructure issue while building the images from this PR. I'll close/reopen to retrigger those, hopefully that gets CI to cooperate. |
There still seems to be an issue with the infrastructure. I'll update the branch. |
Commit 3f4cf30 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
3f4cf30
to
8dbe12e
Compare
Commit 3f4cf30 does not match "(?m)^Signed-off-by:". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
8dbe12e
to
dd1dea6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me (modulo minor comments). Would be good if you could add a test.
The changes looks good to me. But I am wondering the scenario you mentioned in #36631 (comment)
The restoration will happen only if the following egress packet has not come, but the reply comes back and find the original entry is missing? If the following packet comes before the reply, we will still get an RST packet? |
@sugangli Thanks for the review.
Restoration occurs when a reply comes back and the original entry is missing, regardless of whether a following egress packet comes or not. There is no need to depend on whether it comes or not. If the following egress packet comes before the reply, an RST packet will be received, and the RST packet will be delivered to the client socket. The client socket will then attempt to reconnect using a new source port. Therefore, a situation where RST packets are continuously received and the connection becomes stuck will not occur. In most IP communications without fragmentation, the reply packet typically arrives before the following egress packets. The key idea of this logic is to restore the original entry that was evicted by LRU in such cases. This approach helps reduce RST errors. which is particularly important for end users who require high-quality network communication. |
dd1dea6
to
fcd0db0
Compare
94ba3be
to
6bb402f
Compare
Go Related Checks / Precheck (pull_request) is temporarily infrastructure issue.
|
6bb402f
to
932d970
Compare
@gentoo-root Could you take a look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay. Just one non-blocking comment/question on the test.
@julianwiedmann @gentoo-root Thank you :) I've investigated failure Conformance EKS (ci-eks). This failure was caused by an EC2 interface issue. Since the same test passed in the
|
728be8c
to
88c590c
Compare
/test |
@gentoo-root Thank you for the test again. Failure was caused by a change to the github action that was merged about 3 hours ago.
|
ORG NAT entry could be removed by LRU eviction. This commit restore the ORG NAT entry from REV NAT entry and tuple in snat_v4_rev_nat_handle_mapping(). Thus, it can mitigate network failures caused by LRU eviction. Relates: cilium#35304 Signed-off-by: Gyutae Bae <gyu.8ae@gmail.com>
88c590c
to
76d0735
Compare
/test |
Thank you to all the reviewers for your time and support. @julianwiedmann @sugangli Thank you so much for reviewing the idea and actively supporting it. I really appreciate it. To be more helpful for I’ll be spending the next few weeks working through my backlog, and will follow up with a subsequent PR after #37346. |
@julianwiedmann @gentoo-root @sugangli |
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: cilium#35304 Ref: cilium#37747 Fixes: cilium#38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
[ upstream commit b1bdfdd ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b1bdfdd ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b1bdfdd ] [ backporter's notes: minor conflicts due to different map name and some missing tests in stable branch. ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b1bdfdd ] [ backporter's notes: minor conflicts due to different map name and some missing tests in stable branch. ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b1bdfdd ] [ backporter's notes: minor conflicts due to different map name and some missing tests in stable branch. ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b1bdfdd ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b1bdfdd ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b1bdfdd ] [ backporter's notes: minor conflicts due to different map name and some missing tests in stable branch. ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b1bdfdd ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit b1bdfdd ] [ backporter's notes: minor conflicts due to different map name and some missing tests in stable branch. ] snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new connections. However, we also have a fallback mechanism in snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was evicted by LRU, but the original RevSNAT entry still exists. There is a hypothesis that the above two functions may race with each other, and the following is possible: 1. snat_v4_new_mapping creates a RevSNAT entry. 2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores" the SNAT entry (which has never existed yet). 3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding to the RevSNAT entry, created previously. 4. snat_v4_new_mapping fails because the SNAT entry already exists, created by snat_v4_rev_nat_handle_mapping. 5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert everything. The right outcome in this case would be to keep the SNAT entry, whoever created it first, so it makes sense not to fail on -EEXIST. It also makes the fallback mechanism more robust, because there is a time gap between __snat_lookup and __snat_create. Ref: #35304 Ref: #37747 Fixes: #38466 Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com> Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Problem Statement
--bpf-nat-global-max
, it causes a DROP in SNAT behavior forcil-to-netdev
when using iptables masquerade #36572Root Cause
Solution Summary
Impact Assessment
This patch has a substantial impact on connection reliability. In testing, it reduced the connection failure rate from 1–10% to virtually 0%, effectively eliminating the issue.
Issue Details
eBPF Unit Test Flow
Network failures can occur due to LRU eviction in the BPF map. This happens because NAT entries are evicted during packet transmission. Inspired by #35304, I developed this PR to help mitigate this issue. This PR restores deleted original NAT entries in
snat_v4_rev_nat_handle_mapping()
. The idea was initially proposed at #36631 (comment), and this PR serves as its actual implementation.Thanks to @julianwiedmann @sugangli