Skip to content

Conversation

gyutaeb
Copy link
Contributor

@gyutaeb gyutaeb commented Feb 19, 2025

Problem Statement

Root Cause

  • This issue occurs when the original NAT entry is evicted from the SNAT_MAPPING map due to the LRU policy. As a result, even for a flow that has already undergone SNAT, a new source port is assigned. The remote endpoint then terminates the session due to the unexpected source port.

Solution Summary

  • Recreating the deleted original entry in snat_v4_rev_nat_handle_mapping().
    image

Impact Assessment

This patch has a substantial impact on connection reliability. In testing, it reduced the connection failure rate from 1–10% to virtually 0%, effectively eliminating the issue.

image

Issue Details

  • snat_v4_rev_nat_handle_mapping() correctly finds the reverse entry.
  • When the egress packet reaches snat_v4_nat_handle_mapping(), the original entry has already been evicted by LRU.
  • Since the original entry is missing, snat_v4_new_mapping() assigns a new source port for NAT.
  • If this happens, the endpoint socket detects a port change and sends an RST packet.

eBPF Unit Test Flow

  • The original packet gets ReSNATed when the entry is deleted.
  • The reply packet restores the Original NAT entry if it is deleted.
  • The original pakcket does not get ReSNATed. Because the Original NAT entry is restored.

Network failures can occur due to LRU eviction in the BPF map. This happens because NAT entries are evicted during packet transmission. Inspired by #35304, I developed this PR to help mitigate this issue. This PR restores deleted original NAT entries in snat_v4_rev_nat_handle_mapping(). The idea was initially proposed at #36631 (comment), and this PR serves as its actual implementation.
Thanks to @julianwiedmann @sugangli

@gyutaeb gyutaeb requested a review from a team as a code owner February 19, 2025 15:01
@gyutaeb gyutaeb requested a review from gentoo-root February 19, 2025 15:01
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 19, 2025
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Feb 19, 2025
@gyutaeb
Copy link
Contributor Author

gyutaeb commented Feb 19, 2025

I am contributing the IPv4 version first. After discussing the logic, I plan to extend it to IPv6.

@joestringer
Copy link
Member

Looks like there was an infrastructure issue while building the images from this PR. I'll close/reopen to retrigger those, hopefully that gets CI to cooperate.

@joestringer joestringer reopened this Feb 19, 2025
@joestringer joestringer added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Feb 19, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 19, 2025
@gyutaeb
Copy link
Contributor Author

gyutaeb commented Feb 20, 2025

There still seems to be an issue with the infrastructure. I'll update the branch.

@maintainer-s-little-helper
Copy link

Commit 3f4cf30 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 20, 2025
@gyutaeb gyutaeb force-pushed the pr/mitigate-LRU-eviction branch from 3f4cf30 to 8dbe12e Compare February 20, 2025 02:47
@maintainer-s-little-helper
Copy link

Commit 3f4cf30 does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@gyutaeb gyutaeb force-pushed the pr/mitigate-LRU-eviction branch from 8dbe12e to dd1dea6 Compare February 20, 2025 06:44
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Feb 20, 2025
Copy link
Contributor

@gentoo-root gentoo-root left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (modulo minor comments). Would be good if you could add a test.

@sugangli
Copy link
Contributor

The changes looks good to me. But I am wondering the scenario you mentioned in #36631 (comment)

  • snat_v4_rev_nat_handle_mapping() correctly finds the reverse entry.
  • When the egress packet reaches snat_v4_nat_handle_mapping(), the original entry has already been evicted by LRU.
  • Since the original entry is missing, snat_v4_new_mapping() assigns a new source port for NAT.
  • If this happens, the endpoint socket detects a port change and sends an RST packet.

The restoration will happen only if the following egress packet has not come, but the reply comes back and find the original entry is missing? If the following packet comes before the reply, we will still get an RST packet?

@gyutaeb
Copy link
Contributor Author

gyutaeb commented Feb 21, 2025

@sugangli Thanks for the review.

The restoration will happen only if the following egress packet has not come, but the reply comes back and find the original entry is missing? If the following packet comes before the reply, we will still get an RST packet?

Restoration occurs when a reply comes back and the original entry is missing, regardless of whether a following egress packet comes or not. There is no need to depend on whether it comes or not. If the following egress packet comes before the reply, an RST packet will be received, and the RST packet will be delivered to the client socket. The client socket will then attempt to reconnect using a new source port. Therefore, a situation where RST packets are continuously received and the connection becomes stuck will not occur.

In most IP communications without fragmentation, the reply packet typically arrives before the following egress packets. The key idea of this logic is to restore the original entry that was evicted by LRU in such cases. This approach helps reduce RST errors. which is particularly important for end users who require high-quality network communication.

@gyutaeb gyutaeb force-pushed the pr/mitigate-LRU-eviction branch from dd1dea6 to fcd0db0 Compare March 10, 2025 14:47
@gyutaeb gyutaeb requested a review from gentoo-root March 10, 2025 14:47
@gyutaeb gyutaeb force-pushed the pr/mitigate-LRU-eviction branch 2 times, most recently from 94ba3be to 6bb402f Compare March 10, 2025 14:54
@gyutaeb
Copy link
Contributor Author

gyutaeb commented Mar 11, 2025

Go Related Checks / Precheck (pull_request) is temporarily infrastructure issue.

docker: Error response from daemon: Get "https://quay.io/v2/cilium/cilium-builder/manifests/sha256:2cae94972899ac31ccd08e199b14e6a7825a2aef04ca238ac0a891455b2a54ba": received unexpected HTTP status: 502 Bad Gateway.

@gyutaeb gyutaeb force-pushed the pr/mitigate-LRU-eviction branch from 6bb402f to 932d970 Compare March 11, 2025 09:23
@gyutaeb
Copy link
Contributor Author

gyutaeb commented Mar 19, 2025

@gentoo-root Could you take a look at this?

Copy link
Contributor

@gentoo-root gentoo-root left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Just one non-blocking comment/question on the test.

@gyutaeb
Copy link
Contributor Author

gyutaeb commented Mar 21, 2025

@julianwiedmann @gentoo-root Thank you :)

I've investigated failure Conformance EKS (ci-eks). This failure was caused by an EC2 interface issue. Since the same test passed in the ca-central-1 region but failed in us-east-1, it appears to be a temporary issue specific to the us-east-1 region. Would you mind rerunning the failed test?

  ❌ Found 3 logs in cilium-cilium-13989679574-1.us-east-1.eksctl.io/kube-system/cilium-operator-79d4c4dcd7-hdl5p (cilium-operator) matching list of errors that must be investigated:
time=2025-03-21T10:48:02Z level=warn msg="Unable to assign additional IPs to interface, will create new interface" subsys=ipam-allocator-aws name=ip-192-168-85-209.ec2.internal instanceID=i-0b96ac87dd11638e4 error="operation error EC2: AssignPrivateIpAddresses, https response error StatusCode: 400, RequestID: c06973e3-6a46-438e-b3ae-0b80deba2843, api error PrivateIpAddressLimitExceeded: Number of private addresses will exceed limit." selectedInterface=eni-038636af0f1e4d7d8 ipsToAllocate=1 (1 occurrences)
time=2025-03-21T10:48:02Z level=warn msg="Instance is out of interfaces" subsys=ipam-allocator-aws name=ip-192-168-85-209.ec2.internal instanceID=i-0b96ac87dd11638e4 (1 occurrences)
time=2025-03-21T10:48:16Z level=warn msg="Unable to assign additional IPs to interface, will create new interface" subsys=ipam-allocator-aws name=ip-192-168-85-209.ec2.internal instanceID=i-0b96ac87dd11638e4 error="operation error EC2: AssignPrivateIpAddresses, https response error StatusCode: 400, RequestID: c47750e5-7897-429c-bd94-e88cfd26bfe7, api error PrivateIpAddressLimitExceeded: Number of private addresses will exceed limit." selectedInterface=eni-038636af0f1e4d7d8 ipsToAllocate=1 (1 occurrences)

@gyutaeb gyutaeb force-pushed the pr/mitigate-LRU-eviction branch from 728be8c to 88c590c Compare March 21, 2025 13:42
@gentoo-root
Copy link
Contributor

/test

@gyutaeb
Copy link
Contributor Author

gyutaeb commented Mar 21, 2025

@gentoo-root Thank you for the test again. Failure was caused by a change to the github action that was merged about 3 hours ago.
I've rebased the branch.

unknown flag: --include-conn-disrupt-test-egw

ORG NAT entry could be removed by LRU eviction. This commit restore the
ORG NAT entry from REV NAT entry and tuple in
snat_v4_rev_nat_handle_mapping(). Thus, it can mitigate network failures
caused by LRU eviction.

Relates: cilium#35304
Signed-off-by: Gyutae Bae <gyu.8ae@gmail.com>
@gyutaeb gyutaeb force-pushed the pr/mitigate-LRU-eviction branch from 88c590c to 76d0735 Compare March 21, 2025 16:14
@gentoo-root
Copy link
Contributor

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 21, 2025
@joestringer joestringer added this pull request to the merge queue Mar 21, 2025
Merged via the queue into cilium:main with commit a54b9fc Mar 21, 2025
67 checks passed
@gyutaeb
Copy link
Contributor Author

gyutaeb commented Mar 22, 2025

Thank you to all the reviewers for your time and support.

@julianwiedmann @sugangli Thank you so much for reviewing the idea and actively supporting it. I really appreciate it.
@gentoo-root @joestringer Many thanks for the in-depth code review and for taking the time to help with the CI tests, despite your busy schedule. Your support means a lot.

To be more helpful for bpf: nat, I've added a diagram illustrating the core idea and expanded the explanation in the PR description. I hope this also helps other contributors.

I’ll be spending the next few weeks working through my backlog, and will follow up with a subsequent PR after #37346.

@gyutaeb
Copy link
Contributor Author

gyutaeb commented Apr 4, 2025

@julianwiedmann @gentoo-root @sugangli
Hi, I've reviewed the code for conntrack as well, following the NAT review. It seems that conntrack may still have the same LRU eviction issue. I think it would be good to apply this algorithm to conntrack as well. I'd love to hear your thoughts on this.

gentoo-root added a commit that referenced this pull request Aug 6, 2025
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
gentoo-root added a commit that referenced this pull request Aug 7, 2025
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
gentoo-root added a commit that referenced this pull request Aug 7, 2025
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
github-merge-queue bot pushed a commit that referenced this pull request Aug 14, 2025
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
rabelmervin pushed a commit to rabelmervin/cilium that referenced this pull request Aug 18, 2025
snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: cilium#35304
Ref: cilium#37747
Fixes: cilium#38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 25, 2025
[ upstream commit b1bdfdd ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 25, 2025
[ upstream commit b1bdfdd ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 25, 2025
[ upstream commit b1bdfdd ]

[ backporter's notes: minor conflicts due to different map name and some
missing tests in stable branch. ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 25, 2025
[ upstream commit b1bdfdd ]

[ backporter's notes: minor conflicts due to different map name and some
missing tests in stable branch. ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 25, 2025
[ upstream commit b1bdfdd ]

[ backporter's notes: minor conflicts due to different map name and some
missing tests in stable branch. ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 26, 2025
[ upstream commit b1bdfdd ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 26, 2025
[ upstream commit b1bdfdd ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 26, 2025
[ upstream commit b1bdfdd ]

[ backporter's notes: minor conflicts due to different map name and some
missing tests in stable branch. ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 26, 2025
[ upstream commit b1bdfdd ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
pippolo84 pushed a commit that referenced this pull request Aug 26, 2025
[ upstream commit b1bdfdd ]

[ backporter's notes: minor conflicts due to different map name and some
missing tests in stable branch. ]

snat_v4_new_mapping creates a RevSNAT entry, then a SNAT entry for new
connections. However, we also have a fallback mechanism in
snat_v4_rev_nat_handle_mapping, which restores the SNAT entry if it was
evicted by LRU, but the original RevSNAT entry still exists.

There is a hypothesis that the above two functions may race with each
other, and the following is possible:

1. snat_v4_new_mapping creates a RevSNAT entry.
2. snat_v4_rev_nat_handle_mapping finds the RevSNAT entry and "restores"
   the SNAT entry (which has never existed yet).
3. snat_v4_new_mapping goes on to create the SNAT entry, corresponding
   to the RevSNAT entry, created previously.
4. snat_v4_new_mapping fails because the SNAT entry already exists,
   created by snat_v4_rev_nat_handle_mapping.
5. snat_v4_new_mapping then removes the RevSNAT entry, trying to revert
   everything.

The right outcome in this case would be to keep the SNAT entry, whoever
created it first, so it makes sense not to fail on -EEXIST.

It also makes the fallback mechanism more robust, because there is a
time gap between __snat_lookup and __snat_create.

Ref: #35304
Ref: #37747
Fixes: #38466
Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. feature/snat Relates to SNAT or Masquerading of traffic kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants