Skip to content

Conversation

lmb
Copy link
Contributor

@lmb lmb commented Jun 13, 2024

The logic to allocate SNAT mapping contains a race condition. At a high level it does the following:

if (!revsnat_exists(port)) {
    if (!create_revsnat(port)
        return error;

    ...
}

Two concurrent executions of the datapath may succeed the revsnat_exists check, which then leads to one of them bailing out since create_revsnat fails.

Instead simply try to create the RevSNAT entry. If that fails we retry with another port.

Updates: #33074

Avoid race during RevSNAT mapping creation, resulting in packet drop with "No mapping for NAT masquerade".

The logic to allocate SNAT mapping contains a race condition.
At a high level it does the following:

    if (!revsnat_exists(port)) {
        if (!create_revsnat(port)
            return error;

        ...
    }

Two concurrent executions of the datapath may succeed the
revsnat_exists check, which then leads to one of them bailing
out since create_revsnat fails.

Instead simply try to create the RevSNAT entry. If that fails we
retry with another port.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
@lmb lmb added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. feature/snat Relates to SNAT or Masquerading of traffic labels Jun 13, 2024
@lmb
Copy link
Contributor Author

lmb commented Jun 13, 2024

/test

@julianwiedmann julianwiedmann self-requested a review June 13, 2024 08:51
@lmb lmb marked this pull request as ready for review June 13, 2024 09:22
@lmb lmb requested a review from a team as a code owner June 13, 2024 09:22
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, nice catch!

I slightly word-smithed the release note - but imho this is even a release-note/bug, and we should queue up a backport?

@julianwiedmann julianwiedmann added the kind/bug This is a bug in the Cilium logic. label Jun 13, 2024
@lmb
Copy link
Contributor Author

lmb commented Jun 13, 2024

/test

@lmb
Copy link
Contributor Author

lmb commented Jun 13, 2024

but imho this is even a release-note/bug, and we should queue up a backport?

I had a look, this is present even in the first version 0e785f9 which appeared in 1.5.0. There were a couple of changes in between, so the backport would have to be manual.

I'll leave it up to you to decide - maybe it makes sense to see whether this helps the user with their symptoms?

@julianwiedmann
Copy link
Member

I'll leave it up to you to decide - maybe it makes sense to see whether this helps the user with their symptoms?

Ack, sounds good. Let's release-note it as bug, but hold off on backports until we see if it helps.

@julianwiedmann julianwiedmann added release-note/bug This PR fixes an issue in a previous release of Cilium. affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch affects/v1.15 This issue affects v1.15 branch needs-backport/1.15 and removed release-note/minor This PR changes functionality that users may find relevant to operating Cilium. needs-backport/1.15 labels Jun 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 13, 2024
@aanm aanm added this pull request to the merge queue Jun 13, 2024
Merged via the queue into cilium:main with commit 9f6d52e Jun 13, 2024
@lmb lmb deleted the pr/lmb/fix-snat-race branch June 13, 2024 12:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch affects/v1.15 This issue affects v1.15 branch area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. feature/snat Relates to SNAT or Masquerading of traffic kind/bug This is a bug in the Cilium logic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants