NAT LRU Eviction due to Full Capacity #34833

@sugangli

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

Equal to or higher than v1.14.13

What happened?

This issue tracks one branch of the BPF SNAT improvements issue #31643: the case where we see LRU eviction because the NAT map is at full capacity. As discussed in the community meeting, there is a longer-term goal as well as some short-term enhancements we can look into.

Long term Goals:

  1. The fundamental problem is that we store NAT entries in pairs, but LRU eviction is not aware of this pairing. A connection is interrupted as soon as either of its two entries is evicted. If we can combine the outgoing entry and its response entry into a single entry, we can solve this problem permanently. AFAIU, this is still in the idea phase and might have an impact beyond the eviction problem itself.
  2. Kernel-side eviction behavior improvements?
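
To make item 1 concrete, here is a toy model of the pairing problem. It is only a sketch under stated assumptions: a strict LRU map (the kernel's BPF LRU map is only approximately LRU), a capacity of 4, and made-up key shapes (`("egress", conn)` and `("reply", translated)`) that do not reflect Cilium's actual NAT map layout.

```python
from collections import OrderedDict


class LRUMap:
    """Toy strict-LRU map: evicts the least recently used key when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # a hit refreshes recency
        return self.entries[key]

    def update(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[key] = value


def snat_paired(nat, conn, translated):
    """Store one connection as two independent entries (hypothetical layout)."""
    nat.update(("egress", conn), translated)
    nat.update(("reply", translated), conn)


nat = LRUMap(capacity=4)
snat_paired(nat, "podA:1000->svc:80", "node:40000")
snat_paired(nat, "podB:1000->svc:80", "node:40001")

# Connection A keeps sending, so only its egress entry is refreshed.
nat.lookup(("egress", "podA:1000->svc:80"))

# A third connection arrives while the map is full and evicts two entries.
snat_paired(nat, "podC:1000->svc:80", "node:40002")

egress_a = nat.lookup(("egress", "podA:1000->svc:80"))  # still present
reply_a = nat.lookup(("reply", "node:40000"))           # evicted
```

In this model connection A still has its egress entry but has lost its reply entry, so return traffic can no longer be translated back. The map evicted each half on its own because it has no notion that the two keys belong together.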

Short term Enhancements:

We could simply restore the response entry whenever we do an outgoing entry lookup. However, this comes with some cons:

  1. Since the map is already at capacity, restoring an entry is likely to evict another one and have a domino effect on other connections. We could run the reproduction or a load test to see whether this patch mitigates the issue.
  2. It also relies on an outbound packet to re-establish SNAT, and such a packet might not always be available.
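
The short-term idea and both cons can be sketched with the same kind of toy model. Again, this is an illustration rather than the proposed patch: the strict LRU policy, the capacity of 4, the key shapes, and the helper name `egress_lookup_with_restore` are all assumptions made for the example.

```python
from collections import OrderedDict


class LRUMap:
    """Toy strict-LRU map that records which keys it evicts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.evicted = []

    def lookup(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # a hit refreshes recency
        return self.entries[key]

    def update(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            old_key, _ = self.entries.popitem(last=False)
            self.evicted.append(old_key)
        self.entries[key] = value


def egress_lookup_with_restore(nat, conn):
    """On an egress hit, re-create the reply entry if it has been evicted."""
    translated = nat.lookup(("egress", conn))
    if translated is None:
        return None
    if nat.lookup(("reply", translated)) is None:
        # At capacity, this insert evicts some other entry.
        nat.update(("reply", translated), conn)
    return translated


# A full map in which connection A has already lost its reply entry.
nat = LRUMap(capacity=4)
nat.update(("egress", "A"), "node:40000")
nat.update(("egress", "B"), "node:40001")
nat.update(("reply", "node:40001"), "B")
nat.update(("egress", "C"), "node:40002")

# Con 2: a reply that arrives before any outbound packet still misses.
reply_before = nat.lookup(("reply", "node:40000"))

# An outbound packet on A triggers the restore.
translated = egress_lookup_with_restore(nat, "A")
reply_after = nat.lookup(("reply", "node:40000"))

# Con 1: the restore pushed out an entry of another connection.
victims = list(nat.evicted)
```

Here the restore repairs connection A, but only once A sends an outbound packet, and it does so by evicting connection B's egress entry, which then needs its own repair.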

How can we reproduce the issue?

The repro in #29305 applies here, but I will update this issue with further repro steps once I start working on the PR mentioned above.

Cilium Version

v1.14+

Kernel Version

5.10+

Kubernetes Version

1.24+

Regression

No response

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
    kind/bug: This is a bug in the Cilium logic.
    kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
    needs/triage: This issue requires triaging to establish severity and next steps.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
