Labels
- area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
- kind/bug: This is a bug in the Cilium logic.
- kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
- needs/triage: This issue requires triaging to establish severity and next steps.
Description
Is there an existing issue for this?
- I have searched the existing issues
Version
v1.14.13 or later
What happened?
This issue tracks a sub-problem of the BPF SNAT improvements issue #31643: NAT entries being evicted from the LRU map once it reaches full capacity. As discussed in the community meeting, there is a longer-term goal and some short-term enhancements we can look into.
Long-term goals:
- The fundamental problem is that NAT entries are stored in pairs (an outgoing entry and its corresponding response entry), but LRU eviction is unaware of this pairing: if either entry of a pair is evicted, the connection is interrupted. If we could combine the outgoing entry and its response entry into a single entry, this problem would be solved permanently. AFAIU, this is still at the idea stage and might have impact beyond solving the eviction problem itself.
- Kernel-side improvements to LRU eviction behavior?
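To make the pairing problem concrete, here is a minimal userspace sketch in C. It is a hypothetical model, not Cilium's datapath code, and rests on these assumptions: the map is a strict LRU of capacity 4 (the kernel's `BPF_MAP_TYPE_LRU_HASH` only approximates LRU order), each connection owns an outgoing (`DIR_OUT`) and a response (`DIR_IN`) entry, and a packet refreshes only the entry for its own direction. `DIR_BOTH` stands in for the combined single-entry idea; all names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, not Cilium or kernel code. It uses strict
 * LRU; the kernel's BPF_MAP_TYPE_LRU_HASH only approximates LRU order. */
#define MAP_CAP 4

enum dir { DIR_OUT, DIR_IN, DIR_BOTH }; /* DIR_BOTH: hypothetical combined entry */

struct nat_key {
    int conn_id;
    enum dir dir;
};

struct slot {
    bool used;
    struct nat_key key;
    unsigned long last_use;
};

struct lru_map {
    struct slot slots[MAP_CAP];
    unsigned long clock;
};

static int find(const struct lru_map *m, struct nat_key k)
{
    for (int i = 0; i < MAP_CAP; i++)
        if (m->slots[i].used && m->slots[i].key.conn_id == k.conn_id &&
            m->slots[i].key.dir == k.dir)
            return i;
    return -1;
}

/* A packet hitting an entry refreshes only that entry's recency. */
static bool lru_lookup(struct lru_map *m, struct nat_key k)
{
    int i = find(m, k);
    if (i < 0)
        return false;
    m->slots[i].last_use = ++m->clock;
    return true;
}

/* When full, evict the least recently used slot, with no regard for
 * which connection (or which half of a pair) it belongs to. */
static void lru_insert(struct lru_map *m, struct nat_key k)
{
    int victim = find(m, k);
    if (victim < 0) {
        victim = 0;
        for (int i = 0; i < MAP_CAP; i++) {
            if (!m->slots[i].used) {
                victim = i;
                break;
            }
            if (m->slots[i].last_use < m->slots[victim].last_use)
                victim = i;
        }
    }
    assert(victim >= 0 && victim < MAP_CAP);
    m->slots[victim] = (struct slot){ true, k, ++m->clock };
}

/* Exactly one half of the pair left: the connection breaks. */
static int half_evicted(const struct lru_map *m, int conn_id)
{
    bool out = find(m, (struct nat_key){ conn_id, DIR_OUT }) >= 0;
    bool in = find(m, (struct nat_key){ conn_id, DIR_IN }) >= 0;
    return out != in;
}

/* Paired entries: conns 1 and 2 fill the map, conn 1 then sees only reply
 * traffic, then conn 3 arrives. Returns how many conns are half-evicted. */
static int paired_scenario(void)
{
    struct lru_map m = { .clock = 0 };
    for (int c = 1; c <= 2; c++) {
        lru_insert(&m, (struct nat_key){ c, DIR_OUT });
        lru_insert(&m, (struct nat_key){ c, DIR_IN });
    }
    lru_lookup(&m, (struct nat_key){ 1, DIR_IN }); /* reply packet on conn 1 */
    lru_insert(&m, (struct nat_key){ 3, DIR_OUT });
    lru_insert(&m, (struct nat_key){ 3, DIR_IN });
    int broken = 0;
    for (int c = 1; c <= 3; c++)
        broken += half_evicted(&m, c);
    return broken;
}

/* Combined entries: one entry per conn, refreshed by traffic in either
 * direction. Returns true if the active conn 1 survives. */
static bool combined_scenario(void)
{
    struct lru_map m = { .clock = 0 };
    for (int c = 1; c <= 4; c++)
        lru_insert(&m, (struct nat_key){ c, DIR_BOTH });
    lru_lookup(&m, (struct nat_key){ 1, DIR_BOTH }); /* reply packet on conn 1 */
    lru_insert(&m, (struct nat_key){ 5, DIR_BOTH }); /* evicts conn 2 as a whole */
    return find(&m, (struct nat_key){ 1, DIR_BOTH }) >= 0;
}
```

Under these assumptions `paired_scenario()` returns 2: conn 1 is still active (it keeps receiving replies) yet loses its outgoing entry, and conn 2 also loses its outgoing entry. With one combined entry per connection, `combined_scenario()` keeps conn 1 and evicts conn 2 as a whole, so eviction can drop a connection but never leave half of one behind.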
Short-term enhancements:
We could restore the response entry whenever we look up an outgoing entry. However, this has some drawbacks:
- Since the map is already at capacity, restoring the response entry will evict some other entry, likely causing a domino effect on other connections. We could run the reproduction or a load test to see whether this patch mitigates the issue.
- It also relies on an outbound packet to re-establish the SNAT mapping, and such a packet might not always be available.
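A minimal sketch of the short-term mitigation, under stated assumptions: the key/value layout below is hypothetical and only loosely mirrors a NAT map (the outgoing entry maps the original tuple to the SNAT address/port; the response entry maps the reply tuple back to the original source). On an egress lookup hit, the response entry is rebuilt from the outgoing one and re-inserted if missing. The toy table never evicts, so it does not model the domino effect; in a full LRU map that re-insert would push out some other entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not Cilium's actual NAT map structs. */
enum nat_dir { NAT_DIR_OUT, NAT_DIR_IN };

struct nat_tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t proto;
    enum nat_dir dir;
};

struct nat_val {
    uint32_t addr; /* OUT: new source address; IN: new destination address */
    uint16_t port;
};

/* Build the response entry that reverse SNAT needs for reply packets. */
static void nat_reverse_entry(const struct nat_tuple *out_key,
                              const struct nat_val *out_val,
                              struct nat_tuple *in_key, struct nat_val *in_val)
{
    in_key->saddr = out_key->daddr; /* replies come from the remote peer */
    in_key->sport = out_key->dport;
    in_key->daddr = out_val->addr;  /* ...addressed to the SNAT address/port */
    in_key->dport = out_val->port;
    in_key->proto = out_key->proto;
    in_key->dir = NAT_DIR_IN;
    in_val->addr = out_key->saddr;  /* rewrite back to the original source */
    in_val->port = out_key->sport;
}

#define TABLE_CAP 8

struct nat_entry {
    bool used;
    struct nat_tuple key;
    struct nat_val val;
};

static struct nat_entry table[TABLE_CAP]; /* toy map: no LRU eviction */

static bool tuple_eq(const struct nat_tuple *a, const struct nat_tuple *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto && a->dir == b->dir;
}

static struct nat_entry *table_find(const struct nat_tuple *k)
{
    for (int i = 0; i < TABLE_CAP; i++)
        if (table[i].used && tuple_eq(&table[i].key, k))
            return &table[i];
    return NULL;
}

static bool table_insert(const struct nat_tuple *k, const struct nat_val *v)
{
    struct nat_entry *e = table_find(k);
    for (int i = 0; !e && i < TABLE_CAP; i++)
        if (!table[i].used)
            e = &table[i];
    if (!e)
        return false; /* full: a real LRU map would evict something instead */
    *e = (struct nat_entry){ true, *k, *v };
    return true;
}

/* Hypothetical egress path with the proposed mitigation: on a hit for the
 * outgoing entry, restore the response entry if it is missing.
 * Returns true if the response entry had to be restored. */
static bool snat_egress_lookup(const struct nat_tuple *out_key)
{
    struct nat_entry *out = table_find(out_key);
    struct nat_tuple in_key;
    struct nat_val in_val;

    if (!out)
        return false; /* no state: the datapath would create a new mapping */
    nat_reverse_entry(&out->key, &out->val, &in_key, &in_val);
    if (table_find(&in_key))
        return false;
    assert(in_key.dir == NAT_DIR_IN);
    return table_insert(&in_key, &in_val);
}
```

Calling `snat_egress_lookup()` twice for the same outgoing tuple restores the response entry on the first call and finds it present on the second. The sketch also makes the second drawback visible: nothing happens until an outbound packet triggers the lookup.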
How can we reproduce the issue?
The reproduction in #29305 applies here. I will add further reproduction steps once I start working on the PR mentioned above.
Cilium Version
v1.14+
Kernel Version
5.10+
Kubernetes Version
1.24+
Regression
No response
Sysdump
No response
Relevant log output
No response
Anything else?
No response
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct