Open
Labels
- area/agent: Cilium agent related.
- kind/bug: This is a bug in the Cilium logic.
- need-more-info: More information is required to further debug or fix the issue.
- pinned: These issues are not marked stale by our issue bot.
- sig/scalability: Impacts how well Cilium handles a high rate of events or churn.
Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
One of the Kubernetes SLOs is the Pod Startup Latency SLO, which currently requires the 99th percentile (P99) to be <= 5s. With Cilium, Kubernetes is very close to this limit and sometimes exceeds it.
All tests were performed on a GKE-patched version of Cilium based on Cilium OSS master. Here's the current status:
- With 100 nodes, P99 did not exceed 5 seconds, but it occasionally came close to the limit.
- With 500 nodes, the average P99 is still within 5 seconds, but it sometimes exceeds the limit, reaching around 5.5 seconds.
- With 5k nodes, the latency averages ~6 seconds and quite often goes even higher.
The situation improved with the change #21505, but there are still latency problems.
Cilium Version
master
Kernel Version
5.10.109
Kubernetes Version
1.25.2-gke.800
Sysdump
No response
Relevant log output
No response
Anything else?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct