Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
- The Cilium operator relies on EC2 instance limits during IP allocation. This information is fetched from the AWS API at Cilium operator startup, here: https://github.com/cilium/cilium/blob/main/pkg/ipam/allocator/aws/aws.go#L102
- On 4 Oct 2023, AWS launched the new c7a instance types (blog post)
- The Cilium operator that was already running before that launch has no information about limits for c7a types
- We are using Karpenter for node provisioning. It periodically fetches information about new instance types, and it started provisioning c7a instances because they were cheap (few AWS customers use new instance types right after launch)
- Karpenter provisions the new nodes, and Cilium cannot allocate IP addresses on them
Some Cilium operator logs from the incident are included under "Relevant log output".
--update-ec2-adapter-limit-via-api was already set to true, and the issue was resolved after restarting cilium-operator.
I think cilium-operator should periodically refetch the instance limits from this API so that it learns about new instance types and such issues are prevented.
Cilium Version
Client: 1.14.2 a674894 2023-09-09T20:59:33+00:00 go version go1.20.8 linux/amd64
Daemon: 1.14.2 a674894 2023-09-09T20:59:33+00:00 go version go1.20.8 linux/amd64
Kernel Version
Linux 5.15.128 #1 SMP Thu Sep 14 21:42:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.4-eks-2d98532", GitCommit:"3d90c097c72493c2f1a9dd641e4a22d24d15be68", GitTreeState:"clean", BuildDate:"2023-07-28T16:51:44Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
n/a
Relevant log output
2023-10-04 17:57:17.292
level=warning msg="Unable to find limits for instance type, consider setting --update-ec2-adapter-limit-via-api=true on the Operator" instance-type=c7a.large subsys=eni
2023-10-05 10:39:24.478
level=warning msg="Instance not found! Please delete corresponding ciliumnode if instance has already been deleted." instanceID=i-0b7fc3f275d818c7f name=ip-10-7-128-51.ec2.internal subsys=ipam
Anything else?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct