
Cilium operator cannot allocate IP addresses when AWS launch new instance types #28424

@sergeyshevch

Description


Is there an existing issue for this?

  • I have searched the existing issues

What happened?

  • The Cilium operator relies on EC2 instance limits during IP allocation. This information is fetched from the AWS API at Cilium operator startup, here: https://github.com/cilium/cilium/blob/main/pkg/ipam/allocator/aws/aws.go#L102
  • On 4 Oct 2023, AWS launched the new c7a instance types (blog post)
  • The Cilium operator, which was already running before that launch, had no information about limits for the c7a types
  • We are using Karpenter for node provisioning. It periodically fetches information about new instance types, and it started provisioning c7a instances because they were cheap (few AWS customers use new instance types right after launch)
  • Karpenter provisioned new c7a nodes, and Cilium could not allocate IP addresses on them
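The failure mode above can be sketched in a few lines of Go. This is a minimal illustration, not Cilium's code: the Limits type, snapshotLimits, and lookup are hypothetical, and the limit values are placeholders. It only shows that a limits table taken once at startup has no entry for an instance type AWS launched afterwards.

```go
package main

import "fmt"

// Limits is a hypothetical stand-in for the per-instance-type ENI limits
// an IPAM allocator needs (not Cilium's actual type).
type Limits struct {
	Adapters int // max network interfaces (placeholder value below)
	IPv4     int // max IPv4 addresses per interface (placeholder value below)
}

// snapshotLimits simulates limits fetched once at operator startup,
// before c7a existed. Types launched later are simply absent.
func snapshotLimits() map[string]Limits {
	return map[string]Limits{
		"c6a.large": {Adapters: 3, IPv4: 10},
	}
}

// lookup reports whether limits are known for the given instance type.
func lookup(limits map[string]Limits, instanceType string) (Limits, bool) {
	l, ok := limits[instanceType]
	return l, ok
}

func main() {
	limits := snapshotLimits() // taken once, never refreshed
	for _, it := range []string{"c6a.large", "c7a.large"} {
		if _, ok := lookup(limits, it); !ok {
			fmt.Printf("no limits for %s: cannot allocate IPs\n", it)
			continue
		}
		fmt.Printf("limits known for %s\n", it)
	}
}
```

Running it prints that limits are known for c6a.large and missing for c7a.large, which matches the "Unable to find limits for instance type" warning in the logs.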

Cilium operator logs from the incident are included under "Relevant log output" below.

--update-ec2-adapter-limit-via-api was already set to true, and the issue was fixed after restarting cilium-operator.

I think cilium-operator should periodically re-fetch instance limits from this API so that newly launched instance types are picked up without a restart.

Cilium Version

Client: 1.14.2 a674894 2023-09-09T20:59:33+00:00 go version go1.20.8 linux/amd64
Daemon: 1.14.2 a674894 2023-09-09T20:59:33+00:00 go version go1.20.8 linux/amd64

Kernel Version

Linux 5.15.128 #1 SMP Thu Sep 14 21:42:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Server Version: version.Info{Major:"1", Minor:"27+", GitVersion:"v1.27.4-eks-2d98532", GitCommit:"3d90c097c72493c2f1a9dd641e4a22d24d15be68", GitTreeState:"clean", BuildDate:"2023-07-28T16:51:44Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}

Sysdump

n/a

Relevant log output

2023-10-04 17:57:17.292 
level=warning msg="Unable to find limits for instance type, consider setting --update-ec2-adapter-limit-via-api=true on the Operator" instance-type=c7a.large subsys=eni

2023-10-05 10:39:24.478 
level=warning msg="Instance not found! Please delete corresponding ciliumnode if instance has already been deleted." instanceID=i-0b7fc3f275d818c7f name=ip-10-7-128-51.ec2.internal subsys=ipam

Anything else?

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata


Assignees

No one assigned

    Labels

    area/agent: Cilium agent related.
    area/eni: Impacts ENI based IPAM.
    area/ipam: IP address management, including cloud IPAM
    help-wanted: Please volunteer for this by adding yourself as an assignee!
    kind/bug: This is a bug in the Cilium logic.
    kind/community-report: This was reported by a user in the Cilium community, eg via Slack.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
