Closed
Labels
area/agent: Cilium agent related.
kind/bug: This is a bug in the Cilium logic.
kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
needs/triage: This issue requires triaging to establish severity and next steps.
Description
Is there an existing issue for this?
- I have searched the existing issues
Version
Equal to or higher than v1.17.3 and lower than v1.18.0
What happened?
Upgrading the Cilium Helm chart from 1.17.2 to 1.17.3 with the exact same values results in the cilium-operator crashing with the following error:
time="2025-04-23T09:49:25.524358585Z" level=info msg="Starting ENI allocator..." subsys=ipam-allocator-aws
time="2025-04-23T09:49:25.796209131Z" level=warning msg="Unable to synchronize EC2 interface list" error="operation error EC2: DescribeNetworkInterfaces, https response error StatusCode: 400, RequestID: 6a455e57-a9f6-4524-aadc-ce3ede4f490a, api error InvalidParameterCombination: The parameter NetworkInterfaceIds cannot be used with the parameter MaxResults" subsys=eni
How can we reproduce the issue?
Install Cilium with the following values.yaml:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: eks.amazonaws.com/compute-type
              operator: NotIn
              values:
                - fargate
agentNotReadyTaintKey: startup-taint.cluster-autoscaler.kubernetes.io/cilium-not-ready
bandwidthManager:
  bbr: true
  enabled: true
bpf:
  masquerade: false
  tproxy: true
bpfClockProbe: true
certgen:
  image:
    repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/certgen
cluster:
  name: aio-gfpw
clustermesh:
  apiserver:
    image:
      repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/clustermesh-apiserver
cni:
  chainingMode: none
dnsProxy:
  endpointMaxIpPerHostname: 4000
  minTtl: 0
enableIPv4BIGTCP: true
enableIPv4Masquerade: false
enableIPv6BIGTCP: true
enableIPv6Masquerade: false
encryption:
  enabled: false
  nodeEncryption: true
  type: wireguard
endpointHealthChecking:
  enabled: false
eni:
  awsEnablePrefixDelegation: true
  ec2APIEndpoint: ec2.eu-west-1.amazonaws.com
  enabled: true
  eniTags: {}
  iamRole: arn:aws:iam::123456789012:role/aio-gfpw-cilium-operator
  instanceTagsFilter:
    - aws:eks:cluster-name=aio-gfpw
  updateEC2AdapterLimitViaAPI: true
envoy:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: eks.amazonaws.com/compute-type
                operator: NotIn
                values:
                  - fargate
  enabled: true
  image:
    repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/cilium-envoy
  nodeSelector:
    kubernetes.io/os: linux
  priorityClassName: system-node-critical
  prometheus:
    enabled: true
    serviceMonitor:
      enabled: false
      labels:
        system: "true"
  resources:
    limits: null
    requests:
      cpu: 50m
      memory: 100Mi
  rollOutPods: true
  tolerations:
    - operator: Exists
  updateStrategy:
    type: OnDelete
healthChecking: false
hubble:
  enabled: true
  eventBufferCapacity: "8191"
  metrics:
    enableOpenMetrics: true
    enabled:
      - dns:query;labelsContext=source_namespace,source_workload
      - httpV2:exemplars=true;labelsContext=source_namespace,source_workload,source_app,destination_namespace,destination_workload,destination_app,traffic_direction
    serviceMonitor:
      enabled: false
      labels:
        system: "true"
  relay:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: k8s-app
                  operator: In
                  values:
                    - hubble-relay
            topologyKey: topology.kubernetes.io/zone
    enabled: true
    image:
      repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/hubble-relay
    podDisruptionBudget:
      enabled: true
      maxUnavailable: 1
    replicas: 3
    resources:
      requests:
        cpu: 25m
    rollOutPods: true
    updateStrategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 1
      type: RollingUpdate
  ui:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: k8s-app
                  operator: In
                  values:
                    - hubble-ui
            topologyKey: topology.kubernetes.io/zone
    backend:
      image:
        repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/hubble-ui-backend
    enabled: true
    frontend:
      image:
        repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/hubble-ui
    ingress:
      className: ingress-nginx
      enabled: true
      hosts:
        - hubble.aio-gfpw.aws.example.com
    podDisruptionBudget:
      enabled: true
      maxUnavailable: 1
    replicas: 1
    resources:
      requests:
        cpu: 25m
    rollOutPods: true
    updateStrategy:
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 1
      type: RollingUpdate
image:
  repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/cilium
ipam:
  mode: eni
k8sServiceHost: 51B4364E34F9C9DD6668F765127898E1.gr7.eu-west-1.eks.amazonaws.com
k8sServicePort: 443
kubeProxyReplacement: true
l7Proxy: true
labels: k8s:!job-name k8s:!controller-uid
loadBalancer:
  l7:
    backend: envoy
  serviceTopology: true
localRedirectPolicy: true
nodeinit:
  image:
    repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/startup-script
operator:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: name
                operator: In
                values:
                  - cilium-operator
          topologyKey: topology.kubernetes.io/zone
  extraArgs:
    - --unmanaged-pod-watcher-interval=0
  image:
    repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/operator
  podDisruptionBudget:
    enabled: true
    maxUnavailable: 1
  priorityClassName: system-cluster-critical
  prometheus:
    enabled: true
    serviceMonitor:
      enabled: false
      labels:
        system: "true"
  replicas: 2
  resources:
    requests:
      cpu: 25m
  rollOutPods: true
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
    - key: startup-taint.cluster-autoscaler.kubernetes.io/cilium-not-ready
      operator: Exists
    - key: startup-taint.cluster-autoscaler.kubernetes.io/dns-not-ready
      operator: Exists
    - key: efs.csi.aws.com/agent-not-ready
      operator: Exists
pmtuDiscovery:
  enabled: true
policyEnforcementMode: default
preflight:
  image:
    repository: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/aio-gfpw/quay.io/cilium/cilium
priorityClassName: system-node-critical
prometheus:
  enabled: true
  serviceMonitor:
    enabled: true
    labels:
      system: "true"
    metricRelabelings:
      - action: keep
        regex: cilium_operator_ces_sync_errors_total|cilium_controllers_failing|cilium_errors_warnings_total|cilium_ipcache_errors_total|cilium_policy_import_errors_total|cilium_policy_l7_parse_errors_total|cilium_bpf_map_pressure
        sourceLabels:
          - __name__
resources:
  limits: null
  requests:
    cpu: 50m
    memory: 300Mi
routingMode: native
socketLB:
  enabled: true
  terminatePodConnections: true
svcSourceRangeCheck: false
updateStrategy:
  type: OnDelete
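Then upgrade to 1.17.3 with the same values. As a sketch of the upgrade path (the release name cilium and namespace kube-system are assumptions, not from the report):

helm repo add cilium https://helm.cilium.io
helm install cilium cilium/cilium --version 1.17.2 --namespace kube-system --values values.yaml
helm upgrade cilium cilium/cilium --version 1.17.3 --namespace kube-system --values values.yaml
# the cilium-operator pods then crash with the DescribeNetworkInterfaces error above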
Cilium Version
1.17.3
Kernel Version
Bottlerocket OS 1.36.0 (aws-k8s-1.31) 6.1.131
Kubernetes Version
Client Version: v1.32.4
Server Version: v1.31.7-eks-bcf3d70
Regression
1.17.2
Sysdump
No response
Relevant log output
Anything else?
Rolling back to 1.17.2 immediately fixes the problem.
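For completeness, the rollback is the same upgrade command pinned back to the working chart version (same assumed release name and namespace as above):

helm upgrade cilium cilium/cilium --version 1.17.2 --namespace kube-system --values values.yaml
# the operator then synchronizes the EC2 interface list again without errors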
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct