Is there an existing issue for this?
- I have searched the existing issues
What happened?
- Create an EKS cluster on AWS following this guide, via `eksctl`:

  ```yaml
  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: vgrygoruk-2935
    region: eu-west-1
  managedNodeGroups:
    - name: ng-1
      desiredCapacity: 2
      privateNetworking: true
      taints:
        - key: "node.cilium.io/agent-not-ready"
          value: "true"
          effect: "NoExecute"
  ```
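  (A sketch of how the cluster is created from this config; the file name `cluster.yaml` is my assumption:)

  ```bash
  # Create the EKS cluster from the ClusterConfig above
  # (file name cluster.yaml is illustrative)
  eksctl create cluster -f cluster.yaml
  ```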
- Install "AWS VPC CNI", "CoreDNS" and "kube-proxy" EKS add-ons.
- Install Cilium in "CNI Chaining" mode onto the AWS EKS cluster following the official documentation:

  ```bash
  helm install cilium cilium/cilium --version 1.13.2 \
    --namespace kube-system \
    --set cni.chainingMode=aws-cni \
    --set cni.exclusive=false \
    --set enableIPv4Masquerade=false \
    --set tunnel=disabled \
    --set endpointRoutes.enabled=true
  ```
- Run `cilium connectivity test`.
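  (A sketch of the sequence, assuming the `cilium` CLI is installed locally:)

  ```bash
  # Wait until the Cilium agent reports ready, then run the connectivity
  # suite, which deploys the echo-* pods into the cilium-test namespace
  cilium status --wait
  cilium connectivity test
  ```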
- Create a SecurityGroupPolicy to attach pod security groups to the `echo` pods in the `cilium-test` namespace (as they have probes defined):

  ```yaml
  ---
  apiVersion: vpcresources.k8s.aws/v1beta1
  kind: SecurityGroupPolicy
  metadata:
    name: test-app-psgp
    namespace: cilium-test
  spec:
    podSelector:
      matchLabels:
        kind: echo
    securityGroups:
      groupIds:
        - sg-08b53279c80ec19d9
  ```
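  (Assuming the manifest above is saved as `sgp.yaml`, applying and verifying it would look like:)

  ```bash
  # Apply the SecurityGroupPolicy and confirm the object was created
  kubectl apply -f sgp.yaml
  kubectl -n cilium-test get securitygrouppolicies.vpcresources.k8s.aws
  ```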
- Delete the `echo-other-node-*` pod in the `cilium-test` namespace, so the new pod uses its own ENI with the security group attached to it (the commands used are sketched after the events):

  ```text
  Events:
    Type     Reason                  Age                    From                     Message
    ----     ------                  ----                   ----                     -------
    Normal   Scheduled               18m                    default-scheduler        Successfully assigned cilium-test/echo-other-node-78f77b57f8-lg8xg to ip-192-168-120-241.eu-west-1.compute.internal
    Normal   SecurityGroupRequested  18m                    vpc-resource-controller  Pod will get the following Security Groups [sg-08b53279c80ec19d9]
    Warning  FailedCreatePodSandBox  18m                    kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "7dedc1e9a5dd13278502f2592cdc8e8d82276dab1f8aad1bbcb6380f72b8f415": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
    Normal   ResourceAllocated       18m                    vpc-resource-controller  Allocated [{"eniId":"eni-0b923d932ea2fe0af","ifAddress":"06:a4:ac:cd:ce:1b","privateIp":"192.168.103.13","vlanId":3,"subnetCidr":"192.168.96.0/19"}] to the pod
    Normal   Pulled                  18m                    kubelet                  Container image "quay.io/cilium/json-mock:v1.3.3@sha256:f26044a2b8085fcaa8146b6b8bb73556134d7ec3d5782c6a04a058c945924ca0" already present on machine
    Normal   Created                 18m                    kubelet                  Created container echo-other-node
    Normal   Started                 18m                    kubelet                  Started container echo-other-node
    Normal   Pulled                  18m                    kubelet                  Container image "docker.io/coredns/coredns:1.10.0@sha256:017727efcfeb7d053af68e51436ce8e65edbc6ca573720afb4f79c8594036955" already present on machine
    Normal   Created                 18m                    kubelet                  Created container dns-test-server
    Normal   Started                 18m                    kubelet                  Started container dns-test-server
    Warning  Unhealthy               18m (x9 over 18m)      kubelet                  Readiness probe failed: Get "http://192.168.103.13:8080/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    Warning  Unhealthy               3m38s (x448 over 18m)  kubelet                  Readiness probe failed: Get "http://192.168.103.13:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  ```
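  (For reference, the events above come from deleting and describing the pod roughly like this; the `name=echo-other-node` label is my assumption about how the connectivity-test pods are labelled, and deleting by full pod name works just as well:)

  ```bash
  # Delete only the echo-other-node pod so it is re-created on a branch ENI
  # with the security group attached
  kubectl -n cilium-test delete pod -l name=echo-other-node
  # Inspect the re-created pod's events
  kubectl -n cilium-test describe pod -l name=echo-other-node
  ```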
Expected result: the `echo-other-node-*` pod is running successfully and kubelet probes are not failing.

Actual result: the `echo-other-node-*` pod is started, but kubelet probes are failing.
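To double-check the failure outside kubelet, I'd expect a manual request to the pod IP from the events above to also time out (a sketch; note this is pod-to-pod traffic rather than the kubelet's node-to-pod probe path, so it is only an approximation, and the curl image is an illustrative choice):

```bash
# Manually hit the readiness endpoint of the affected pod (IP taken from
# the events above); expected to time out while endpointRoutes.enabled=true
kubectl -n cilium-test run probe-check --rm -it --restart=Never \
  --image=curlimages/curl -- -m 2 http://192.168.103.13:8080/
```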
Notes:
- The security group referenced in the SecurityGroupPolicy allows ALL traffic on all ports (both ingress and egress) from/to the VPC CIDR, so the issue is not related to the rules in the AWS security group.
- The `echo-other-node-*` pod runs successfully if I uninstall `cilium` from the cluster and re-create the pod:

  ```bash
  helm uninstall cilium --namespace kube-system
  # re-create the pod and inspect pod events:
  ```

  ```text
  Events:
    Type     Reason                  Age  From                     Message
    ----     ------                  ---- ----                     -------
    Normal   Scheduled               50s  default-scheduler        Successfully assigned cilium-test/echo-other-node-78f77b57f8-rz78p to ip-192-168-120-241.eu-west-1.compute.internal
    Normal   SecurityGroupRequested  50s  vpc-resource-controller  Pod will get the following Security Groups [sg-08b53279c80ec19d9]
    Warning  FailedCreatePodSandBox  50s  kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d54d00652c58ec0ff2b08b11e0e72227f6c7bade61f1b1ff8224915bce227943": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
    Normal   ResourceAllocated       49s  vpc-resource-controller  Allocated [{"eniId":"eni-0ba69eee6c7826dfe","ifAddress":"06:82:8f:17:be:e7","privateIp":"192.168.112.70","vlanId":1,"subnetCidr":"192.168.96.0/19"}] to the pod
    Normal   Pulled                  49s  kubelet                  Container image "quay.io/cilium/json-mock:v1.3.3@sha256:f26044a2b8085fcaa8146b6b8bb73556134d7ec3d5782c6a04a058c945924ca0" already present on machine
    Normal   Created                 49s  kubelet                  Created container echo-other-node
    Normal   Started                 49s  kubelet                  Started container echo-other-node
    Normal   Pulled                  49s  kubelet                  Container image "docker.io/coredns/coredns:1.10.0@sha256:017727efcfeb7d053af68e51436ce8e65edbc6ca573720afb4f79c8594036955" already present on machine
    Normal   Created                 49s  kubelet                  Created container dns-test-server
    Normal   Started                 49s  kubelet                  Started container dns-test-server
  ```
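  (A quick way to confirm the re-created pod is healthy after the uninstall; the `kind=echo` label is taken from the SecurityGroupPolicy selector above:)

  ```bash
  # Verify the re-created echo pods report Ready, and note their IPs/node placement
  kubectl -n cilium-test get pod -l kind=echo -o wide
  ```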
Cilium Version
- v1.13.1
- v1.14.0-snapshot.0
Kernel Version
Linux ip-XXX-XXX-XXX-XXX.eu-west-1.compute.internal 5.10.173-154.642.amzn2.x86_64 #1 SMP Wed Mar 15 00:26:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
v1.25.8-eks-ec5523e
Sysdump
cilium-sysdump-20230418-105354.zip
Relevant log output
No response
Anything else?
I've managed to get the probes working, if I install cilium helm chart with --set endpointRoutes.enabled=false
value:
```bash
helm upgrade --install cilium cilium/cilium --version 1.13.1 \
  --namespace kube-system \
  --set cni.chainingMode=aws-cni \
  --set cni.exclusive=false \
  --set enableIPv4Masquerade=false \
  --set tunnel=disabled \
  --set endpointRoutes.enabled=false
```
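(For the workaround to take effect, I expect the agents and the affected pods to need a restart so endpoints are re-plumbed; a sketch, assuming `daemonset/cilium` is the default name from the chart and that the helm upgrade does not already roll the agents:)

```bash
# Restart the Cilium agents so they pick up endpointRoutes.enabled=false,
# then re-create the echo pods so their endpoints are re-plumbed
kubectl -n kube-system rollout restart daemonset/cilium
kubectl -n kube-system rollout status daemonset/cilium
kubectl -n cilium-test delete pod -l kind=echo
```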
However, I'm not sure whether this is a limitation of deployment in CNI chaining mode (and the documentation needs to be updated), or whether I've simply found a workaround for a bug (that should be fixed).
Code of Conduct
- I agree to follow this project's Code of Conduct