Labels: area/agent (Cilium agent related), kind/bug (This is a bug in the Cilium logic)
Description
Is there an existing issue for this?
- I have searched the existing issues
What happened?
We encountered an issue in which a CiliumEndpoint referenced a CiliumIdentity that had already been removed by the cilium-operator GC. Because of this, the corresponding pod could not communicate with any pods except those running on the same node. The warning "Unable to release newly allocated identity again" appeared in the cilium-agent log while the problem was occurring.
$ kubectl get ciliumidentities.cilium.io 19805
Error from server (NotFound): ciliumidentities.cilium.io "19805" not found
$ kubectl -n app-mysql get ciliumendpoints.cilium.io
NAME     ENDPOINT ID   IDENTITY ID   INGRESS ENFORCEMENT   EGRESS ENFORCEMENT   VISIBILITY POLICY   ENDPOINT STATE   IPV4   IPV6
moco-0   2001          19805
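The dangling reference can be spotted mechanically by cross-checking every CiliumEndpoint's identity ID against the CiliumIdentity objects that still exist. A minimal sketch, assuming JSON as produced by `kubectl get ciliumendpoints.cilium.io -A -o json` and `kubectl get ciliumidentities.cilium.io -o json` (the helper name is ours, not part of Cilium):

```python
import json

def find_dangling_refs(endpoints_json: str, identities_json: str):
    """Return (namespace, name, identity_id) for every CiliumEndpoint whose
    referenced CiliumIdentity no longer exists."""
    endpoints = json.loads(endpoints_json)["items"]
    # CiliumIdentity objects are named after their numeric identity ID.
    existing = {item["metadata"]["name"] for item in json.loads(identities_json)["items"]}
    dangling = []
    for ep in endpoints:
        ident = str(ep.get("status", {}).get("identity", {}).get("id", ""))
        if ident and ident not in existing:
            dangling.append((ep["metadata"]["namespace"], ep["metadata"]["name"], ident))
    return dangling
```

Run against the cluster state above, this would flag app-mysql/moco-0 with identity 19805.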
cilium-agent log
2022-05-11 21:01:03 level=warning msg="Unable to release newly allocated identity again" containerID= datapathPolicyRevision=37 desiredPolicyRevision=37 endpointID=2371 error="identity sync was cancelled: context canceled" identity=19805 identityLabels="k8s:app.kubernetes.io/created-by=moco,k8s:app.kubernetes.io/instance=stage0-ocean-1,k8s:app.kubernetes.io/name=mysql,k8s:io.cilium.k8s.namespace.labels.accurate.cybozu.com/parent=team-dbre,k8s:io.cilium.k8s.namespace.labels.cybozu.com/alert-group=dbre,k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=app-cybozu-com-mysql,k8s:io.cilium.k8s.namespace.labels.pod-security.cybozu.com/policy=traceable,k8s:io.cilium.k8s.namespace.labels.team=dbre,k8s:io.cilium.k8s.policy.cluster=default,k8s:io.cilium.k8s.policy.serviceaccount=moco-stage0-ocean-1,k8s:io.kubernetes.pod.namespace=app-cybozu-com-mysql,k8s:moco.cybozu.com/role=replica,k8s:statefulset.kubernetes.io/pod-name=moco-stage0-ocean-1-1" ipv4= ipv6= k8sPodName=/ subsys=endpoint
cilium-operator log
2022-05-11 21:12:45 level=info msg="Garbage collected identity" identity=19805 subsys=cilium-operator-generic
hubble log
"destination": {
"identity": 19805,
"namespace": "app-mysql",
"pod_name": "moco-0"
},
"Type": "L3_L4",
"node_name": "10.69.3.9",
"event_type": {
"type": 5
},
"traffic_direction": "EGRESS",
"drop_reason_desc": "POLICY_DENIED",
"Summary": "TCP Flags: SYN"
How to reproduce
- Create a stateful pod using the following manifest.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: suzuki-test
  namespace: dctest
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: suzuki-test
  serviceName: suzuki-test
  template:
    metadata:
      labels:
        app.kubernetes.io/name: suzuki-test
    spec:
      containers:
      - image: quay.io/cybozu/testhttpd:0
        name: testhttpd
        volumeMounts:
        - name: www
          mountPath: /usr/share/suzuki-test/html
      restartPolicy: Always
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: www
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
      storageClassName: topolvm-provisioner
      volumeMode: Filesystem
- Taint the node where the pod is running, and add a label to the pod
kubectl taint nodes ${node} node.cybozu.io/node-not-ready=true:NoExecute
kubectl -n dctest label pods suzuki-test-0 suzuki-test.cybozu.com/role=primary
- If all goes as expected, the following log appears. The resolve-identity controller allocates a new global key, but then stops setting the identity and tries to release it. The release fails because the endpoint is in the terminating state and the controller is canceled and removed.
level=warning msg="Unable to release newly allocated identity again" containerID= datapathPolicyRevision=41 desiredPolicyRevision=41 endpointID=74 error="initial global identity sync was cancelled: context canceled" identity=46330 identityLabels="k8s:app.kubernetes.io/name=suzuki-test,k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=dctest,k8s:io.cilium.k8s.namespace.labels.pod-security.cybozu.com/policy=privileged,k8s:io.cilium.k8s.policy.cluster=default,k8s:io.cilium.k8s.policy.serviceaccount=default,k8s:io.kubernetes.pod.namespace=dctest,k8s:statefulset.kubernetes.io/pod-name=suzuki-test-0,k8s:suzuki-test.cybozu.com/role=primary" ipv4= ipv6= k8sPodName=/ subsys=endpoint
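The sequence above can be illustrated with a toy model. None of these names are Cilium APIs; this is only a sketch of the ordering of events (allocate, failed release, operator GC, stale reuse), not the real implementation:

```python
class IdentityAllocator:
    """Stand-in for the identity store; keys mimic CiliumIdentity CRDs."""

    def __init__(self):
        self.identities = {}   # id -> labels
        self.next_id = 46330   # illustrative, matches the log above

    def allocate(self, labels):
        ident = self.next_id
        self.identities[ident] = labels
        return ident

    def release(self, ident, sync_cancelled):
        # Mirrors the failing release: the context is canceled before the
        # initial global identity sync completes, so nothing is freed.
        if sync_cancelled:
            raise RuntimeError("initial global identity sync was cancelled: context canceled")
        self.identities.pop(ident, None)

    def operator_gc(self, referenced_ids):
        # The operator garbage-collects identities that appear unused.
        for ident in list(self.identities):
            if ident not in referenced_ids:
                del self.identities[ident]

allocator = IdentityAllocator()
ident = allocator.allocate({"app.kubernetes.io/name": "suzuki-test"})
try:
    allocator.release(ident, sync_cancelled=True)  # fails; controller removed, no retry
except RuntimeError:
    pass
allocator.operator_gc(referenced_ids=set())        # pod is pending, nothing references the id
# When the pod is re-labeled, the new endpoint reuses `ident`,
# which no longer exists as a CiliumIdentity:
print(ident in allocator.identities)               # → False
```

The key point the model shows: the failed release leaves the agent holding an identity number that the operator is free to garbage-collect, so the later endpoint ends up referencing a removed identity.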
- Wait for a while until the operator removes the allocated identity. (The pod stays pending.)
- Untaint the node and add the label again. The newly created endpoint then references the already-removed identity.
kubectl taint nodes ${node} node.cybozu.io/node-not-ready=true:NoExecute-
kubectl -n dctest label pods suzuki-test-0 suzuki-test.cybozu.com/role=primary
Cilium Version
cilium v1.11.5
Kernel Version
Linux rack0-cs4 5.15.37-flatcar #1 SMP Wed May 4 13:53:25 -00 2022 x86_64 AMD EPYC 7413 24-Core Processor AuthenticAMD GNU/Linux
Kubernetes Version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"archive", BuildDate:"2022-03-23T08:04:34Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
Sysdump
My sysdump is too big to upload
Relevant log output
No response
Anything else?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct