CiliumIdentity: a newly allocated CiliumIdentity may become dirty data, and the number of CiliumIdentity objects increases forever #35946

@orange30

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.16.0 and lower than v1.17.0

What happened?

We have 1,800 pods in a cluster.
But the number of CiliumIdentity objects has grown to more than 17,000, and the apiserver is on the verge of collapse.
Together with @FirmlyReality, we finally found that in certain scenarios a newly allocated CiliumIdentity may become dirty data that can never be deleted.

How can we reproduce the issue?

  1. When a pod's labels change and the pod is deleted immediately afterwards, the newly allocated identity may be left behind as dirty data in the Allocator's localKeys, because of a race between "resolve-identity controller -> identityLabelsChanged" and "func (e *Endpoint) Stop()".
  2. However, func (a *Allocator) syncLocalKeys, which runs periodically every five minutes, processes every entry in the Allocator's localKeys, including the dirty ones, and deletes the CiliumIdentity's "io.cilium.heartbeat" annotation, which was attached by the cilium-operator CiliumIdentity GC.
  3. Because the annotation keeps being removed, the GC never deletes these identities, so the number of CiliumIdentity objects in the apiserver and etcd increases forever. This is a big threat to the apiserver and etcd.

How can we reproduce the issue?
Maybe we can refer to this issue: #19877

Cilium Version

We observed this issue on v1.10.19, but all versions have this problem.

Kernel Version

Any version

Kubernetes Version

v1.30

Regression

No response

Sysdump

No response

Relevant log output

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    area/agent: Cilium agent related.
    info-completed: The GH issue has received a reply from the author.
    kind/bug: This is a bug in the Cilium logic.
    kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
    needs/triage: This issue requires triaging to establish severity and next steps.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests