Alive IP can be released if CNI DELETE is called with stale result #10065

@tgraf

Description

It has been observed that kubelet calls CNI DELETE multiple times, potentially with stale CNI result information. This can lead to a race condition: the initial CNI DELETE correctly releases the IP in use, which is then reused by a different pod. Any subsequent CNI DELETE carrying the stale IP then causes the IP of the live pod to be released. While that pod will continue to function, the next scheduled pod will attempt to use the same IP and continuously fail to be scheduled due to an "IP in use" error.
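To make the sequence concrete, here is a minimal Go sketch of the race using a toy in-memory model. This is not Cilium's IPAM or endpoint code: `Agent`, `Add`, `Delete`, `ErrIPInUse` and `staleDeleteScenario` are hypothetical names, and returning the conflicting IP to the free list after a failed ADD is an assumption made only so that the repeated failure is visible.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrIPInUse is a hypothetical stand-in for the "IP in use" error.
var ErrIPInUse = errors.New("IP in use")

// Agent is a toy model of an allocator plus an endpoint table.
type Agent struct {
	free      []string          // IPs the allocator considers free
	allocated map[string]bool   // IPs the allocator considers in use
	endpoints map[string]string // pod ID -> IP of its live endpoint
}

func NewAgent(ips ...string) *Agent {
	return &Agent{
		free:      append([]string(nil), ips...),
		allocated: map[string]bool{},
		endpoints: map[string]string{},
	}
}

// release returns ip to the free list if the allocator currently holds it.
// It does not check which pod the IP belongs to.
func (a *Agent) release(ip string) {
	if a.allocated[ip] {
		delete(a.allocated, ip)
		a.free = append(a.free, ip)
	}
}

// Add models CNI ADD: allocate an IP, then create the endpoint.
func (a *Agent) Add(pod string) (string, error) {
	if len(a.free) == 0 {
		return "", errors.New("no free IPs")
	}
	ip := a.free[0]
	a.free = a.free[1:]
	a.allocated[ip] = true
	for _, used := range a.endpoints {
		if used == ip {
			// A live endpoint still holds this IP. Assumption: the IP goes
			// back to the free list, so the next attempt fails the same way.
			a.release(ip)
			return "", ErrIPInUse
		}
	}
	a.endpoints[pod] = ip
	return ip, nil
}

// Delete models CNI DELETE. ip comes from the caller's CNI result and may be
// stale. Endpoint removal is a no-op if pod has no endpoint, but the IP is
// released regardless.
func (a *Agent) Delete(pod, ip string) {
	delete(a.endpoints, pod)
	a.release(ip)
}

// staleDeleteScenario replays the sequence described in this issue.
func staleDeleteScenario() (liveIP string, errs []error) {
	a := NewAgent("10.0.0.1")
	ipA, _ := a.Add("pod-a")    // pod-a gets the IP
	a.Delete("pod-a", ipA)      // first DELETE: correct release
	liveIP, _ = a.Add("pod-b")  // pod-b reuses the same IP
	a.Delete("pod-a", ipA)      // repeated DELETE with stale result
	for i := 0; i < 2; i++ {    // pod-c keeps failing
		_, err := a.Add("pod-c")
		errs = append(errs, err)
	}
	return liveIP, errs
}

func main() {
	liveIP, errs := staleDeleteScenario()
	fmt.Println("pod-b still uses:", liveIP)
	fmt.Println("pod-c attempts:", errs)
}
```

In this model the second `Delete("pod-a", ...)` finds no endpoint for pod-a but still frees the address that pod-b now holds, which is the situation described above.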

This is a regression introduced by commit ab61853, which added the ability for CNI DELETE to release an IP even if the endpoint deletion fails. That ability is required to fix the race condition that occurs when the CNI binary gets killed between allocating an IP and creating the endpoint.
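For context, the following Go sketch illustrates why releasing the IP on a failed endpoint deletion was wanted in the first place. It is a simplified model, not the code from ab61853: `ipam`, `deleteEndpoint`, `cniDelete` and the `releaseOnFailure` switch are hypothetical, and the "abort on failure" branch is only an assumption about the earlier behavior based on the description above.

```go
package main

import (
	"errors"
	"fmt"
)

// ipam is a toy allocator that only tracks which IPs are allocated.
type ipam struct {
	allocated map[string]bool
}

var errNoEndpoint = errors.New("endpoint not found")

// deleteEndpoint stands in for endpoint removal. It fails when no endpoint
// exists for the given container ID.
func deleteEndpoint(endpoints map[string]string, id string) error {
	if _, ok := endpoints[id]; !ok {
		return errNoEndpoint
	}
	delete(endpoints, id)
	return nil
}

// cniDelete models the DELETE handler. With releaseOnFailure=false it aborts
// when endpoint deletion fails (assumed earlier behavior). With true it
// releases the IP regardless of the endpoint deletion result.
func cniDelete(p *ipam, endpoints map[string]string, id, ip string, releaseOnFailure bool) error {
	err := deleteEndpoint(endpoints, id)
	if err != nil && !releaseOnFailure {
		return err // the IP stays allocated
	}
	delete(p.allocated, ip)
	return err
}

// leakedAfterKilledAdd reports whether the IP is still allocated after a
// CNI ADD that was killed between IP allocation and endpoint creation,
// followed by a CNI DELETE for the same container.
func leakedAfterKilledAdd(releaseOnFailure bool) bool {
	p := &ipam{allocated: map[string]bool{}}
	endpoints := map[string]string{}

	// CNI ADD allocates the IP, then the binary is killed before the
	// endpoint is created.
	p.allocated["10.0.0.1"] = true

	// CNI DELETE: no endpoint exists, so endpoint deletion fails.
	_ = cniDelete(p, endpoints, "container-1", "10.0.0.1", releaseOnFailure)
	return p.allocated["10.0.0.1"]
}

func main() {
	fmt.Println("leaked when DELETE aborts on failure:", leakedAfterKilledAdd(false))
	fmt.Println("leaked when DELETE releases regardless:", leakedAfterKilledAdd(true))
}
```

The same unconditional release that avoids the leak in this model is what lets a DELETE with a stale result free an address that has since been handed to another pod.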

Metadata

Assignees

No one assigned

    Labels

    kind/bug: This is a bug in the Cilium logic.
    kind/regression: This functionality worked fine before, but was broken in a newer release of Cilium.
    priority/high: This is considered vital to an upcoming release.
