ipam: Protect release from releasing alive IP #10066

Merged
merged 1 commit into from
Feb 6, 2020

@tgraf tgraf commented Feb 5, 2020

kubelet has been observed to call CNI DELETE multiple times, potentially with
stale CNI result information. This can lead to a race condition: the initial
CNI DELETE correctly releases the IP in use, which is then reused by a
different pod. Any subsequent CNI DELETE carrying the stale IP then releases
the IP of the live pod. While that pod continues to function, the next
scheduled pod is handed the same IP and repeatedly fails to be scheduled with
an IP-in-use error.

This is a regression introduced by commit ab61853, which allowed CNI DELETE to
release an IP even if the endpoint deletion fails. That behavior is required
to fix the race condition that occurs when the CNI binary is killed between
allocating an IP and creating the endpoint.

Fixes: ab61853 ("cni: Release IP even when endpoint deletion fails")
Fixes: #10065


Signed-off-by: Thomas Graf <thomas@cilium.io>
@tgraf added the kind/bug (This is a bug in the Cilium logic.), pending-review, and release-note/bug (This PR fixes an issue in a previous release of Cilium.) labels on Feb 5, 2020
@tgraf requested a review from a team on February 5, 2020 15:37

tgraf commented Feb 5, 2020

test-me-please

@coveralls
Coverage Status

Coverage increased (+0.03%) to 44.712% when pulling a9e5656 on pr/tgraf/ipam-release-protection into 161fcd4 on master.

@aanm added this to the 1.7 milestone on Feb 5, 2020
@ungureanuvladvictor
Member

test-me-please

@jrfastab
Contributor

@raybejjani Looks like this is already backported? Did we just forget to update tags?

Linked issue (may be closed by merging this pull request): Alive IP can be released if CNI DELETE is called with stale result