Skip to content

Conversation

victorcq
Copy link

This fixes the racing condition during IPAM IP release process:
where an IP address that was marked for release and then reassigned back to the ENI can become stuck in a state where it's both in CiliumNode's spec.ipam.pool and in status.ipam.release-IPs with status "released". This prevents the IP from being assigned to new pods, leading to pods stuck in ContainerCreating state with "No more IPs available" errors.

Fixes: #39981

@victorcq victorcq requested a review from a team as a code owner June 13, 2025 02:43
@victorcq victorcq requested a review from pippolo84 June 13, 2025 02:43
@maintainer-s-little-helper
Copy link

Commit a82c11c does not match "(?m)^Signed-off-by:".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jun 13, 2025
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jun 13, 2025
@victorcq victorcq force-pushed the fix-ipam-ip-release-reassign-race branch from a82c11c to 4107a1e Compare June 13, 2025 04:14
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 13, 2025
@joestringer joestringer added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Jun 13, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 13, 2025
@joestringer
Copy link
Member

/test

@joestringer joestringer added the needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch label Jun 13, 2025
@victorcq victorcq force-pushed the fix-ipam-ip-release-reassign-race branch 2 times, most recently from 4ae5cc8 to f9b627a Compare June 16, 2025 08:50
@aanm
Copy link
Member

aanm commented Jun 16, 2025

/test

@victorcq victorcq force-pushed the fix-ipam-ip-release-reassign-race branch from f9b627a to 758fc0b Compare June 16, 2025 12:34
@victorcq
Copy link
Author

There were failed integration tests, but I'm not sure if they are really related to my change. Just rebased from main branch.

@joestringer
Copy link
Member

Fair enough, thanks for contributing. If it happens again could you also link to the build and highlight the failure symptom(s)? This can help others to review the output of those tests and either correlate with other issues others are facing or help to identify that the build is unrelated. You can also search the symptom in the "Issues" tab to see if someone else experiences the same problem.

As I see that you updated the PR several times, I'll also note that we do not require all PRs to be up-to-date with main before merging. It's not viable to rebase all the PRs with the rate of contribution we receive. Generally it's good for the PR to be up-to-date within the last week or so. Rebasing more frequently requires intervention to trigger the tests, while rebasing less frequently might start to create conflicts with the main branch.

I'll retrigger the tests.

@joestringer
Copy link
Member

/test

@victorcq
Copy link
Author

Thanks @joestringer ! Looks like all checks passed.

@aanm aanm enabled auto-merge June 17, 2025 09:54
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

The fix makes sense to me, but I'd left a feedback inline about the usage of time functions in the test.

auto-merge was automatically disabled June 21, 2025 09:42

Head branch was pushed to by a user without write access

@victorcq victorcq force-pushed the fix-ipam-ip-release-reassign-race branch from 4b30807 to 871ea9e Compare June 21, 2025 10:07
@joestringer
Copy link
Member

/test

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the second iteration!

Some final advice about the test and I think we can merge this! 💯

@victorcq victorcq force-pushed the fix-ipam-ip-release-reassign-race branch 2 times, most recently from e50d23f to 115f518 Compare June 24, 2025 10:36
@pippolo84
Copy link
Member

@victorcq changes LGTM, nice! Could you please squash the three commits into a single one? Thanks in advance! 🙏

@victorcq victorcq force-pushed the fix-ipam-ip-release-reassign-race branch from 115f518 to 4ab4303 Compare June 24, 2025 23:53
@victorcq
Copy link
Author

@victorcq changes LGTM, nice! Could you please squash the three commits into a single one? Thanks in advance! 🙏

👍 Done. Btw, I see this PR is tagged as needs-backport and we do need it in 1.17, do I need to do anything to make this happen?

@aanm aanm requested review from pippolo84 and joestringer June 25, 2025 08:41
@aanm aanm enabled auto-merge June 25, 2025 08:41
@aanm
Copy link
Member

aanm commented Jun 25, 2025

/test

Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! ✅

@pippolo84
Copy link
Member

👍 Done. Btw, I see this PR is tagged as needs-backport and we do need it in 1.17, do I need to do anything to make this happen?

No need to do the backporting yourself, a backporter will take care of doing that (see https://docs.cilium.io/en/stable/contributing/development/reviewers_committers/duties/#backporters).

@joestringer joestringer removed their request for review June 25, 2025 15:35
@victorcq
Copy link
Author

@pippolo84 can you help to trigger those failed checks again? They don't really relate to my change. 🤞

Racing condition when IP reassigned back to ENI,
during IPAM release process.

Signed-off-by: Victor Chen <victor.chenq@gmail.com>
@pippolo84 pippolo84 force-pushed the fix-ipam-ip-release-reassign-race branch from 4ab4303 to 8a9a9a9 Compare June 26, 2025 17:09
@pippolo84
Copy link
Member

/test

@aanm aanm added this pull request to the merge queue Jun 26, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 26, 2025
Merged via the queue into cilium:main with commit 7fbf7c6 Jun 26, 2025
68 checks passed
@tklauser tklauser mentioned this pull request Jul 1, 2025
4 tasks
@tklauser tklauser added backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. and removed needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels Jul 1, 2025
@github-actions github-actions bot added backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. and removed backport-pending/1.17 The backport for Cilium 1.17.x for this PR is in progress. labels Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-done/1.17 The backport for Cilium 1.17.x for this PR is done. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS ENI IP addresses can get stuck released status, preventing allocation to pods
6 participants