eni: Assign primary IP to support multiple VPC CIDRs #15453

gandro · 2021-03-24T18:00:21Z

This PR contains two fixes to support multiple AWS VPCs CIDRs in Cilium ENI IPAM mode.

The first fix is to assign the primary IP of each ENI to its corresponding Linux device. This ensures SNAT works correctly on the secondary interfaces. Before this change, SNAT on the secondary ENI interfaces would pick the node IP as the source IP address, which subsequently caused the packet to be dropped by the fabric. To facilitate this change, the primary IP is excluded from the allocation pool in the operator. This is similar to how AWS CNI treats the primary IP.

The second fix is around the cleanup of the additional rules created for secondary CIDRs (i.e. a follow-up to #15303)

The approach taken here has been validated manually by myself and by a community user.

pkg/datapath/linux/routing/routing.go

pkg/ipam/crd_eni.go

twpayne · 2021-03-31T09:08:55Z

pkg/ipam/crd_eni.go

+	}
+
+	newENIs, _ := diffResources(oldNode, newNode)
+	go setupENI(newENIs, expectedENIByMac, eniByName, firstInterfaceIndex, mtuConfig)


There's no synchronization around this goroutine. In theory this might cause problems if updateENIRulesAndRoutes is called multiple times in quick succession, which could cause multiple setupENI goroutines to run concurrently, possibly stepping on each other's toes. In practice is should be OK though. 🤞

This is a great point. I have reworks the PR to only consider newly added interfaces. This should ensure that all possibly concurrent go routine only try to configure a distinct subset of newly added ENIs.

In addition to this change, I also increased the wait time and introduced exponential backoff.

In CRD-backed IPAM modes such as ENI mode, some IPs were recently removed from the IPAM pool [1]. In user environments where Cilium was running a version prior to [1], it is possible for endpoints to be assigned "unavailable" IPs. Upon upgrade (to a version which includes [1]), endpoint restoration will fail with [2]. In order to workaround this failure and not disrupt the upgrade, this change introduces a hidden flag (`--bypass-ip-availability-upon-restore`) which will inform Cilium to continue on if the restored endpoint's IP is not available for reallocation, bypassing the specific error "IP is not available". Other errors will not be bypassed, in order to reduce the scope of this stop-gap solution. With the flag set, restored endpoints which had "unavailable" IPs will keep them. Any new endpoints / pods will be assigned fresh, valid IPs from the pool. This flag is only meant to be enabled with CRD-backed IPAM modes such as ENI mode. The reason is because of the change described in [1], where the primary ENI IP was removed from the IPAM pool. In any other mode that this flag is enabled in, the user is warned that the flag is not intended for other modes and will have no effect. This patch is intended to be reverted in the future, as this stop-gap solution will no longer be required, as users of Cilium don't upgrade from versions prior to [1]. I propose that we revert this in the following release that this patch makes it in (N+1). How was this tested? 1) Deployed a Cilium version that doesn't include [1] on EKS cluster 2) Created a Deployment object which I scaled to max out the ENI IPs, such that at least one pod is assigned an "unavailable" IP 3) Upgraded Cilium to a version which does include [1] and observe [2] failures 4) Reset cluster back to state from (2) 5) Upgrade Cilium to the version that contains this commit 6) Observe log msgs from this commit and endpoint restoration succeeding 7) Scale Deployment to 0 and back up, to restart all pods 8) Observe that they all get fresh IPs and none of the "unavailable" IPs [1]: cilium#15453 [2]: ``` { "time":"2021-09-20T16:57:00.400086481Z", "level":"WARN", "origin":"cilium.io/agent", "message":"Unable to restore endpoint, ignoring", "params":{ "endpointID":"992", "error":"Failed to re-allocate IP of endpoint: unable to reallocate 10.0.133.193 IPv4 address: IP 10.0.133.193 is not available", "k8sPodName":"default/pod-1", "subsys":"daemon" } } ``` Signed-off-by: Chris Tarazi <chris@isovalent.com>

In CRD-backed IPAM modes such as ENI mode, some IPs were recently removed from the IPAM pool [1]. In user environments where Cilium was running a version prior to [1], it is possible for endpoints to be assigned "unavailable" IPs. Upon upgrade (to a version which includes [1]), endpoint restoration will fail with [2]. In order to workaround this failure and not disrupt the upgrade, this change introduces a hidden flag (`--bypass-ip-availability-upon-restore`) which will inform Cilium to continue on if the restored endpoint's IP is not available for reallocation, bypassing the specific error "IP is not available". Other errors will not be bypassed, in order to reduce the scope of this stop-gap solution. With the flag set, restored endpoints which had "unavailable" IPs will keep them. Any new endpoints / pods will be assigned fresh, valid IPs from the pool. This flag is only meant to be enabled with CRD-backed IPAM modes such as ENI mode. The reason is because of the change described in [1], where the primary ENI IP was removed from the IPAM pool. In any other mode that this flag is enabled in, the user is warned that the flag is not intended for other modes and will have no effect. This patch is intended to be reverted in the future, as this stop-gap solution will no longer be required, as users of Cilium don't upgrade from versions prior to [1]. I propose that we revert this in the following release that this patch makes it in (N+1). How was this tested? 1) Deployed a Cilium version that doesn't include [1] on EKS cluster 2) Created a Deployment object which I scaled to max out the ENI IPs, such that at least one pod is assigned an "unavailable" IP 3) Upgraded Cilium to a version which does include [1] and observe [2] failures 4) Reset cluster back to state from (2) 5) Upgrade Cilium to the version that contains this commit 6) Observe log msgs from this commit and endpoint restoration succeeding 7) Scale Deployment to 0 and back up, to restart all pods 8) Observe that they all get fresh IPs and none of the "unavailable" IPs [1]: #15453 [2]: ``` { "time":"2021-09-20T16:57:00.400086481Z", "level":"WARN", "origin":"cilium.io/agent", "message":"Unable to restore endpoint, ignoring", "params":{ "endpointID":"992", "error":"Failed to re-allocate IP of endpoint: unable to reallocate 10.0.133.193 IPv4 address: IP 10.0.133.193 is not available", "k8sPodName":"default/pod-1", "subsys":"daemon" } } ``` Signed-off-by: Chris Tarazi <chris@isovalent.com>

[ upstream commit baa9e9a ] In CRD-backed IPAM modes such as ENI mode, some IPs were recently removed from the IPAM pool [1]. In user environments where Cilium was running a version prior to [1], it is possible for endpoints to be assigned "unavailable" IPs. Upon upgrade (to a version which includes [1]), endpoint restoration will fail with [2]. In order to workaround this failure and not disrupt the upgrade, this change introduces a hidden flag (`--bypass-ip-availability-upon-restore`) which will inform Cilium to continue on if the restored endpoint's IP is not available for reallocation, bypassing the specific error "IP is not available". Other errors will not be bypassed, in order to reduce the scope of this stop-gap solution. With the flag set, restored endpoints which had "unavailable" IPs will keep them. Any new endpoints / pods will be assigned fresh, valid IPs from the pool. This flag is only meant to be enabled with CRD-backed IPAM modes such as ENI mode. The reason is because of the change described in [1], where the primary ENI IP was removed from the IPAM pool. In any other mode that this flag is enabled in, the user is warned that the flag is not intended for other modes and will have no effect. This patch is intended to be reverted in the future, as this stop-gap solution will no longer be required, as users of Cilium don't upgrade from versions prior to [1]. I propose that we revert this in the following release that this patch makes it in (N+1). How was this tested? 1) Deployed a Cilium version that doesn't include [1] on EKS cluster 2) Created a Deployment object which I scaled to max out the ENI IPs, such that at least one pod is assigned an "unavailable" IP 3) Upgraded Cilium to a version which does include [1] and observe [2] failures 4) Reset cluster back to state from (2) 5) Upgrade Cilium to the version that contains this commit 6) Observe log msgs from this commit and endpoint restoration succeeding 7) Scale Deployment to 0 and back up, to restart all pods 8) Observe that they all get fresh IPs and none of the "unavailable" IPs [1]: cilium#15453 [2]: ``` { "time":"2021-09-20T16:57:00.400086481Z", "level":"WARN", "origin":"cilium.io/agent", "message":"Unable to restore endpoint, ignoring", "params":{ "endpointID":"992", "error":"Failed to re-allocate IP of endpoint: unable to reallocate 10.0.133.193 IPv4 address: IP 10.0.133.193 is not available", "k8sPodName":"default/pod-1", "subsys":"daemon" } } ``` Signed-off-by: Chris Tarazi <chris@isovalent.com>

[ upstream commit baa9e9a ] In CRD-backed IPAM modes such as ENI mode, some IPs were recently removed from the IPAM pool [1]. In user environments where Cilium was running a version prior to [1], it is possible for endpoints to be assigned "unavailable" IPs. Upon upgrade (to a version which includes [1]), endpoint restoration will fail with [2]. In order to workaround this failure and not disrupt the upgrade, this change introduces a hidden flag (`--bypass-ip-availability-upon-restore`) which will inform Cilium to continue on if the restored endpoint's IP is not available for reallocation, bypassing the specific error "IP is not available". Other errors will not be bypassed, in order to reduce the scope of this stop-gap solution. With the flag set, restored endpoints which had "unavailable" IPs will keep them. Any new endpoints / pods will be assigned fresh, valid IPs from the pool. This flag is only meant to be enabled with CRD-backed IPAM modes such as ENI mode. The reason is because of the change described in [1], where the primary ENI IP was removed from the IPAM pool. In any other mode that this flag is enabled in, the user is warned that the flag is not intended for other modes and will have no effect. This patch is intended to be reverted in the future, as this stop-gap solution will no longer be required, as users of Cilium don't upgrade from versions prior to [1]. I propose that we revert this in the following release that this patch makes it in (N+1). How was this tested? 1) Deployed a Cilium version that doesn't include [1] on EKS cluster 2) Created a Deployment object which I scaled to max out the ENI IPs, such that at least one pod is assigned an "unavailable" IP 3) Upgraded Cilium to a version which does include [1] and observe [2] failures 4) Reset cluster back to state from (2) 5) Upgrade Cilium to the version that contains this commit 6) Observe log msgs from this commit and endpoint restoration succeeding 7) Scale Deployment to 0 and back up, to restart all pods 8) Observe that they all get fresh IPs and none of the "unavailable" IPs [1]: #15453 [2]: ``` { "time":"2021-09-20T16:57:00.400086481Z", "level":"WARN", "origin":"cilium.io/agent", "message":"Unable to restore endpoint, ignoring", "params":{ "endpointID":"992", "error":"Failed to re-allocate IP of endpoint: unable to reallocate 10.0.133.193 IPv4 address: IP 10.0.133.193 is not available", "k8sPodName":"default/pod-1", "subsys":"daemon" } } ``` Signed-off-by: Chris Tarazi <chris@isovalent.com>

[ upstream commit baa9e9a ] In CRD-backed IPAM modes such as ENI mode, some IPs were recently removed from the IPAM pool [1]. In user environments where Cilium was running a version prior to [1], it is possible for endpoints to be assigned "unavailable" IPs. Upon upgrade (to a version which includes [1]), endpoint restoration will fail with [2]. In order to workaround this failure and not disrupt the upgrade, this change introduces a hidden flag (`--bypass-ip-availability-upon-restore`) which will inform Cilium to continue on if the restored endpoint's IP is not available for reallocation, bypassing the specific error "IP is not available". Other errors will not be bypassed, in order to reduce the scope of this stop-gap solution. With the flag set, restored endpoints which had "unavailable" IPs will keep them. Any new endpoints / pods will be assigned fresh, valid IPs from the pool. This flag is only meant to be enabled with CRD-backed IPAM modes such as ENI mode. The reason is because of the change described in [1], where the primary ENI IP was removed from the IPAM pool. In any other mode that this flag is enabled in, the user is warned that the flag is not intended for other modes and will have no effect. This patch is intended to be reverted in the future, as this stop-gap solution will no longer be required, as users of Cilium don't upgrade from versions prior to [1]. I propose that we revert this in the following release that this patch makes it in (N+1). How was this tested? 1) Deployed a Cilium version that doesn't include [1] on EKS cluster 2) Created a Deployment object which I scaled to max out the ENI IPs, such that at least one pod is assigned an "unavailable" IP 3) Upgraded Cilium to a version which does include [1] and observe [2] failures 4) Reset cluster back to state from (2) 5) Upgrade Cilium to the version that contains this commit 6) Observe log msgs from this commit and endpoint restoration succeeding 7) Scale Deployment to 0 and back up, to restart all pods 8) Observe that they all get fresh IPs and none of the "unavailable" IPs [1]: #15453 [2]: ``` { "time":"2021-09-20T16:57:00.400086481Z", "level":"WARN", "origin":"cilium.io/agent", "message":"Unable to restore endpoint, ignoring", "params":{ "endpointID":"992", "error":"Failed to re-allocate IP of endpoint: unable to reallocate 10.0.133.193 IPv4 address: IP 10.0.133.193 is not available", "k8sPodName":"default/pod-1", "subsys":"daemon" } } ``` Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Glib Smaga <code@gsmaga.com>

gandro added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Mar 24, 2021

gandro force-pushed the pr/gandro/eni-hotfix-210325-1 branch 9 times, most recently from 0c3bd6e to e362a33 Compare March 25, 2021 14:15

twpayne reviewed Mar 25, 2021

View reviewed changes

pkg/datapath/linux/routing/routing.go Show resolved Hide resolved

twpayne mentioned this pull request Mar 30, 2021

Support multiple natively-routed CIDRs in ENI mode #15215

Closed

twpayne reviewed Mar 31, 2021

View reviewed changes

gandro force-pushed the pr/gandro/eni-hotfix-210325-1 branch from e362a33 to ef2254f Compare March 31, 2021 15:28

gandro added area/eni Impacts ENI based IPAM. needs-backport/1.8 labels Mar 31, 2021

gandro changed the title ~~eni: Various fixes to support multiple CIDRs~~ eni: Assign primary IP to support multiple VPC CIDRs Mar 31, 2021

gandro force-pushed the pr/gandro/eni-hotfix-210325-1 branch from ef2254f to 74a68ac Compare March 31, 2021 15:52

gandro requested a review from twpayne March 31, 2021 16:09

gandro marked this pull request as ready for review March 31, 2021 16:09

gandro requested a review from a team March 31, 2021 16:09

gandro requested a review from a team as a code owner March 31, 2021 16:09

gandro requested a review from a team March 31, 2021 16:09

gandro requested review from a team as code owners March 31, 2021 16:09

gandro requested review from a team and nebril March 31, 2021 16:09

errordeveloper mentioned this pull request Apr 9, 2021

v1.8 backports 2021-04-09 #15624

Merged

errordeveloper added backport-pending/1.9 and removed needs-backport/1.9 labels Apr 9, 2021

errordeveloper mentioned this pull request Apr 9, 2021

v1.9 backports 2021-04-09 #15627

Merged

qmonnet added backport-done/1.9 and removed backport-pending/1.9 labels Apr 14, 2021

aanm added backport-done/1.8 and removed backport-pending/1.8 labels Apr 17, 2021

This was referenced Apr 20, 2021

Prepare for release v1.8.9 #15794

Merged

Prepare for release v1.9.6 #15795

Merged

This was referenced Apr 28, 2021

Prepare for release v1.10.0-rc1 #15896

Closed

Prepare for release v1.10.0-rc1 #15897

Merged

errordeveloper mentioned this pull request Jun 7, 2021

connectivity issues with multiple ipv4 CIDR on AWS ENI #14411

Closed

christarazi mentioned this pull request Sep 28, 2021

daemon, ipam, option: Introduce ability to bypass IP availability error #17492

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

eni: Assign primary IP to support multiple VPC CIDRs #15453

eni: Assign primary IP to support multiple VPC CIDRs #15453

Uh oh!

gandro commented Mar 24, 2021 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

twpayne Mar 31, 2021

Uh oh!

gandro Mar 31, 2021

Uh oh!

Uh oh!

eni: Assign primary IP to support multiple VPC CIDRs #15453

eni: Assign primary IP to support multiple VPC CIDRs #15453

Uh oh!

Conversation

gandro commented Mar 24, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

twpayne Mar 31, 2021

Choose a reason for hiding this comment

Uh oh!

gandro Mar 31, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gandro commented Mar 24, 2021 •

edited

Loading