Skip to content

Conversation

hemanthmalla
Copy link
Member

@hemanthmalla hemanthmalla commented Jan 12, 2022

AWS introduced support for assigning prefixes to EC2 network interfaces - prefix delegation (pd) . Some of the benefits of using PD are:

With aws-enable-prefix-delegation flag enabled operator can now allocate /28 prefixes to resolve deficit on nodes. Once allocated, operator will update the cilium node object with the corresponding 16 IPs. Agent will use these IPs just like private secondary IPs.

pd_arch

Refer to the RFC for additional details discussing the the design and key decisions.

Fixes: #16987

This PR needs changes from #18557 for running the e2e test.

Limitations :

  • PD is only supported on VMs based on nitro hypervisor.
  • Currently AWS only supports assigning fixed size prefixes. /28 for IPv4 and /80 for IPv6

Non goals :

  • IPv6 prefix support
  • Releasing unused prefixes when aws-excess-ip-release is enabled

Todo :

  • e2e tests for PD
  • Unit test for PD
  • Validate disabling prefix delegation works for existing PD nodes

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 12, 2022
@hemanthmalla hemanthmalla force-pushed the hemanthmalla/eni_pd branch 4 times, most recently from 2ec8e60 to 81abb2a Compare January 21, 2022 13:54
@gandro gandro added area/eni Impacts ENI based IPAM. release-note/major This PR introduces major new functionality to Cilium. labels Jan 25, 2022
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 25, 2022
@dzoeteman
Copy link

Hi @hemanthmalla, can I assist in anyway with this PR?
I noticed aws-cni chaining mode is also affected by this (makes sense), and I would assume your code would fix it too for that use case.

@hemanthmalla
Copy link
Member Author

Hi @dzoeteman, I'm waiting on ENI e2e changes from #18557 to update this PR with e2e tests for prefix delegation.

@hemanthmalla
Copy link
Member Author

Removed e2e tests and marking as ready for review. Will follow up with another PR for e2e tests once the ENI e2e PR is merged.

@hemanthmalla hemanthmalla marked this pull request as ready for review March 3, 2022 16:13
@hemanthmalla hemanthmalla requested review from a team March 3, 2022 16:13
@hemanthmalla hemanthmalla requested review from a team as code owners March 3, 2022 16:13
@qmonnet
Copy link
Member

qmonnet commented Mar 28, 2022

Let's track the flakes anyway.

/mlh new-flake Cilium-PR-K8s-1.23-kernel-net-next [didn't work 🤔]
New issue to track Cilium-PR-K8s-1.23-kernel-net-next: #19233.

Cilium-PR-K8s-1.21-kernel-5.4 failure is #17069 (same test as in #17373, marked as a duplicate of the former).

@qmonnet
Copy link
Member

qmonnet commented Mar 28, 2022

/test-1.23-net-next
As Sebastian said the failure shouldn't be related to the PR, but let's make sure this was a flake.

Job 'Cilium-PR-K8s-1.23-kernel-net-next' failed:

Click to show.

Test Name

K8sServicesTest Checks N/S loadbalancing Tests security id propagation in N/S LB requests fwd-ed over tunnel

Failure Output

FAIL: k8s3 host unexpectedly connected to service "http://192.168.56.11:31094", it should fail

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.23-kernel-net-next so I can create one.

@qmonnet qmonnet removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 28, 2022
@qmonnet
Copy link
Member

qmonnet commented Mar 28, 2022

Same issue occurred :/ I'd like to understand where it comes from before we merge this PR.

@hemanthmalla
Copy link
Member Author

@qmonnet possibly because my branch doesn't have commits from #19061 ?
Let's try again with a rebase ? Esp. because PD related code shouldn't affect this at all ?

Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
AWS recently introduced support to assign IP prefixes to ENIs on nitro
instances.Adding support for this feature will allow for running roughly
16x pods. This commit adds necessary changes to the operator to allocate
IPv4 prefixes and update the cilium node CRD with corresponding IPs from
those prefixes.

Fixes: cilium#16987

Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
@gandro
Copy link
Member

gandro commented Mar 28, 2022

/test

Job 'Cilium-PR-K8s-GKE' failed:

Click to show.

Test Name

K8sDatapathConfig DirectRouting Check connectivity with direct routing and endpointRoutes

Failure Output

FAIL: Connectivity test between nodes failed

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create one.

@qmonnet
Copy link
Member

qmonnet commented Mar 28, 2022

@qmonnet possibly because my branch doesn't have commits from #19061

Ah, thanks, I was looking for something like this 👍

? Let's try again with a rebase ? Esp. because PD related code shouldn't affect this at all ?

Yes agreed, I was a bit reluctant to merge because of the recurring issue, but it's true it's unlikely from your code change. I see you're rebased already, let's give it another try [oh but Sebastian was faster than me!]. Thanks!

@hemanthmalla
Copy link
Member Author

Cilium-PR-K8s-1.23-kernel-net-next succeeded now, but looks like we hit another flake #17628 for Cilium-PR-K8s-GKE

@hemanthmalla
Copy link
Member Author

@gandro / @qmonnet safe to ignore the GKE failure ?

@sayboras
Copy link
Member

/test-gke

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 29, 2022
@qmonnet qmonnet merged commit 6947ab1 into cilium:master Mar 29, 2022
@sayboras
Copy link
Member

safe to ignore the GKE failure ?

Yup, it was successful before, so I triggered re-run just to verify other changes from master branch (as jenkins is running with merge commit).

gandro added a commit to gandro/cilium that referenced this pull request Mar 30, 2022
PR cilium#18463 changed the signature of the CreateNetworkInterface and
NewNodeManager methods. This change was merged last week.  PR cilium#19096
added a unit test in TestNodeManagerENIExcludeInterfaceTags which calls
those methods with the old signature.

Because the latter PR was not rebased after a merge, we accidentally
merged a non-compilable unit test into master, even though CI was green
(as CI was run before the signature was changed).

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
pchaigno pushed a commit that referenced this pull request Mar 30, 2022
PR #18463 changed the signature of the CreateNetworkInterface and
NewNodeManager methods. This change was merged last week.  PR #19096
added a unit test in TestNodeManagerENIExcludeInterfaceTags which calls
those methods with the old signature.

Because the latter PR was not rebased after a merge, we accidentally
merged a non-compilable unit test into master, even though CI was green
(as CI was run before the signature was changed).

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@mKeRix mKeRix mentioned this pull request May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/eni Impacts ENI based IPAM. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/major This PR introduces major new functionality to Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENI IPAM mode should add support for assigning IP prefixes to EC2 instance
7 participants