Skip to content

Conversation

christarazi
Copy link
Member

@christarazi christarazi commented Apr 20, 2020

See commit messages. Manually tested on EKS.

Fixes: #10810

Fix issue where cilium-health cannot healthcheck remote endpoint in ENI mode 

@christarazi christarazi requested a review from a team April 20, 2020 18:26
@christarazi christarazi requested review from a team as code owners April 20, 2020 18:26
@christarazi christarazi requested a review from a team April 20, 2020 18:26
@christarazi christarazi requested a review from a team as a code owner April 20, 2020 18:26
@christarazi christarazi requested a review from a team April 20, 2020 18:26
@maintainer-s-little-helper
Copy link

Please set the appropriate release note label.

@christarazi christarazi added area/cni Impacts the Container Networking Interface between Cilium and the orchestrator. area/eni Impacts ENI based IPAM. area/health Relates to the cilium-health component release-note/bug This PR fixes an issue in a previous release of Cilium. and removed dont-merge/needs-release-note labels Apr 20, 2020
@christarazi christarazi changed the title Pr/christarazi/implement eni cilium health Fix issue where cilium-health cannot healthcheck remote endpoint in ENI mode Apr 20, 2020
@christarazi christarazi changed the title Fix issue where cilium-health cannot healthcheck remote endpoint in ENI mode Fix cilium-health healthchecking remote endpoint in ENI mode Apr 20, 2020
@christarazi
Copy link
Member Author

test-me-please

@christarazi christarazi added kind/bug This is a bug in the Cilium logic. pending-review labels Apr 20, 2020
@coveralls
Copy link

coveralls commented Apr 20, 2020

Coverage Status

Coverage decreased (-0.1%) to 44.693% when pulling 5f10620 on christarazi:pr/christarazi/implement-eni-cilium-health into e7d4f5c on cilium:master.

@christarazi
Copy link
Member Author

Build went green on first try!

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

I have a few comments below around structuring the code to reduce interdependency and ideally reduce the number of different parts of the code that need to know this detail about the ENI implementation. Feel free to reach out if they don't make sense.

Main question I have is whether we could take this slightly further and further reduce the duplication of code. Secondary thought is it'd be nice to unit-test this as well, though I recognize that the existing code is lacking unit tests and this will be a bugfix that we likely intend to backport at least to v1.7 so there's also value in keeping the shear number of changes to a minimum.

None of my comments below block merging this implementation as it all looks good to me, but it'd be nice to evaluate whether there's some additional technical debt we could address while we have this area of the code fresh in our heads.

@@ -0,0 +1,158 @@
// Copyright 2020 Authors of Cilium
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is basically just a copy+paste / move from the previous code? For future review it's useful to have pure changes like this in a separate commit, just in case we have to bisect a problem later as it'll be easier to find bugs in smaller commits.

LMK if there are any specific changes you think I should look at in this package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is just copy-paste. Noted regarding future review, will keep in mind for next time.

@@ -353,3 +365,41 @@ func (d *Daemon) bootstrapIPAM() {
d.ipam = ipam.NewIPAM(d.datapath.LocalNodeAddressing(), option.Config, d.nodeDiscovery, d.k8sWatcher)
bootstrapStats.ipam.End(true)
}

func (d *Daemon) parseHealthEndpointInfo(result *ipam.AllocationResult) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance of further sharing this implementation with plugins/cilium-cni/eni.go:eniAdd()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could try the interfaces approach from above, where the ipam.AllocationResult object implements a bunch of getters for stuff like the Gateway IP, CIDRs, MAC and then a function in pkg/aws/eni/routing like NewRoutingInfo() accepts an interface for any object that implements these functions, and uses these to construct the RoutingInfo object, which the daemon can then just set like d.healthEndpointRouting = enirouting.NewRoutingInfo(result).

The cilium-cni code could then do a similar thing with models.IPAMAddressResponse (or probably more likely, a wrapper for this since the models code tends to be autogenerated), and furthermore you could also pass a dummy implementation into the enirouting package for testing purposes.

epMgr EndpointAdder,
proxy endpoint.EndpointProxy,
allocator cache.IdentityAllocator,
eniHealthRouting *enirouting.RoutingInfo) (*Client, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general cleanup, we can avoid directly depending on the eni package from this health endpoint package by defining an interface something like the following, then accepting that as a parameter into this function:

type routeConfiguration interface{
    ConfigureRoutes(ip string, mtu int, masquerade bool)
}

The other step of this would require instantiating an object to associate this function with, so basically adjust the Install() function below to take the RoutingInfo as a pointer receiver.

I'd expect this interface to be nil if ENI is disabled, or an instance of the RoutingInfo if ENI is enabled.

In this specific case I don't think it makes a huge difference, but over time this general approach helps to separate the code, helps unit testing, and avoids tying everything together too tightly with direct implementations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is good feedback. I'll try this out.

@christarazi
Copy link
Member Author

@joestringer Should this be a candidate for 1.7 backports?

@christarazi christarazi added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Apr 22, 2020
@joestringer
Copy link
Member

@christarazi I think that's reasonable, we'll support v1.7 for a while and risk should be fairly low. We'll be supporting users on v1.7 for a while so having this validation of underlying connectivity could be useful.

@christarazi christarazi added needs-backport/1.7 dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. and removed dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. labels Apr 22, 2020
Copy link
Member

@tgraf tgraf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome implementation!

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christarazi based on the conversation with Joe in this PR, I assume this PR is still wip? I'll mark has draft. Feel free to mark it as "Ready for review" and re-request for my and / or Joe's review.

@aanm aanm marked this pull request as draft April 24, 2020 10:37
@christarazi christarazi force-pushed the pr/christarazi/implement-eni-cilium-health branch from 43c3a60 to 415b217 Compare April 27, 2020 22:01
@christarazi
Copy link
Member Author

test-me-please

@christarazi
Copy link
Member Author

Assuming all tests pass, this PR can be merged. I will create a follow-up addressing the refactoring suggestions.

@christarazi christarazi removed the wip label Apr 27, 2020
@christarazi christarazi marked this pull request as ready for review April 27, 2020 22:05
In ENI mode, rules and routes are created for each endpoint (and
therefore pod) in order to direct traffic to the appropriate ENI device.
This logic was only running in the CNI plugin. Since cilium-health is
not a pod, its rules and routes needed for traffic to flow were missing.

This commit implements creating the necessary rules and routes for
cilium-health in ENI mode. This commit also unifies the installation
logic of the rules and routes in `pkg/aws/eni/routing`, allowing the CNI
plugin and the cilium-health creation code to share a common
implementation.

Fixes cilium#10810

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/implement-eni-cilium-health branch from 415b217 to 5f10620 Compare April 27, 2020 22:11
@christarazi
Copy link
Member Author

test-me-please

@christarazi christarazi added pending-review and removed dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. labels Apr 27, 2020
@christarazi
Copy link
Member Author

test-gke

@qmonnet qmonnet merged commit 26308b6 into cilium:master Apr 28, 2020
@christarazi christarazi deleted the pr/christarazi/implement-eni-cilium-health branch April 28, 2020 16:31
christarazi added a commit to christarazi/cilium that referenced this pull request Nov 10, 2020
This reverts commit 22ee8bb.

Since cilium#11073 has been merged, we no
longer need to skip this test when running on EKS.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi added a commit to christarazi/cilium that referenced this pull request Nov 10, 2020
This reverts commit 2409974.

Since cilium#11073 has been merged, we no
longer need to skip this test when running on EKS.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
nathanjsweet pushed a commit that referenced this pull request Nov 18, 2020
This reverts commit 22ee8bb.

Since #11073 has been merged, we no
longer need to skip this test when running on EKS.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
nathanjsweet pushed a commit that referenced this pull request Nov 18, 2020
This reverts commit 2409974.

Since #11073 has been merged, we no
longer need to skip this test when running on EKS.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
kaworu pushed a commit that referenced this pull request Nov 19, 2020
[ upstream commit 3ec53c5 ]

This reverts commit 22ee8bb.

Since #11073 has been merged, we no
longer need to skip this test when running on EKS.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
kaworu pushed a commit that referenced this pull request Nov 19, 2020
[ upstream commit 48bb789 ]

This reverts commit 2409974.

Since #11073 has been merged, we no
longer need to skip this test when running on EKS.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
joestringer pushed a commit that referenced this pull request Nov 21, 2020
[ upstream commit 3ec53c5 ]

This reverts commit 22ee8bb.

Since #11073 has been merged, we no
longer need to skip this test when running on EKS.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
joestringer pushed a commit that referenced this pull request Nov 21, 2020
[ upstream commit 48bb789 ]

This reverts commit 2409974.

Since #11073 has been merged, we no
longer need to skip this test when running on EKS.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cni Impacts the Container Networking Interface between Cilium and the orchestrator. area/eni Impacts ENI based IPAM. area/health Relates to the cilium-health component kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate cilium-health enablement in ENI mode
7 participants