-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Fix cilium-health healthchecking remote endpoint in ENI mode #11073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix cilium-health healthchecking remote endpoint in ENI mode #11073
Conversation
Please set the appropriate release note label. |
test-me-please |
Build went green on first try! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
I have a few comments below around structuring the code to reduce interdependency and ideally reduce the number of different parts of the code that need to know this detail about the ENI implementation. Feel free to reach out if they don't make sense.
Main question I have is whether we could take this slightly further and further reduce the duplication of code. Secondary thought is it'd be nice to unit-test this as well, though I recognize that the existing code is lacking unit tests and this will be a bugfix that we likely intend to backport at least to v1.7 so there's also value in keeping the shear number of changes to a minimum.
None of my comments below block merging this implementation as it all looks good to me, but it'd be nice to evaluate whether there's some additional technical debt we could address while we have this area of the code fresh in our heads.
@@ -0,0 +1,158 @@ | |||
// Copyright 2020 Authors of Cilium |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is basically just a copy+paste / move from the previous code? For future review it's useful to have pure changes like this in a separate commit, just in case we have to bisect a problem later as it'll be easier to find bugs in smaller commits.
LMK if there are any specific changes you think I should look at in this package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is just copy-paste. Noted regarding future review, will keep in mind for next time.
@@ -353,3 +365,41 @@ func (d *Daemon) bootstrapIPAM() { | |||
d.ipam = ipam.NewIPAM(d.datapath.LocalNodeAddressing(), option.Config, d.nodeDiscovery, d.k8sWatcher) | |||
bootstrapStats.ipam.End(true) | |||
} | |||
|
|||
func (d *Daemon) parseHealthEndpointInfo(result *ipam.AllocationResult) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any chance of further sharing this implementation with plugins/cilium-cni/eni.go:eniAdd()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could try the interfaces approach from above, where the ipam.AllocationResult
object implements a bunch of getters for stuff like the Gateway IP, CIDRs, MAC and then a function in pkg/aws/eni/routing
like NewRoutingInfo()
accepts an interface for any object that implements these functions, and uses these to construct the RoutingInfo object, which the daemon can then just set like d.healthEndpointRouting = enirouting.NewRoutingInfo(result)
.
The cilium-cni code could then do a similar thing with models.IPAMAddressResponse
(or probably more likely, a wrapper for this since the models
code tends to be autogenerated), and furthermore you could also pass a dummy implementation into the enirouting package for testing purposes.
epMgr EndpointAdder, | ||
proxy endpoint.EndpointProxy, | ||
allocator cache.IdentityAllocator, | ||
eniHealthRouting *enirouting.RoutingInfo) (*Client, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general cleanup, we can avoid directly depending on the eni package from this health endpoint package by defining an interface something like the following, then accepting that as a parameter into this function:
type routeConfiguration interface{
ConfigureRoutes(ip string, mtu int, masquerade bool)
}
The other step of this would require instantiating an object to associate this function with, so basically adjust the Install()
function below to take the RoutingInfo
as a pointer receiver.
I'd expect this interface to be nil
if ENI is disabled, or an instance of the RoutingInfo
if ENI is enabled.
In this specific case I don't think it makes a huge difference, but over time this general approach helps to separate the code, helps unit testing, and avoids tying everything together too tightly with direct implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is good feedback. I'll try this out.
@joestringer Should this be a candidate for 1.7 backports? |
@christarazi I think that's reasonable, we'll support v1.7 for a while and risk should be fairly low. We'll be supporting users on v1.7 for a while so having this validation of underlying connectivity could be useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome implementation!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christarazi based on the conversation with Joe in this PR, I assume this PR is still wip? I'll mark has draft. Feel free to mark it as "Ready for review" and re-request for my and / or Joe's review.
43c3a60
to
415b217
Compare
test-me-please |
Assuming all tests pass, this PR can be merged. I will create a follow-up addressing the refactoring suggestions. |
In ENI mode, rules and routes are created for each endpoint (and therefore pod) in order to direct traffic to the appropriate ENI device. This logic was only running in the CNI plugin. Since cilium-health is not a pod, its rules and routes needed for traffic to flow were missing. This commit implements creating the necessary rules and routes for cilium-health in ENI mode. This commit also unifies the installation logic of the rules and routes in `pkg/aws/eni/routing`, allowing the CNI plugin and the cilium-health creation code to share a common implementation. Fixes cilium#10810 Signed-off-by: Chris Tarazi <chris@isovalent.com>
415b217
to
5f10620
Compare
test-me-please |
test-gke |
This reverts commit 22ee8bb. Since cilium#11073 has been merged, we no longer need to skip this test when running on EKS. Signed-off-by: Chris Tarazi <chris@isovalent.com>
This reverts commit 2409974. Since cilium#11073 has been merged, we no longer need to skip this test when running on EKS. Signed-off-by: Chris Tarazi <chris@isovalent.com>
See commit messages. Manually tested on EKS.
Fixes: #10810