WIP Open ELB to kops-controller port when using it for internal API #10142

Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: johngmyers. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Force-pushed from dc99c6b to 9e82908
Force-pushed from 9e82908 to 280432f
/retest
I built kops including this patch, and it worked as intended: my worker machines were able to contact kops-controller on port 3988 and finish their bootstrap procedure.
I spoke too soon. It seemed that this worked earlier, but it failed again in my current test. What I observed was that opening up the ELB's security group to allow ingress on port 3988 from the "nodes" security group wasn't good enough; opening ingress on port 3988 from everywhere did work. I don't understand why. The source machine (one of my worker machines) is a member of the "nodes" security group. It is also a member of another security group, and although I didn't expect this to help, allowing ingress to the ELB from that other security group didn't change the outcome either. Only opening up the ELB to ingress from all sources has worked so far. I can't tell whether some SNAT is going on here that confuses AWS's ability to tell that the incoming traffic (from a worker machine to the ELB) is coming from a blessed security group.
I enabled access logs on the ELB, but the client IP addresses only show the addresses of the ELB listeners, which isn't helpful. There are a few clients with an address like 18.188.58.156, which doesn't look like one from any of my VPCs. I tried replacing the ELB security-group rule allowing ingress from anywhere with one allowing ingress from the ELB security group itself, just to see whether this firewall enforcement matches what the ELB logs show. That didn't work, though: traffic from the worker machines was still blocked. I'm going to take a break and get some sleep, and see if a reasonable explanation comes to me, as usually occurs the moment I turn off the computer.
Your problem seems to be that the ELB is public, and the SGs will not help much with that. Probably:
	FromPort:      fi.Int64(wellknownports.KopsControllerPort),
	Protocol:      fi.String("tcp"),
	SecurityGroup: lbSG,
	SourceGroup:   nodeGroup.Task,
Can we check if the API server load balancer is public, and if so, use a more lax source range here? As @hakman realized, for a public ELB, the traffic comes in from the nodes via a NAT gateway with a source address that's not even within any of the VPC's CIDR blocks. (The NAT gateways each have a private IP address within the VPC, but the corresponding "Elastic IP address" is outside the VPC.)
I'd prefer not to expose the port externally if possible. While the authentication is strong, there are denial of service considerations.
Per #10139 (comment), I now think we're fixing the wrong problem here.
If someone asks for a public LB and UseForInternalApi, I think we should allow access from 0.0.0.0/0, same as for the API.
Or maybe not allow UseForInternalApi in this case, as it won't work anyway.
I don't have a strong preference in general here, so feel free to ignore my comment.
I want to make sure I understand your second proposal: Could kops reject "useForInternalApi" as invalid when the load balancer is public? I think that's the best choice, if it's feasible.
I liked the option—clearly, as I tried to enable it—for accessing the Kubernetes API servers, not really thinking about whether the load balancer was public or not. At that point, though, I didn't realize that it would be used for other things like this node bootstrapping.
And did this other use start in kops 1.19? I didn't experience this problem with kops 1.18.2.
Looks like the choices might be to expose the port externally, pay for a second (internal) load balancer, or use a dns-controller domain regardless of the
Going with a separate dns-controller-managed domain instead, in #10239
Fixes #10139