Add option for daemon kube-apiserver access to bypass host firewall #40346
0124bcf
to
a4d46ed
Compare
The justification for this implementation in #35433 looks reasonable to me. Approving for the configuration implementation. One comment about the readability.
I'm OK with this approach, just some implementation nits
cc @joamaki and @joestringer as you've interacted with the issue this PR fixes
Raising this question to top level to solicit reviewers' opinions:
    - egress:
      - toCIDR:
        - 168.63.129.16/32
        toPorts:
        - ports:
          - port: "53"
            protocol: ANY
          rules:
            dns:
            - matchPattern: '*'
      nodeSelector: {}

If these conditions are met and the bypass is not enabled, Cilium agent will break on restart (e.g. when it is upgraded), making the whole cluster dangerously unstable to unusable. My concern was that the bypass might have unintended bad effects in circumstances of which I am unaware, but perhaps it is misplaced? After all, DNS proxy has been decorating its dialer for remote DNS requests in the same way this PR does for k8s client dialer for years and years and has not caused problems. If so, the bypass ought to be enabled whenever host firewall is enabled (without an option to turn it off). An intermediate variant is to make the bypass opt-out rather than opt-in, so it can be turned off cheaply if it does happen to break something.
/test
fafd8be
to
21cee05
Compare
Rebased changes on top of main to hopefully fix conformance tests, and made the bypass option on by default.
/test
b7f92e1
to
373021e
Compare
@atykhyy no worries, I appreciate your patience with the review :) I'll trigger the tests & queue for automerge.
/test
Hi all, For now, I would like to propose skipping backport to v1.18: #40665 (review) as during the backport, we have also seen some connectivity problems to k8s apiserver.
If the connection to kube-apiserver was getting dropped, as discussed in the CI flake issue, that should be visible in cilium-agent logs, shouldn't it? Unfortunately I don't have permissions to download log artifacts to check for this (assuming these logs are there).
Only in some cases do we log anything. If the connection fails on Watch calls then nothing is logged (things just get delayed). We have also seen #40687 where an Update failed due to broken connection. And we've seen "Heartbeat timed out, restarting client connections".
I've included one of the sysdumps in the issue. Previously I did not notice that it had not been uploaded due to the size limit. You should be able to download it from the issue now. As Jussi says, there are no clear logs that would indicate a broken watch. This is mostly based on the timing of when failures started to occur in CI relative to when this PR was merged, the area this PR changes, and the area most likely to be causing the issues, so it's a bit of a guess. For now, I just want to disable it for a couple of days to check whether that resolves the issue, which would confirm that it was caused by this PR. If not, I will revert my PR that disables it.
By all means. But why would tagging k8s-clientset's outgoing packets with
The only thing that looks suspicious to me is that iptables rules disable conntrack when they see this mark. Could that be it? But then wouldn't DNS proxy that uses the same mark have the same sort of problem (it would manifest in a different place)?
The initial connection attempt does succeed in all the failed tests, but that connection later breaks (couple minutes into it). Could well be that setting those iptables rules will make existing connections with that mark break? The DNS proxy probably wouldn't be affected as it'd mostly be processing requests after the iptables rules have been set up.
Besides those iptables conntrack rules there are also these routing table rules:
I don't know enough to say whether creating these iptables or routing rules would break an existing TCP connection. If k8s client logged when it broke, one could see if the timing matches with what the datapath module is doing. Also, if rule changes caused 'old' packets to get dropped, wouldn't it take the TCP stack time to detect that the connection is no longer alive and signal an error on k8s client's socket? As an aside, @jrife commented in the CI issue that this flakiness is not a security problem. If there is no way to get rid of the flakiness, I'm perfectly fine with leaving the option added by this PR as opt-in, only to be used in the specific Cilium mode of operation where it is required (which is not yet documented anyway).
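To put a rough number on the "wouldn't it take the TCP stack time" question: the sketch below adds up the retransmission back-off the Linux kernel would go through before giving up on an established connection whose packets are silently dropped. It assumes stock Linux defaults (`tcp_retries2` of 15, a 200 ms minimum retransmission timeout, a 120 s cap); none of these values come from this thread, and the function name is made up for illustration. The result is the hypothetical lower bound given in the kernel documentation, not a measurement from the failing CI runs.

```python
def hypothetical_tcp_timeout(retries=15, rto_min=0.2, rto_max=120.0):
    """Sum the exponential back-off intervals for an unacknowledged segment.

    Assumed Linux defaults: tcp_retries2=15, minimum RTO 200 ms, maximum
    RTO 120 s. The real RTO depends on the measured round-trip time, so
    this is a lower bound, not a prediction.
    """
    total = 0.0
    rto = rto_min
    # One interval after the original transmission plus one per retry.
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)  # back-off doubles until it hits the cap
    return total


seconds = hypothetical_tcp_timeout()
print(f"{seconds:.1f} s (about {seconds / 60:.0f} minutes)")
```

Under these assumptions the kernel alone would take roughly 15 minutes to report the failure, so an application-level heartbeat (such as the "Heartbeat timed out" message quoted earlier) would likely be what notices first.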
As we have seen suspicious connectivity problems with k8s apiserver since merging cilium#40346, let's disable it for the time being to see if it resolves problems in CI.
Related: cilium#40682
Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
When L7 DNS proxy for nodes is enabled and kube-apiserver is a FQDN, Cilium agent deadlocks on restart, because k8sClientset's start hook which checks connection to kube-apiserver and verifies the version runs before the daemon bootstraps the DNS proxy with restored endpoints. This commit adds a daemon command-line option which configures the dialer of k8sClientset to bypass DNS proxy and host firewall in the same manner as DNS proxy's requests to remote DNS servers do. Since Cilium trusts both DNS and kube-apiserver responses, exempting the daemon's connections to kube-apiserver from host firewall does not degrade security.
Note: both DNS resolution of and actual connection to kube-apiserver are exempted. If only DNS resolution of kube-apiserver's address is exempted, DNS proxy would not create correct IP-based rules for its address, and if kube-apiserver's IP address changes while the Cilium daemon is restarting, kube-apiserver will not be accessible and the daemon will fail on startup.
Fixes: #35433
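For readers unfamiliar with the mechanism: on Linux, a process tags a socket's outgoing packets by setting the `SO_MARK` socket option before connecting, and iptables or routing rules can then match on that mark. The sketch below shows only that general technique. It is written in Python for brevity and is not the code added by this PR; the mark value and the function names are hypothetical placeholders, and unlike the change described above it marks only the TCP connection, not the DNS lookup. Setting `SO_MARK` on a real socket requires `CAP_NET_ADMIN`.

```python
import socket

# SO_MARK is Linux-specific; fall back to its usual numeric value (36)
# if this Python build does not expose the constant.
SO_MARK = getattr(socket, "SO_MARK", 36)

# Hypothetical placeholder mark, chosen for illustration only.
# The value Cilium actually uses is defined in its own source.
EXAMPLE_BYPASS_MARK = 0x1234


def apply_bypass_mark(sock, mark=EXAMPLE_BYPASS_MARK):
    """Tag every packet sent on `sock` with a firewall mark.

    Must be called before connect() so the initial SYN carries the mark.
    """
    sock.setsockopt(socket.SOL_SOCKET, SO_MARK, mark)
    return sock


def dial_marked(ip, port, mark=EXAMPLE_BYPASS_MARK, timeout=5.0):
    """Open a TCP connection to an IPv4 address with the mark applied."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        apply_bypass_mark(sock, mark)
        sock.settimeout(timeout)
        sock.connect((ip, port))
        return sock
    except OSError:
        sock.close()  # do not leak the descriptor on failure
        raise
```

A caller with sufficient privileges would use `dial_marked("10.0.0.1", 6443)` (address made up) wherever it would otherwise open a plain socket.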