
Conversation

@atykhyy (Contributor) commented Jul 3, 2025

When the L7 DNS proxy for nodes is enabled and kube-apiserver is addressed by a FQDN, the Cilium agent deadlocks on restart: k8sClientset's start hook, which checks the connection to kube-apiserver and verifies its version, runs before the daemon bootstraps the DNS proxy with the restored endpoints. This commit adds a daemon command-line option that configures the k8sClientset dialer to bypass the DNS proxy and the host firewall, in the same way the DNS proxy's own requests to remote DNS servers already do. Since Cilium trusts both DNS and kube-apiserver responses, exempting the daemon's connections to kube-apiserver from the host firewall does not degrade security.

Note: both the DNS resolution of kube-apiserver's address and the actual connection to it are exempted. If only the DNS resolution were exempted, the DNS proxy would not create correct IP-based rules for the resolved address, and if kube-apiserver's IP address changed while the Cilium daemon was restarting, kube-apiserver would be unreachable and the daemon would fail on startup.
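For context, "configures the dialer to bypass DNS proxy and host firewall" boils down to stamping the client's sockets with Cilium's proxy egress packet mark, so the datapath treats them like the DNS proxy's own upstream traffic. A minimal Go sketch of that decoration, assuming the SO_MARK approach; the 0x0B00 value matches the fwmark 0xb00/0xf00 routing rule quoted later in this thread, and all names here are illustrative rather than Cilium's actual API:

package k8sbypass

import (
	"context"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// Proxy egress mark; matches the "fwmark 0xb00/0xf00" routing rule shown
// further down in this thread. In Cilium the constant lives with the other
// datapath magic marks.
const magicMarkEgress = 0x0B00

// markedDialer returns a net.Dialer whose sockets carry the egress mark,
// so the datapath handles them like the DNS proxy's own upstream
// connections: no DNS proxy redirect, no host firewall policy.
// Setting SO_MARK requires CAP_NET_ADMIN, which the agent normally runs with.
func markedDialer() *net.Dialer {
	return &net.Dialer{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_MARK, magicMarkEgress)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
}

// dialK8s is the shape of function one would wire into the k8s client's
// transport as its DialContext.
func dialK8s(ctx context.Context, network, addr string) (net.Conn, error) {
	return markedDialer().DialContext(ctx, network, addr)
}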

Fixes: #35433

@atykhyy atykhyy requested review from a team as code owners July 3, 2025 13:11
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jul 3, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 3, 2025
@atykhyy atykhyy force-pushed the enable-k8s-host-firewall-bypass branch from 0124bcf to a4d46ed on July 3, 2025 13:56
@asauber (Member) left a comment

The justification for this implementation in #35433 looks reasonable to me. Approving the configuration implementation. One comment about readability.

@atykhyy atykhyy requested a review from a team as a code owner July 4, 2025 05:56
@atykhyy atykhyy requested a review from bimmlerd July 4, 2025 05:56
@bimmlerd (Member) left a comment

I'm OK with this approach, just some implementation nits

cc @joamaki and @joestringer as you've interacted with the issue this PR fixes

@pchaigno pchaigno added release-note/bug This PR fixes an issue in a previous release of Cilium. area/host-firewall Impacts the host firewall or the host endpoint. needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch labels Jul 7, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 7, 2025
@atykhyy (Contributor, author) commented Jul 10, 2025

Raising this question to top level to solicit reviewers' opinions:
I made this bypass an opt-in out of caution (i.e. so as not to break things elsewhere in Cilium that I don't understand that well). However, the bypass must be turned on whenever the following three conditions are met:

  1. kube-apiserver address is an FQDN
  2. host firewall is enabled
  3. a Cilium clusterwide network policy targeting nodes has a L7 DNS rule (toy example below)
- egress:
  - toCIDR:
    - 168.63.129.16/32
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: '*'
  nodeSelector: {}

If these conditions are met and the bypass is not enabled, the Cilium agent will break on restart (e.g. when it is upgraded), leaving the whole cluster anywhere from dangerously unstable to unusable. My concern was that the bypass might have unintended bad effects in circumstances I am unaware of, but perhaps that concern is misplaced: the DNS proxy has been decorating its dialer for remote DNS requests in exactly the way this PR decorates the k8s client dialer for years and years, without causing problems. If so, the bypass ought to be enabled whenever the host firewall is enabled, with no option to turn it off. An intermediate variant is to make the bypass opt-out rather than opt-in, so it can be turned off cheaply if it does happen to break something; a sketch of the startup check that implies follows below.
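A Go sketch of that startup guard under an opt-out default; the Config fields and option names are hypothetical stand-ins, not Cilium's actual options. Conditions 1 and 2 are checkable from configuration, while condition 3 only materializes once policies are loaded, hence a warning rather than a hard error:

package k8sbypass

import (
	"log/slog"
	"net"
)

// Config stands in for the relevant daemon options (names hypothetical).
type Config struct {
	K8sAPIServerHost            string // host portion of the kube-apiserver URL
	EnableHostFirewall          bool
	K8sClientBypassHostFirewall bool // the bypass discussed in this PR
}

// isFQDN reports whether host is a name rather than a literal IP address.
func isFQDN(host string) bool {
	return net.ParseIP(host) == nil
}

// warnIfBypassNeeded flags the combination that can deadlock the agent on
// restart: FQDN apiserver + host firewall, with the bypass switched off.
func warnIfBypassNeeded(cfg Config, log *slog.Logger) {
	if cfg.EnableHostFirewall && isFQDN(cfg.K8sAPIServerHost) && !cfg.K8sClientBypassHostFirewall {
		log.Warn("kube-apiserver is a FQDN and the host firewall is enabled; " +
			"a clusterwide policy applying an L7 DNS rule to nodes can deadlock " +
			"the agent on restart unless the k8s client bypass is enabled")
	}
}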

@dylandreimerink (Member) commented:

/test

@atykhyy atykhyy force-pushed the enable-k8s-host-firewall-bypass branch from fafd8be to 21cee05 on July 14, 2025 10:56
@atykhyy (Contributor, author) commented Jul 14, 2025

Rebased the changes on top of main to hopefully fix the conformance tests, and made the bypass option enabled by default.

@joestringer (Member) commented:

/test

@atykhyy atykhyy requested a review from a team as a code owner July 17, 2025 10:20
@atykhyy atykhyy requested a review from squeed July 17, 2025 10:20
@atykhyy atykhyy force-pushed the enable-k8s-host-firewall-bypass branch from b7f92e1 to 373021e on July 17, 2025 11:06
@joestringer (Member) commented:

@atykhyy no worries, I appreciate your patience with the review :) I'll trigger the tests & queue for automerge.

@joestringer (Member) commented:

/test

@joestringer joestringer enabled auto-merge July 18, 2025 19:46
@joestringer joestringer added this pull request to the merge queue Jul 18, 2025
Merged via the queue into cilium:main with commit cf3889a Jul 18, 2025
68 checks passed
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 18, 2025
@YutaroHayakawa YutaroHayakawa mentioned this pull request Jul 22, 2025
@YutaroHayakawa YutaroHayakawa added backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. and removed needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch labels Jul 22, 2025
marseel added a commit that referenced this pull request Jul 24, 2025
As we have seen suspicious connectivity problems with the k8s apiserver
since merging #40346, let's disable it for the time being to see if that
resolves the problems in CI.

Related: #40682

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
@marseel (Contributor) commented Jul 24, 2025

Hi all,
since merging this PR, we have observed various connectivity problems and delays with the k8s apiserver (#40682), which started right after this PR was merged, even in CI runs without hostFirewall etc.

For now, I would like to propose skipping the backport to v1.18 (#40665 (review)), as we also saw connectivity problems to the k8s apiserver during the backport.
Let's also disable it on main (#40691) to confirm whether this PR is the root cause or something else is.
cc @atykhyy @joestringer

@atykhyy (Contributor, author) commented Jul 24, 2025

If the connection to kube-apiserver was getting dropped, as discussed in the CI flake issue, that should be visible in the cilium-agent logs, shouldn't it? Unfortunately I don't have permission to download the log artifacts to check (assuming the logs are there).

@joamaki (Contributor) commented Jul 24, 2025

> If the connection to kube-apiserver was getting dropped, as discussed in the CI flake issue, that should be visible in the cilium-agent logs, shouldn't it? Unfortunately I don't have permission to download the log artifacts to check (assuming the logs are there).

We only log anything in some cases. If the connection fails on Watch calls, nothing is logged (things just get delayed). We have also seen #40687, where an Update failed due to a broken connection. And we've seen "Heartbeat timed out, restarting client connections".

@marseel (Contributor) commented Jul 24, 2025

> Unfortunately I don't have permission to download the log artifacts to check (assuming the logs are there).

I've included one of the sysdumps in the issue; previously I did not notice that it had not been uploaded due to the size limit. You should be able to download it from the issue now.

As Jussi says, there are no clear logs that would indicate a broken watch. This is mostly a guess based on timing: the failures started occurring in CI right when this PR was merged, and the area this PR changes overlaps with the most likely area causing the issues.

For now, I just want to disable it for a couple of days to check whether that resolves the issue, which would confirm that it was caused by this PR. If not, I will revert my PR that disables it.

@atykhyy (Contributor, author) commented Jul 24, 2025

By all means. But why would tagging the k8s-clientset's outgoing packets with MagicMarkEgress make the connection flaky? Looking through that dump, the connection gets established without any delay worth speaking of:

2025-07-23T05:16:38.204463822Z time=2025-07-23T05:16:38.20428954Z level=debug msg="Skipped reading configuration file" error="Config File \"cilium\" Not Found in \"[/root]\""
# snip
2025-07-23T05:16:38.674517739Z time=2025-07-23T05:16:38.674446687Z level=info source=/go/src/github.com/cilium/cilium/pkg/k8s/client/cell.go:332 msg="Establishing connection to apiserver" module=agent.infra.k8s»
2025-07-23T05:16:38.681590372Z time=2025-07-23T05:16:38.681485483Z level=info source=/go/src/github.com/cilium/cilium/pkg/k8s/client/cell.go:359 msg="Connected to apiserver" module=agent.infra.k8s-client

The only thing that looks suspicious to me is that the iptables rules disable conntrack when they see this mark. Could that be it? But then wouldn't the DNS proxy, which uses the same mark, have the same sort of problem (just manifesting in a different place)?
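(For anyone wanting to look at the rules in question on a node, assuming an iptables-mode datapath, something like the following shows both the conntrack bypass and any mark-matching rules:)

# raw-table rules that turn off connection tracking
iptables-save -t raw | grep -iE 'notrack|-j CT'
# any rules matching on the 0xb00 egress mark
iptables-save | grep -i 0xb00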

@joamaki (Contributor) commented Jul 24, 2025

> By all means. But why would tagging the k8s-clientset's outgoing packets with MagicMarkEgress make the connection flaky? Looking through that dump, the connection gets established without any delay worth speaking of:
>
> 2025-07-23T05:16:38.204463822Z time=2025-07-23T05:16:38.20428954Z level=debug msg="Skipped reading configuration file" error="Config File \"cilium\" Not Found in \"[/root]\""
> # snip
> 2025-07-23T05:16:38.674517739Z time=2025-07-23T05:16:38.674446687Z level=info source=/go/src/github.com/cilium/cilium/pkg/k8s/client/cell.go:332 msg="Establishing connection to apiserver" module=agent.infra.k8s»
> 2025-07-23T05:16:38.681590372Z time=2025-07-23T05:16:38.681485483Z level=info source=/go/src/github.com/cilium/cilium/pkg/k8s/client/cell.go:359 msg="Connected to apiserver" module=agent.infra.k8s-client
>
> The only thing that looks suspicious to me is that the iptables rules disable conntrack when they see this mark. Could that be it? But then wouldn't the DNS proxy, which uses the same mark, have the same sort of problem (just manifesting in a different place)?

The initial connection attempt does succeed in all the failed tests, but the connection later breaks (a couple of minutes in). Could it be that installing those iptables rules breaks existing connections carrying that mark?
Would this be easy to test?

The DNS proxy probably wouldn't be affected as it'd mostly be processing requests after the iptables rules have been set up.
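One cheap way to test it, sketched in Go on top of the markedDialer sketch near the top of this thread (add "context", "net", and "time" to its imports; this is an experiment outline, not Cilium code): hold a marked TCP connection to the apiserver open with aggressive TCP keep-alives, re-apply the iptables/routing rules out of band, and watch whether the blocked read errors out.

// probeMarkedConn opens a marked connection and blocks on a read. The
// server sends nothing unsolicited on a raw TCP connection, so the read
// only returns once the flow dies: if the rule change breaks the marked
// connection, unanswered keep-alive probes surface here as an error after
// a few probe intervals (the probe count is an OS default).
func probeMarkedConn(addr string) error {
	conn, err := markedDialer().DialContext(context.Background(), "tcp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()

	tcp := conn.(*net.TCPConn)
	tcp.SetKeepAlive(true)
	tcp.SetKeepAlivePeriod(5 * time.Second) // probe much faster than the 30s defaults

	buf := make([]byte, 1)
	_, err = conn.Read(buf) // blocks until the connection breaks (or the server closes it)
	return err
}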

github-merge-queue bot pushed a commit that referenced this pull request Jul 24, 2025
@atykhyy (Contributor, author) commented Jul 24, 2025

Besides those iptables conntrack rules, there are also these routing rules:

10:	from all fwmark 0xb00/0xf00 lookup 2005 proto kernel
# table 2005:
default via 10.244.0.14 dev cilium_host proto kernel mtu 1407 
10.244.0.14 dev cilium_host proto kernel scope link 

I don't know enough to say whether creating these iptables or routing rules would break an existing TCP connection. If the k8s client logged when the connection broke, one could check whether the timing matches what the datapath module is doing. Also, if rule changes caused 'old' packets to be dropped, wouldn't it take the TCP stack some time to detect that the connection is no longer alive and signal an error on the k8s client's socket? k8s-client-connection-keep-alive and k8s-client-connection-timeout are both set to 30s by default (and in that CI run); added together, that is one minute. Perhaps that is the source of the delay causing this near-startup flakiness? Would reducing them, so that a broken connection is detected and recycled sooner, help?
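(If that timing theory holds, tightening the two flags named above should shrink the detection window; the values here are a guess for illustration, not a recommendation:)

--k8s-client-connection-keep-alive=10s
--k8s-client-connection-timeout=10s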

As an aside, @jrife commented in the CI issue that this flakiness is not a security problem. If there is no way to get rid of it, I'm perfectly fine with leaving the option added by this PR as opt-in, only to be used in the specific Cilium mode of operation where it is required (which is not yet documented anyway).

@joestringer joestringer removed the backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. label Jul 24, 2025
@joestringer joestringer added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. and removed release-note/bug This PR fixes an issue in a previous release of Cilium. labels Aug 5, 2025
rabelmervin pushed a commit to rabelmervin/cilium that referenced this pull request Aug 18, 2025
Labels
area/host-firewall Impacts the host firewall or the host endpoint. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Successfully merging this pull request may close this issue:

Agent deadlocks on startup when L7 DNS proxy for nodes is enabled and kube-apiserver is a FQDN