Description
This tracks a group of issues related to netkit that share a similar root cause and discussion around it.
- In "netkit mode with networkpolicy drops k3s kubelet Liveness/Readiness probe packets but veth mode did not" (#34042), a user reported that liveness/readiness probes stopped working after applying a `NetworkPolicy` to a `Pod` when using `netkit` and endpoint routes. The same configuration using `veth` did not have this problem. Some investigation turned up that packets coming from the host namespace get misidentified with the `world` identity instead of `host`. With `NetworkPolicies` applied to the endpoint, this leads to packets from liveness/readiness probes getting dropped.
- In "Using Cilium V1.16.0 External Access with DNS-Based Policies is not working properly" (#33875), a user reported that running `curl -I -s https://api.github.com/` from a `Pod` hangs when using `netkit` and endpoint routes after applying a `CiliumNetworkPolicy` which specifies `.spec.egress.toFQDNs`. The same configuration using `veth`, or with endpoint routes disabled, did not have this problem. Some investigation turned up similar symptoms, with reply DNS traffic returning from the proxy taking on the `world` identity in `cil_to_container`.
In both cases the root cause looks to be that netkit scrubs packets (clearing `skb->mark`) before executing attached BPF programs. This is evident looking at the source for `netkit_xmit`, where `netkit_prep_forward` clears `skb->mark` when crossing network namespaces before `netkit_run()` runs any attached programs:
```c
	netkit_prep_forward(skb, !net_eq(dev_net(dev), dev_net(peer)));
	eth_skb_pkt_type(skb, peer);
	skb->dev = peer;
	entry = rcu_dereference(nk->active);
	if (entry)
		ret = netkit_run(entry, skb, ret);
```
In contrast, when using `veth`, TC/TCX hooks are executed before the `veth` driver does any packet scrubbing.
**`netkit` egress processing order**

1. `sch_handle_egress` is called but does not execute any hooks, since BPF programs are attached directly to the device in `netkit` mode.
2. `netkit_xmit` begins.
3. `netkit_xmit` clears `skb->mark`.
4. `netkit_xmit` runs `cil_to_container`.
5. `netkit_xmit` passes the packet to the peer device.
**`veth` egress processing order**

1. `sch_handle_egress` is called, executing `cil_to_container`.
2. `veth_xmit` begins.
3. `veth_xmit` clears `skb->mark`.
4. `veth_xmit` passes the packet to the peer device.
Since `cil_to_container` uses `ctx->mark` for proxy redirection and policy enforcement, this can lead to various issues. For now this only seems to be a problem when using endpoint routes, but if `cilium_host` were to be changed to `netkit` (I'm not sure if this is the case already) it may interfere there as well.
Just to confirm that this is indeed the root cause: patching the netkit driver with this hack locally resolves the reported issues in both cases, making behavior consistent between `veth` and `netkit` modes (note: this is not a real fix, just something I used to test my observations).
```diff
 static void netkit_prep_forward(struct sk_buff *skb, bool xnet)
 {
+	u32 save_mark = skb->mark;
 	skb_scrub_packet(skb, xnet);
+	skb->mark = save_mark;
 	skb->priority = 0;
 	nf_skip_egress(skb, true);
 	skb_reset_mac_header(skb);
```
Tasks
- Short term, decide whether we should allow netkit to be used with endpoint routes enabled. We could add some checks on startup to block this combination. Con: this might make things stop working for some users who are already using this combination without issue.
- Document this as a known issue (docs: Add known issue for netkit endpoint route issues #35126)
- Patch netkit. Talking to @borkmann offline, he mentioned possibly adding a new mode to determine whether the scrub happens before or after running BPF. If implemented, this needs some follow-up work to make sure this mode is configured.
- Expand test coverage to include some of these scenarios with netkit + endpoint routes.