Netkit does not work with endpoint routes in some cases #35060

@jrife

This issue tracks a group of netkit-related issues that share a similar root cause, along with the discussion around them.

In both cases, the root cause appears to be that netkit scrubs packets (clearing skb->mark) before executing attached BPF programs. This is evident from the source of netkit_xmit, where netkit_prep_forward clears skb->mark when crossing network namespaces, before netkit_run() runs any attached programs.

	netkit_prep_forward(skb, !net_eq(dev_net(dev), dev_net(peer)));
	eth_skb_pkt_type(skb, peer);
	skb->dev = peer;
	entry = rcu_dereference(nk->active);
	if (entry)
		ret = netkit_run(entry, skb, ret);

In contrast, when using veth, TC/TCX hooks are executed before the veth driver does any packet scrubbing.

netkit egress processing order

  1. sch_handle_egress is called but does not execute any hooks, since BPF programs are attached directly to the device in netkit mode.
  2. netkit_xmit begins
  3. netkit_xmit clears skb->mark
  4. netkit_xmit runs cil_to_container
  5. netkit_xmit passes the packet to the peer device

veth egress processing order

  1. sch_handle_egress is called and executes cil_to_container.
  2. veth_xmit begins
  3. veth_xmit clears skb->mark
  4. veth_xmit passes the packet to the peer device
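
The two orderings above can be sketched as a toy model in plain C. This is not kernel code: `toy_skb`, `toy_hook`, `toy_scrub`, `toy_netkit_egress`, and `toy_veth_egress` are hypothetical names invented for illustration, with `toy_hook` standing in for cil_to_container and `toy_scrub` standing in for the cross-namespace scrub that clears skb->mark. The only thing it tries to show is which mark value the hook observes in each mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct sk_buff: only the field discussed here. */
struct toy_skb {
	uint32_t mark;
};

/* Hypothetical hook standing in for cil_to_container:
 * returns the mark value it observes. */
static uint32_t toy_hook(const struct toy_skb *skb)
{
	return skb->mark;
}

/* Models the scrub: the mark is cleared only when the packet
 * crosses network namespaces (xnet == true). */
static void toy_scrub(struct toy_skb *skb, bool xnet)
{
	if (xnet)
		skb->mark = 0;
}

/* netkit order: the driver scrubs first, then runs the hook. */
static uint32_t toy_netkit_egress(struct toy_skb *skb, bool xnet)
{
	toy_scrub(skb, xnet);
	return toy_hook(skb);
}

/* veth order: the hook runs first (from the egress path),
 * then the driver scrubs. */
static uint32_t toy_veth_egress(struct toy_skb *skb, bool xnet)
{
	uint32_t seen = toy_hook(skb);

	toy_scrub(skb, xnet);
	return seen;
}
```

With a nonzero mark and `xnet` set, `toy_netkit_egress` returns 0 while `toy_veth_egress` returns the original mark; in both cases the packet leaves with the mark cleared.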

Since cil_to_container uses ctx->mark for proxy redirection and policy enforcement, this can lead to various issues. For now, this only seems to be a problem when using endpoint routes, but if cilium_host were changed to netkit (I'm not sure if this is the case already), it may interfere there as well.

To confirm that this is indeed the root cause, I patched the netkit driver locally with the hack below. It resolves the reported issues in both cases and makes behavior consistent between veth and netkit modes (note: this is not a real fix, just something I used to test my observations).

 static void netkit_prep_forward(struct sk_buff *skb, bool xnet)
 {
+       u32 save_mark = skb->mark;
        skb_scrub_packet(skb, xnet);
+       skb->mark = save_mark;
        skb->priority = 0;
        nf_skip_egress(skb, true);
        skb_reset_mac_header(skb);
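
The save/restore pattern in the hack above can be exercised outside the kernel with a small userspace model. This is a sketch under stated assumptions: `toy_skb`, `toy_scrub_packet`, and `toy_prep_forward` are hypothetical stand-ins, and the `keep_mark` flag is invented here purely to compare the patched and unpatched behavior side by side; it does not exist in the netkit driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct sk_buff with only the fields touched here. */
struct toy_skb {
	uint32_t mark;
	uint32_t priority;
};

/* Models the scrub: clears the mark when crossing namespaces. */
static void toy_scrub_packet(struct toy_skb *skb, bool xnet)
{
	if (xnet)
		skb->mark = 0;
}

/* Models netkit_prep_forward. keep_mark is a hypothetical switch:
 * true mimics the local hack, false mimics the unpatched driver. */
static void toy_prep_forward(struct toy_skb *skb, bool xnet, bool keep_mark)
{
	uint32_t save_mark = skb->mark;

	toy_scrub_packet(skb, xnet);
	if (keep_mark)
		skb->mark = save_mark;	/* restore what the scrub cleared */
	skb->priority = 0;
}
```

With `keep_mark` set, a program run after `toy_prep_forward` still sees the original mark, which matches the veth behavior from the hook's point of view. Unlike veth, though, the mark then also survives into the peer namespace, which is one reason the hack is only a diagnostic and not a fix.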

Tasks

  • Short term, decide whether or not we should allow netkit to be used with endpoint routes enabled. We could add some checks on startup to block this combination. Cons: this might make things stop working for some users who are already using this combination without issue.
  • Document this as a known issue (docs: Add known issue for netkit endpoint route issues #35126)
  • Patch netkit. Talking to @borkmann offline, he suggested possibly adding a new mode that determines whether the scrub happens before or after running BPF. If implemented, this needs some follow-up work to make sure this mode is configured.
  • Expand test coverage to include some of these scenarios with netkit+endpoint routes.

Labels: feature/netkit, kind/bug (This is a bug in the Cilium logic.), kind/enhancement (This would improve or streamline existing functionality.)
