datapath: Per-endpoint routes break IPv6 connectivity between pods on the same node when netpol is applied #23852

@brb

I have two pods, client2 and client, running on the same node, and an ingress netpol that allows client2 => client. When client2 tries to ping client, it sends an ICMPv6 NS for the router IPv6 address. The reply is handled by https://github.com/cilium/cilium/blob/master/bpf/lib/icmp6.h#L171, which should be called from from-container on client2's veth. The function eventually redirects the reply to the same veth interface. The cilium monitor output is the following:

<- endpoint 1096 flow 0x0 , identity 54318->unknown state unknown ifindex 0 orig-ip 0.0.0.0: fe80::241e:5dff:fe66:6c4d -> ff02::1:ff00:27b4 NeighborSolicitation
CPU 03: MARK 0x0 FROM 1096 DEBUG: Handling ICMPv6 type=135
CPU 03: MARK 0x0 FROM 1096 DEBUG: ICMPv6 neighbour soliciation for address 0:b4270000
-> lxc0fd17701e6b7: fd00:10:244:2::27b4 -> fe80::241e:5dff:fe66:6c4d NeighborAdvertisement
CPU 03: MARK 0x0 FROM 1096 DEBUG: Inheriting identity=2 from stack
<- stack flow 0x0 , identity world->unknown state unknown ifindex lxc0fd17701e6b7 orig-ip 0.0.0.0: fd00:10:244:2::27b4 -> fe80::241e:5dff:fe66:6c4d NeighborAdvertisement
CPU 03: MARK 0x0 FROM 1096 DEBUG: Conntrack lookup 1/2: src=[::0:27b4]:0 dst=[::fe66:6c4d]:0
CPU 03: MARK 0x0 FROM 1096 DEBUG: Conntrack lookup 2/2: nexthdr=58 flags=0
CPU 03: MARK 0x0 FROM 1096 DEBUG: CT entry found lifetime=198692, revnat=0
CPU 03: MARK 0x0 FROM 1096 DEBUG: CT verdict: Established, revnat=0
CPU 03: MARK 0x0 FROM 1096 DEBUG: Successfully mapped addr.p4=[::0:27b4] to identity=2
CPU 03: MARK 0x0 FROM 1096 DEBUG: Attempting local delivery for container id 1096 from seclabel 54318
CPU 03: MARK 0x0 FROM 1096 DEBUG: Policy evaluation would deny packet from 2 to 54318
Policy verdict log: flow 0x0 local EP ID 1096, remote ID world, proto 58, ingress, action deny, match none, fd00:10:244:2::27b4 -> fe80::241e:5dff:fe66:6c4d NeighborAdvertisement
xx drop (Policy denied) flow 0x0 to endpoint 1096, ifindex 35, file 2:1604, , identity world->54318: fd00:10:244:2::27b4 -> fe80::241e:5dff:fe66:6c4d NeighborAdvertisement
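
For readers matching addresses in the trace above: the NS destination ff02::1:ff00:27b4 is the solicited-node multicast address (RFC 4291) of the router address fd00:10:244:2::27b4 that later appears as the NA source. A minimal sketch of that mapping, using only the Python standard library; the helper name `solicited_node_multicast` is made up for this illustration and is not part of Cilium:

```python
import ipaddress


def solicited_node_multicast(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast address for addr (RFC 4291).

    The address is ff02::1:ff00:0/104 with the low 24 bits of the
    unicast address appended.
    """
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(prefix | low24)


# Router address taken from the NeighborAdvertisement source in the trace.
print(solicited_node_multicast("fd00:10:244:2::27b4"))
```

The printed address should match the destination of the NeighborSolicitation lines, which confirms that the NS is resolving the router address and not the peer pod.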

When I disable the per-endpoint routes, the test case works:

<- endpoint 3853 flow 0x0 , identity 54318->unknown state unknown ifindex 0 orig-ip 0.0.0.0: fe80::8b1:9bff:fe90:6727 -> ff02::1:ff00:27b4 NeighborSolicitation
CPU 04: MARK 0x0 FROM 3853 DEBUG: Handling ICMPv6 type=135
CPU 04: MARK 0x0 FROM 3853 DEBUG: ICMPv6 neighbour soliciation for address 0:b4270000
-> lxc3c15a07fa2c0: fd00:10:244:2::27b4 -> fe80::8b1:9bff:fe90:6727 NeighborAdvertisement
<- endpoint 3853 flow 0x0 , identity 54318->unknown state unknown ifindex 0 orig-ip 0.0.0.0: fd00:10:244:2::32cf -> ff02::1:ff00:27b4 NeighborSolicitation
CPU 04: MARK 0x0 FROM 3853 DEBUG: Handling ICMPv6 type=135
CPU 04: MARK 0x0 FROM 3853 DEBUG: ICMPv6 neighbour soliciation for address 0:b4270000
-> lxc3c15a07fa2c0: fd00:10:244:2::27b4 -> fd00:10:244:2::32cf NeighborAdvertisement

With pwru I see the following. When per-endpoint routes are off:

0xffff98b0f1640900      5 [kworker/5:2-mm_percpu_wq]          skb_do_redirect netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)
0xffff98b0f1640900      5 [kworker/5:2-mm_percpu_wq]           __bpf_redirect netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)
0xffff98b0f1640900      5 [kworker/5:2-mm_percpu_wq]         __dev_queue_xmit netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)
0xffff98b0f1640900      5 [kworker/5:2-mm_percpu_wq]      netdev_core_pick_tx netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)
0xffff98b0f1640900      5 [kworker/5:2-mm_percpu_wq]        validate_xmit_skb netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)

With per-endpoint routes on:

0xffff98b0835ca100      5           [ping]          skb_do_redirect netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)
0xffff98b0835ca100      5           [ping]           __bpf_redirect netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)
0xffff98b0835ca100      5           [ping]         __dev_queue_xmit netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)
0xffff98b0835ca100      5           [ping]             tcf_classify netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)
0xffff98b0835ca100      5           [ping] kfree_skb_reason(SKB_DROP_REASON_TC_EGRESS) netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)

Basically, with per-endpoint routes on we enter to-container, while with them off we bypass it. This is because we set RequireEgressProg when per-endpoint routes are enabled; its doc comment reads:

// RequireEgressProg returns true if the endpoint requires an egress
// program attached to the InterfaceName() invoking the section
// "to-container"

Anyway, I assume that we allow any traffic from HOST_ID regardless of netpol. However, ROUTER_IPV6 is identified as WORLD_ID:

CPU 03: MARK 0x0 FROM 1096 DEBUG: Successfully mapped addr.p4=[::0:27b4] to identity=2

This happens because of:

CPU 03: MARK 0x0 FROM 1096 DEBUG: Inheriting identity=2 from stack

That identity is set at https://github.com/cilium/cilium/blob/pr/brb/ci-dp-v6/bpf/lib/identity.h#L122.

With IPv4 we don't hit the issue, as we don't have a custom ARP responder in to-container. For IPv6 we need the custom responder because ROUTER_IPV6 is not set to cilium_host (#23445).

To fix the issue, we need to either set a mark so that the reply gets HOST_ID, or remove the ICMPv6 NS/NA responder once #23445 has been resolved.
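
To make the failure mode and the first fix option concrete, here is a toy model of the two decisions involved: deriving an identity from the packet mark, then evaluating ingress policy. It is only a sketch under stated assumptions and not Cilium's datapath code. The identity value 2 for world matches the monitor output; the value 1 for host, the mark constants, the function names, and the "host is always allowed" rule are assumptions made for illustration.

```python
# Identity 2 (world) matches the monitor output; 1 (host) is assumed here.
HOST_ID = 1
WORLD_ID = 2

# Hypothetical mark layout for illustration only, not copied from Cilium headers.
MARK_MAGIC_MASK = 0x0F00
MARK_MAGIC_HOST = 0x0C00


def inherit_identity_from_stack(mark: int) -> int:
    """Toy model: a packet without the host mark is treated as world."""
    if (mark & MARK_MAGIC_MASK) == MARK_MAGIC_HOST:
        return HOST_ID
    return WORLD_ID


def ingress_verdict(src_identity: int, allowed: set) -> str:
    """Toy model: host traffic is always allowed (the assumption stated in
    the issue); anything else must be allowed by the netpol."""
    if src_identity == HOST_ID or src_identity in allowed:
        return "allow"
    return "deny"


# 54318 is the endpoint identity seen in the trace, used here only as an
# example of a peer that the netpol allows.
allowed_peers = {54318}

# Today: the NA is redirected with mark=0x0, so it is classified as world.
print(ingress_verdict(inherit_identity_from_stack(0x0), allowed_peers))

# Fix option 1: the responder sets a host mark on the reply.
print(ingress_verdict(inherit_identity_from_stack(MARK_MAGIC_HOST), allowed_peers))
```

In this model the unmarked reply is denied and the marked reply is allowed, which mirrors the "Policy denied" drop in the monitor output and the intended effect of setting a mark.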

Labels

area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
feature/ipv6: Relates to IPv6 protocol support.
kind/bug: This is a bug in the Cilium logic.
