Description
I have two pods, client2 and client, running on the same node, and an ingress netpol which allows client2 => client. When client2 tries to ping client, an ICMPv6 NS is sent for the router IPv6 address. The reply is generated by https://github.com/cilium/cilium/blob/master/bpf/lib/icmp6.h#L171, which should be called from from-container on client2's veth. The function eventually redirects the reply back to the same veth interface. The cilium monitor output is the following:
<- endpoint 1096 flow 0x0 , identity 54318->unknown state unknown ifindex 0 orig-ip 0.0.0.0: fe80::241e:5dff:fe66:6c4d -> ff02::1:ff00:27b4 NeighborSolicitation
CPU 03: MARK 0x0 FROM 1096 DEBUG: Handling ICMPv6 type=135
CPU 03: MARK 0x0 FROM 1096 DEBUG: ICMPv6 neighbour soliciation for address 0:b4270000
-> lxc0fd17701e6b7: fd00:10:244:2::27b4 -> fe80::241e:5dff:fe66:6c4d NeighborAdvertisement
CPU 03: MARK 0x0 FROM 1096 DEBUG: Inheriting identity=2 from stack
<- stack flow 0x0 , identity world->unknown state unknown ifindex lxc0fd17701e6b7 orig-ip 0.0.0.0: fd00:10:244:2::27b4 -> fe80::241e:5dff:fe66:6c4d NeighborAdvertisement
CPU 03: MARK 0x0 FROM 1096 DEBUG: Conntrack lookup 1/2: src=[::0:27b4]:0 dst=[::fe66:6c4d]:0
CPU 03: MARK 0x0 FROM 1096 DEBUG: Conntrack lookup 2/2: nexthdr=58 flags=0
CPU 03: MARK 0x0 FROM 1096 DEBUG: CT entry found lifetime=198692, revnat=0
CPU 03: MARK 0x0 FROM 1096 DEBUG: CT verdict: Established, revnat=0
CPU 03: MARK 0x0 FROM 1096 DEBUG: Successfully mapped addr.p4=[::0:27b4] to identity=2
CPU 03: MARK 0x0 FROM 1096 DEBUG: Attempting local delivery for container id 1096 from seclabel 54318
CPU 03: MARK 0x0 FROM 1096 DEBUG: Policy evaluation would deny packet from 2 to 54318
Policy verdict log: flow 0x0 local EP ID 1096, remote ID world, proto 58, ingress, action deny, match none, fd00:10:244:2::27b4 -> fe80::241e:5dff:fe66:6c4d NeighborAdvertisement
xx drop (Policy denied) flow 0x0 to endpoint 1096, ifindex 35, file 2:1604, , identity world->54318: fd00:10:244:2::27b4 -> fe80::241e:5dff:fe66:6c4d NeighborAdvertisement
When I disable the per-endpoint routes, the test case works:
<- endpoint 3853 flow 0x0 , identity 54318->unknown state unknown ifindex 0 orig-ip 0.0.0.0: fe80::8b1:9bff:fe90:6727 -> ff02::1:ff00:27b4 NeighborSolicitation
CPU 04: MARK 0x0 FROM 3853 DEBUG: Handling ICMPv6 type=135
CPU 04: MARK 0x0 FROM 3853 DEBUG: ICMPv6 neighbour soliciation for address 0:b4270000
-> lxc3c15a07fa2c0: fd00:10:244:2::27b4 -> fe80::8b1:9bff:fe90:6727 NeighborAdvertisement
<- endpoint 3853 flow 0x0 , identity 54318->unknown state unknown ifindex 0 orig-ip 0.0.0.0: fd00:10:244:2::32cf -> ff02::1:ff00:27b4 NeighborSolicitation
CPU 04: MARK 0x0 FROM 3853 DEBUG: Handling ICMPv6 type=135
CPU 04: MARK 0x0 FROM 3853 DEBUG: ICMPv6 neighbour soliciation for address 0:b4270000
-> lxc3c15a07fa2c0: fd00:10:244:2::27b4 -> fd00:10:244:2::32cf NeighborAdvertisement
With pwru, I see the following. When per-endpoint routes are off:
0xffff98b0f1640900 5 [kworker/5:2-mm_percpu_wq] skb_do_redirect netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)
0xffff98b0f1640900 5 [kworker/5:2-mm_percpu_wq] __bpf_redirect netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)
0xffff98b0f1640900 5 [kworker/5:2-mm_percpu_wq] __dev_queue_xmit netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)
0xffff98b0f1640900 5 [kworker/5:2-mm_percpu_wq] netdev_core_pick_tx netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)
0xffff98b0f1640900 5 [kworker/5:2-mm_percpu_wq] validate_xmit_skb netns=4026533335 mark=0x0 ifindex=43 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fe80::8b1:9bff:fe90:6727]:0(icmp6)
While with per-endpoint routes on:
0xffff98b0835ca100 5 [ping] skb_do_redirect netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)
0xffff98b0835ca100 5 [ping] __bpf_redirect netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)
0xffff98b0835ca100 5 [ping] __dev_queue_xmit netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)
0xffff98b0835ca100 5 [ping] tcf_classify netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)
0xffff98b0835ca100 5 [ping] kfree_skb_reason(SKB_DROP_REASON_TC_EGRESS) netns=4026533335 mark=0x0 ifindex=51 proto=dd86 mtu=1500 len=86 [fd00:10:244:2::27b4]:0->[fd00:10:244:2::e1d8]:0(icmp6)
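Basically, with per-endpoint routes on, the redirected NA enters to-container (note tcf_classify and SKB_DROP_REASON_TC_EGRESS above), while with them off it bypasses it. This is because with per-endpoint routes we set RequireEgressProg, which is documented as: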
// RequireEgressProg returns true if the endpoint requires an egress
// program attached to the InterfaceName() invoking the section
// "to-container"
Anyway, I assume that we allow any traffic from HOST_ID regardless of a netpol. But ROUTER_IPV6 is identified as WORLD_ID:
CPU 03: MARK 0x0 FROM 1096 DEBUG: Successfully mapped addr.p4=[::0:27b4] to identity=2
This happens due to:
CPU 03: MARK 0x0 FROM 1096 DEBUG: Inheriting identity=2 from stack
The identity is set at https://github.com/cilium/cilium/blob/pr/brb/ci-dp-v6/bpf/lib/identity.h#L122.
In the IPv4 case we don't hit the issue, as we don't have a custom ARP responder in to-container. We need the custom ICMPv6 responder because, in the IPv6 case, ROUTER_IPV6 is not assigned to cilium_host (#23445).
To fix the issue, we either need to set a mark so that the reply gets HOST_ID, or remove the ICMPv6 NS/NA responder once #23445 has been resolved.