@borkmann borkmann commented Jan 31, 2024

(wip, see commit desc)

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 31, 2024
@borkmann borkmann added the release-note/misc This PR makes changes that have no direct user impact. label Jan 31, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 31, 2024
@borkmann borkmann added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jan 31, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 31, 2024
We are going to add a dsr_external bit, so this change makes it easier
to distinguish the two in the CT state.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The option.Config.NodePortNat46X64 is only supported for LB-only mode,
so do not enable it for regular clusters.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
@borkmann borkmann force-pushed the pr/ipip4 branch 8 times, most recently from 2982463 to 9f5aaa8 Compare February 2, 2024 13:45
Make space in our BPF CT entry. CONNTRACK_ACCOUNTING was recently
disabled by default. Shrink the stats from rx/tx packets/bytes to just
packets/bytes so that the freed-up space can be reused for other metadata.
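
As a rough illustration of the space saving, here is a minimal sketch in C.
The struct and function names are hypothetical and the 64-bit counter width
is an assumption; this is not the actual CT entry layout, only a model of
folding per-direction counters into a single pair.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "before" layout: one counter pair per direction.
 * The 64-bit width is an assumption made for this sketch. */
struct ct_stats_old {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/* Hypothetical "after" layout: a single direction-agnostic pair. */
struct ct_stats_new {
	uint64_t packets;
	uint64_t bytes;
};

/* Account one packet of the given length, regardless of direction. */
static inline void ct_stats_account(struct ct_stats_new *s, uint32_t len)
{
	s->packets += 1;
	s->bytes += len;
}

/* Under the assumed widths, two 64-bit slots (16 bytes) become free. */
_Static_assert(sizeof(struct ct_stats_old) - sizeof(struct ct_stats_new) == 16,
	       "expected 16 bytes to be freed in this sketch");
```

The trade-off is that per-direction visibility is lost; only totals per
connection remain.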

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Support IPIP termination from the Cilium L4LB against a regular Cilium
cluster. This work covers the termination as well as DSR aspect, so that
replies go directly back to clients instead of the Cilium L4LB.

Given that the VIP:port of an external L4LB is not known in our K8s
cluster, we cannot hold it in the revNat map either. Therefore, add the
tuple info to the CT map.

Guard this under a compilation flag, given this is only relevant for users
who really want to terminate the external L4LB in the workload cluster.
Others do not need to pay for the additional cycles.

From the agent side, the --enable-external-dsr={true,false} flag controls
this setting. The default is false.

Example with IPIP termination:

  Cilium L4LB node:

  # ./cilium-dbg/cilium-dbg service list
  ID   Frontend          Service Type   Backend
  [...]
  11   1.1.1.1:80        ExternalIPs    1 => 192.168.2.12:80 (active)

  Cilium regular cluster with --enable-external-dsr=true:

  # ./cilium-dbg/cilium-dbg service list
  ID   Frontend             Service Type   Backend
  [...]
  11   192.168.2.12:80      ExternalIPs    1 => 193.99.144.80:80 (active)

  tcpdump on Cilium regular node:

  [...]
  09:36:17.421507 IP 192.168.2.11 > 192.168.2.12: IP 192.168.2.13.43196 > 1.1.1.1.80: Flags [S], seq 3976047959, win 42340, options [mss 1460,sackOK,TS val 4083238462 ecr 0,nop,wscale 9], length 0
  09:36:17.421529 IP 192.168.2.12.43196 > 193.99.144.80.80: Flags [S], seq 3976047959, win 42340, options [mss 1460,sackOK,TS val 4083238462 ecr 0,nop,wscale 9], length 0
  09:36:17.428443 IP 193.99.144.80.80 > 192.168.2.12.43196: Flags [S.], seq 1717159938, ack 3976047960, win 14600, options [mss 1460,nop,wscale 0,sackOK,TS val 1591760912 ecr 4083238462], length 0
  09:36:17.428680 IP 1.1.1.1.80 > 192.168.2.13.43196: Flags [S.], seq 1717159938, ack 3976047960, win 14600, options [mss 1460,nop,wscale 0,sackOK,TS val 1591760912 ecr 4083238462], length 0
  [...]

The trace shows the IPIP termination, the regular Cilium node forwarding
the service request to the backend, and, on the reply, the reverse
translation, with the DSR reply sent from the original VIP (1.1.1.1.80)
directly to the client.
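
The four packets in the trace can be modelled with a small toy sketch in C.
This is not the datapath code: all type and function names here are
hypothetical, addresses are kept in host byte order for readability, and the
CT value only carries the fields needed to show why the original VIP:port has
to be remembered per connection when the revNat map cannot provide it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order (sketch only). */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified L3/L4 header view. */
struct pkt4 {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Hypothetical CT value holding the original inner tuple info. */
struct ct_val {
	uint32_t client_addr; /* inner source seen after decap */
	uint32_t ext_vip;     /* inner destination: external L4LB VIP */
	uint16_t ext_port;    /* inner destination port */
	bool dsr_external;    /* marks entries created via external DSR */
};

/* Request path, after the outer IPIP header has been stripped:
 * remember the inner tuple, then rewrite towards the backend with
 * the node address as source (as in the second packet of the trace). */
static inline void request_xlate(struct pkt4 *inner, struct ct_val *ct,
				 uint32_t node, uint32_t backend,
				 uint16_t backend_port)
{
	ct->client_addr = inner->saddr;
	ct->ext_vip = inner->daddr;
	ct->ext_port = inner->dport;
	ct->dsr_external = true;

	inner->saddr = node;
	inner->daddr = backend;
	inner->dport = backend_port;
}

/* Reply path: restore VIP:port as source and address the client
 * directly, bypassing the L4LB (fourth packet of the trace). */
static inline void reply_xlate(struct pkt4 *reply, const struct ct_val *ct)
{
	if (!ct->dsr_external)
		return;
	reply->saddr = ct->ext_vip;
	reply->sport = ct->ext_port;
	reply->daddr = ct->client_addr;
}
```

Feeding the inner SYN (192.168.2.13.43196 > 1.1.1.1.80) through
request_xlate() and the backend's SYN-ACK through reply_xlate() yields the
same address pairs as the second and fourth lines of the tcpdump output.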

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Support IP6IP6 termination from the Cilium L4LB against a regular Cilium
cluster. This is the IPv6 side of 50f6fa8 ("bpf: Support external
IPv4 DSR").

Cilium L4LB node:

  # ./cilium-dbg/cilium-dbg service list
  ID   Frontend          Service Type   Backend
  [...]
  12   [face:b::1]:80    ExternalIPs    1 => [2a02:168:f656:0:1ac0:4dff:fe09:d5e6]:80 (active)

Cilium regular cluster with --enable-external-dsr=true:

  # ./cilium-dbg/cilium-dbg service list
  ID   Frontend                                   Service Type   Backend
  [...]
  12   [2a02:168:f656:0:1ac0:4dff:fe09:d5e6]:80   ExternalIPs    1 => [2a03:2880:f16d:81:face:b00c:0:25de]:80 (active)

tcpdump on Cilium regular node:

  [...]
  12:13:17.150875 IP6 2a02:168:f656::2 > 2a02:168:f656:0:1ac0:4dff:fe09:d5e6: IP6 2a02:168:f656:0:1ac0:4dff:fe0b:720e.36764 > face:b::1.80: Flags [S], seq 863958068, win 43200, options [mss 1440,sackOK,TS val 2302007970 ecr 0,nop,wscale 9], length 0
  12:13:17.150893 IP6 2a02:168:f656:0:1ac0:4dff:fe09:d5e6.36764 > 2a03:2880:f16d:81:face:b00c:0:25de.80: Flags [S], seq 863958068, win 43200, options [mss 1440,sackOK,TS val 2302007970 ecr 0,nop,wscale 9], length 0
  12:13:17.155619 IP6 2a03:2880:f16d:81:face:b00c:0:25de.80 > 2a02:168:f656:0:1ac0:4dff:fe09:d5e6.36764: Flags [S.], seq 1192141025, ack 863958069, win 65535, options [mss 1392,sackOK,TS val 1118681450 ecr 2302007970,nop,wscale 8], length 0
  12:13:17.155911 IP6 face:b::1.80 > 2a02:168:f656:0:1ac0:4dff:fe0b:720e.36764: Flags [S.], seq 1192141025, ack 863958069, win 65535, options [mss 1392,sackOK,TS val 1118681450 ecr 2302007970,nop,wscale 8], length 0
  12:13:17.156232 IP6 2a02:168:f656::2 > 2a02:168:f656:0:1ac0:4dff:fe09:d5e6: IP6 2a02:168:f656:0:1ac0:4dff:fe0b:720e.36764 > face:b::1.80: Flags [.], ack 1, win 85, options [nop,nop,TS val 2302007975 ecr 1118681450], length 0
  [...]

Note that CONNTRACK_ACCOUNTING is not compatible with the
--enable-external-dsr setting, given the union in the CT value. Other items,
such as CONNTRACK_LOCAL, are broken as well. Perhaps it's time to deprecate
or remove them entirely at some point. The agent cannot block enablement of
these two, since it is only done manually via the cilium-dbg tool.
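
To illustrate why a union makes the two features mutually exclusive, here is
a minimal sketch in C. The layout and names are hypothetical (the real CT
value is laid out differently); the only point is that accounting counters
and the stored external address alias the same bytes, so updating one
corrupts the other.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CT value fragment: the accounting counters and the
 * external VIP share storage through a union. */
struct ct_value_sketch {
	union {
		struct {
			uint64_t packets;
			uint64_t bytes;
		} acct;
		struct {
			uint8_t vip[16]; /* IPv6 address of the external VIP */
		} ext;
	} u;
};

/* Store a VIP, then update the accounting counters as an accounting
 * path would. Returns 1 if the stored VIP no longer matches. */
static inline int acct_clobbers_vip(void)
{
	/* face:b::1 from the example above, in network byte order */
	const uint8_t vip[16] = { 0xfa, 0xce, 0x00, 0x0b, [15] = 0x01 };
	struct ct_value_sketch v;

	memset(&v, 0, sizeof(v));
	memcpy(v.u.ext.vip, vip, sizeof(vip));

	v.u.acct.packets += 1;
	v.u.acct.bytes += 1500;

	return memcmp(v.u.ext.vip, vip, sizeof(vip)) != 0;
}
```

In this sketch the first accounted packet is already enough to change the
stored address, so replies would no longer be rewritten to the right VIP.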

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Remove the custom IPIP decap program from the L4LB test case and instead
reuse the newly added Cilium-native implementation.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>