@brb commented Apr 12, 2022

No description provided.

gandro added 3 commits April 11, 2022 18:19
The kube-apiserver reserved identity requires us to create an IPCache
entry based on the endpoints of the `default/kubernetes` K8s service.

This kube-apiserver IPCache entry will overwrite any existing remote
host entries. The entry also blocks the creation of remote host entries
for the same IP (via `source.AllowsOverwrite`).

When the kube-apiserver entry is created, we check if there is an
existing remote node entry and, if so, copy the IPSec keypair
information into the newly created kube-apiserver entry. If, however, the
order is reversed (the kube-apiserver entry is created before the
remote node is discovered), the IPSec keypair is never updated.
This in turn means that connectivity to any remote nodes which also run
the kube-apiserver is broken when node-to-node encryption is enabled.

This commit works around the issue by adding the IPKeyPair before the
labels are injected. This allows InjectLabels to look up the pair
whenever the entry is created or updated.
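A minimal sketch of why storing the key pair separately fixes the ordering problem, assuming a heavily simplified IPCache (IPv4 keys only, fixed-size tables). The names `upsert_keypair`, `inject_labels` and `ipcache_entry` are hypothetical stand-ins, not Cilium's actual Go code: the point is only that the key pair is looked up on every create or update, not copied once at creation time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16

/* Hypothetical stand-in for the per-IP IPSec key pair store. */
struct keypair {
	uint32_t ip;
	uint8_t key;
};

/* Hypothetical stand-in for an IPCache entry. */
struct ipcache_entry {
	uint32_t ip;
	uint32_t identity; /* e.g. the kube-apiserver reserved identity */
	uint8_t key;       /* 0 means "no key pair known" */
};

static struct keypair keypairs[MAX_ENTRIES];
static size_t n_keypairs;
static struct ipcache_entry ipcache[MAX_ENTRIES];
static size_t n_ipcache;

/* Record the key pair for an IP, independently of any IPCache entry. */
static void upsert_keypair(uint32_t ip, uint8_t key)
{
	for (size_t i = 0; i < n_keypairs; i++) {
		if (keypairs[i].ip == ip) {
			keypairs[i].key = key;
			return;
		}
	}
	if (n_keypairs < MAX_ENTRIES)
		keypairs[n_keypairs++] = (struct keypair){ .ip = ip, .key = key };
}

static uint8_t lookup_keypair(uint32_t ip)
{
	for (size_t i = 0; i < n_keypairs; i++)
		if (keypairs[i].ip == ip)
			return keypairs[i].key;
	return 0;
}

/* Create or update the entry; the key pair is looked up on every call,
 * not only when the entry is first created. */
static struct ipcache_entry *inject_labels(uint32_t ip, uint32_t identity)
{
	struct ipcache_entry *e = NULL;

	for (size_t i = 0; i < n_ipcache && e == NULL; i++)
		if (ipcache[i].ip == ip)
			e = &ipcache[i];
	if (e == NULL) {
		if (n_ipcache == MAX_ENTRIES)
			return NULL;
		e = &ipcache[n_ipcache++];
		e->ip = ip;
	}
	e->identity = identity;
	e->key = lookup_keypair(ip);
	return e;
}
```

In a usage run, calling `inject_labels` first models the problematic order (kube-apiserver entry before the remote node is known, so the key is 0); once `upsert_keypair` has recorded the node's key, the next `inject_labels` update picks it up.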

This work is also needed for the upcoming WireGuard node-to-node
encryption.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit sets the EncryptKey index field to a static non-zero value
in the CiliumEndpoints and CiliumNodes managed by a cilium-agent that has
WireGuard enabled. This allows fine-grained tracking of encrypted
endpoints. A follow-up commit will update the datapath accordingly, to
make use of this field and only encrypt connections to endpoints which
have a non-zero encryption key index.
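A sketch of how the published index and the later datapath check could fit together. `WG_STATIC_KEY_INDEX` and both functions are hypothetical; the commit only states that the value is static and non-zero. Gating on the local agent's WireGuard setting as well is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value: the commit only says "a static non-zero value". */
#define WG_STATIC_KEY_INDEX 1

/* Key index an agent would publish in the CiliumEndpoints/CiliumNodes it manages. */
static uint8_t published_key_index(bool wireguard_enabled)
{
	return wireguard_enabled ? WG_STATIC_KEY_INDEX : 0;
}

/* Datapath side: encrypt only towards peers advertising a non-zero index.
 * The local_wireguard_enabled condition is an assumption of this sketch. */
static bool should_encrypt(bool local_wireguard_enabled, uint8_t remote_key_index)
{
	return local_wireguard_enabled && remote_key_index != 0;
}
```

In this sketch, traffic towards an endpoint whose agent runs without WireGuard sees a zero index and is left unencrypted, which is the per-endpoint granularity the commit describes.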

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@maintainer-s-little-helper bot added the dont-merge/needs-release-note-label label (The author needs to describe the release impact of these changes.) Apr 12, 2022
@brb commented Apr 12, 2022

/test-1.23-net-next

@brb force-pushed the pr/gandro+brb/wg-host-encryption-v3 branch 2 times, most recently from c49786d to bc63bd6, April 13, 2022 09:07
brb added 7 commits April 13, 2022 16:53
This commit completely changes the WireGuard integration in the
datapath to enable host2host encryption (as well as pod2host and
host2pod).

Previously, we supported only the pod2pod case. This was implemented by
marking a packet to be encrypted, and then letting the IP rule installed
in the host netns forward the packet to the WireGuard tunnel device
"cilium_wg0" for encryption, as shown below:

┌─────────────────┐
│   Pod A netns   │
│   ┌────────┐    │
│   │  eth0  │    │
└───┴────┬───┴────┘
    ┌────┴──────────┐
    │ bpf_lxc@veth0 │               (host netns)
    └────┬──────────┘
         │1."from-container" in bpf_lxc sets MARK_MAGIC_ENCRYPT
         │2. ip rule matches the mark and routes packet to WG netdev
         │       ┌───────────┐
         └──────►│cilium_wg0 │
                 └────┬──────┘
                      │
                  ┌───▼───┐
                  │ eth0  │
                  └───────┘

This worked fine for the pod2pod case (with one danger: a sysadmin
could delete the rule, causing packets to bypass the WG dev).

However, with this approach it was not possible to enable the host2host
case, as a packet originating from the host netns was never handled by
bpf_lxc. Thus, we needed to change the datapath.

To encrypt a host2host packet, we need to attach bpf_host to the outgoing
device connecting the cluster nodes, which in the picture is "eth0". Then
the program "to-netdev" from bpf_host can redirect the packet to the WG
dev. Once encrypted, the packet hits the same bpf_host program again. To
prevent the packet from looping forever, we configure the WG netdev to
set the skb mark after encryption. The program can then skip the
redirection to the WG netdev if the mark is set.
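The loop-avoidance logic can be modelled in a few lines. This is a sketch, not bpf code: the mark values, `struct pkt` and the function names are hypothetical, and Cilium's real `MARK_MAGIC_*` constants live in its bpf headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, not Cilium's real MARK_MAGIC_* constants. */
#define MARK_MAGIC_MASK      0x0F00u
#define MARK_MAGIC_ENCRYPTED 0x0E00u

enum verdict { TO_WIRE, REDIRECT_TO_WG };

struct pkt {
	uint32_t mark;
	bool to_remote_node; /* destination requires WG encryption */
};

/* Model of the "to-netdev" decision in bpf_host attached to eth0. */
static enum verdict to_netdev(const struct pkt *p)
{
	/* Already encrypted by cilium_wg0: send it out instead of looping. */
	if ((p->mark & MARK_MAGIC_MASK) == MARK_MAGIC_ENCRYPTED)
		return TO_WIRE;
	return p->to_remote_node ? REDIRECT_TO_WG : TO_WIRE;
}

/* Model of cilium_wg0: the encrypted (outer) packet is still addressed to a
 * remote node, but now carries the configured mark. */
static void wg_encrypt(struct pkt *p)
{
	p->mark = (p->mark & ~MARK_MAGIC_MASK) | MARK_MAGIC_ENCRYPTED;
}

/* Number of trips through cilium_wg0 before the packet leaves eth0, or -1 if
 * it is still being redirected after max_trips. */
static int wg_trips_before_wire(struct pkt p, int max_trips)
{
	for (int trips = 0; trips <= max_trips; trips++) {
		if (to_netdev(&p) == TO_WIRE)
			return trips;
		wg_encrypt(&p);
	}
	return -1;
}
```

Because the encrypted packet is still addressed to a remote node, its destination alone would send it back to cilium_wg0; only the mark check breaks the cycle. Removing that check makes `wg_trips_before_wire` return -1.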

The flow below shows the new integration.

┌─────────────────┐
│   Pod A netns   │
│   ┌────────┐    │
│   │  eth0  │    │
└───┴────┬───┴────┘
    ┌────┴──────────┐
    │ bpf_lxc@veth0 │    (host netns)
    └────┬──────────┘
         │
    ┌────▼───────────┐  1. "to-netdev" does redirect    ┌───────────┐
    │ bpf_host@eth0  │─────────────────────────────────►│cilium_wg0 │
    └─┬─────────▲────┘                                  └──────┬────┘
      │         │                                              │
      │         │  2. encrypt and set MARK_MAGIC_ENCRYPTED     │
      │         └──────────────────────────────────────────────┘
      │
      │ 3. output the encrypted packet
      │
      ▼

The same flow is used for the host2host, host2pod and pod2host cases.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit changes the agent code to support the new WireGuard
integration described in the previous commit.

The most important changes:

1. Configure the WG netdev to add the skb mark.
2. Add NodeIP to allowed-ips when --encrypt-node=true.
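Item 2 follows from WireGuard's cryptokey routing: an outgoing packet is sent to the peer whose allowed-ips contain its destination, and a decrypted incoming packet is dropped unless its source is within the sending peer's allowed-ips. For item 1, WireGuard has a device-level `fwmark` setting that it applies to its own outgoing packets, presumably what is used here; the sketch below does not model it. The peer struct, the use of a pod CIDR, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ALLOWED_IPS 8

/* Hypothetical per-peer configuration (one peer per remote node). */
struct wg_peer_cfg {
	const char *allowed_ips[MAX_ALLOWED_IPS];
	size_t n_allowed_ips;
};

static bool add_allowed_ip(struct wg_peer_cfg *cfg, const char *prefix)
{
	if (cfg->n_allowed_ips == MAX_ALLOWED_IPS)
		return false;
	cfg->allowed_ips[cfg->n_allowed_ips++] = prefix;
	return true;
}

/* Pod prefixes are always allowed (the pod2pod case). With
 * --encrypt-node=true the remote node's own IP is added as well, so packets
 * to or from that IP are routed through, and accepted from, this peer. */
static bool build_peer_allowed_ips(struct wg_peer_cfg *cfg, const char *pod_cidr,
				   const char *node_ip, bool encrypt_node)
{
	if (!add_allowed_ip(cfg, pod_cidr))
		return false;
	if (encrypt_node && !add_allowed_ip(cfg, node_ip))
		return false;
	return true;
}
```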

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
We need to detect the direct routing dev (i.e., the one used to connect
K8s nodes) in order to attach bpf_host to it when WG is enabled, as bpf_host
is responsible for redirecting packets to the WG netdev for encryption.
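One possible detection heuristic, sketched under stated assumptions: pick the interface that carries the node's internal IP, i.e. the address other K8s nodes use to reach this node. This is an assumption for illustration, not necessarily how Cilium detects the device, and `struct iface` is a hypothetical, simplified view (one IPv4 address per interface).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a node's interfaces. */
struct iface {
	const char *name;
	uint32_t ipv4; /* host byte order */
};

/* Assumed heuristic: the direct routing device is the one holding the node
 * IP used for node-to-node traffic. Returns NULL if no interface matches. */
static const char *detect_direct_routing_dev(const struct iface *ifs, size_t n,
					     uint32_t node_ip)
{
	for (size_t i = 0; i < n; i++)
		if (ifs[i].ipv4 == node_ip)
			return ifs[i].name;
	return NULL;
}
```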

Signed-off-by: Martynas Pumputis <m@lambda.lt>
There is no longer an skb mark conflict with the L7 proxy, so we can drop
the check. This means that the L7 proxy can work together with the WG
integration.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
PR [1] added "-Wimplicit-int-conversion", which broke
____revalidate_data_pull(). The latter is used when attaching bpf_host
to an L3 netdev.

[1]: #18501

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Before this change to the WG integration's behavior, when running in
tunneling mode, pod2pod@remote-node traffic bypassed bpf_overlay's
tunneling and was encapsulated only once, by the WG tunnel netdev.

To stay compatible with this pre-v1.12 behavior, this commit adds a
redirect to the WG tunnel in the __encap_and_redirect_with_nodeid()
function, which is eventually called in the pod2pod packet path.
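The decision added to the tunnel path reduces to a small branch. This is a hypothetical model, not the actual body of `__encap_and_redirect_with_nodeid()`; gating on the destination's non-zero key index (from the earlier EncryptKey commit) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum tunnel_action { ENCAP_TO_OVERLAY, REDIRECT_TO_WG };

/* Hypothetical model of the branch in the pod2pod tunnel path. The
 * dst_key_nonzero condition is an assumption of this sketch. */
static enum tunnel_action encap_or_wg(bool wireguard_enabled, bool dst_key_nonzero)
{
	/* WG on: skip the overlay encapsulation so the packet is
	 * encapsulated only once, by the WG tunnel netdev. */
	if (wireguard_enabled && dst_key_nonzero)
		return REDIRECT_TO_WG;
	/* Otherwise: regular overlay (VXLAN/Geneve) encapsulation. */
	return ENCAP_TO_OVERLAY;
}
```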

Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit attaches bpf_host's "from-netdev" section to Cilium's
WireGuard tunnel netdev ("cilium_wg0").

This is needed to encrypt KPR (kube-proxy replacement) traffic. In
particular, we encrypt N/S KPR requests that are forwarded to a remote
node running the selected service endpoint.

IMPORTANT: this encrypts KPR traffic only when running in the
non-tunneling mode.

For the request path, no changes are required. The existing datapath
configuration already handles it, as follows:

1. The "from-netdev" attached to eth0 is invoked for the NodePort
   request.
2. A remote service endpoint is selected, the DNAT and SNAT translations
   are performed.
3. The translated request is redirected to eth0.
4. The "to-netdev" section on eth0 is invoked. It detects that the
   packet needs to be encrypted, so it redirects it to cilium_wg0.

For the reply path, only minimal changes were required. After the WG
netdev has decrypted the reply packet, the packet is returned to the
networking stack. Because the networking stack is not aware of the
connection, the reply packet is dropped. To avoid that, we attach the
"from-netdev" section to the WG netdev, so that the following can be
performed:

1. Reverse SNAT and DNAT translations are applied to the reply.
2. The reply packet is redirected to the outgoing interface.
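The translation state that makes the reply path work can be sketched as follows. The tuple layout, `struct nat_entry` and both functions are hypothetical simplifications of the SNAT/DNAT and reverse translations listed above; Cilium keeps this state in its bpf conntrack and NAT maps.

```c
#include <assert.h>
#include <stdint.h>

struct tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Hypothetical NAT state recorded when the NodePort request was translated. */
struct nat_entry {
	struct tuple orig;   /* client -> NodePort service, as received on eth0 */
	struct tuple xlated; /* node -> backend, after DNAT + SNAT */
};

/* Request path: DNAT service -> backend, SNAT client -> node. */
static struct nat_entry translate_request(struct tuple req, uint32_t node_ip,
					  uint16_t nat_port, uint32_t backend_ip,
					  uint16_t backend_port)
{
	struct nat_entry e = { .orig = req };

	e.xlated = (struct tuple){ node_ip, backend_ip, nat_port, backend_port };
	return e;
}

/* Reply path, as run by "from-netdev" on cilium_wg0 after decryption:
 * map backend -> node back to service -> client. Returns -1 if the
 * reply does not belong to this entry. */
static int reverse_translate(const struct nat_entry *e, struct tuple *reply)
{
	if (reply->saddr != e->xlated.daddr || reply->sport != e->xlated.dport ||
	    reply->daddr != e->xlated.saddr || reply->dport != e->xlated.sport)
		return -1;
	reply->saddr = e->orig.daddr; /* reverse DNAT: backend -> service */
	reply->sport = e->orig.dport;
	reply->daddr = e->orig.saddr; /* reverse SNAT: node -> client */
	reply->dport = e->orig.sport;
	return 0;
}
```

Without a hook on cilium_wg0 to perform `reverse_translate`, the decrypted reply carries the backend-to-node tuple, which matches no connection known to the networking stack and is dropped, as described above.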

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb force-pushed the pr/gandro+brb/wg-host-encryption-v3 branch from bc63bd6 to 6224144, April 13, 2022 14:54
@brb closed this Apr 14, 2022
@pchaigno

😿

@brb commented Apr 14, 2022

@pchaigno The party has moved to #19401.
