WIP: run WG + KPR #19407
Closed
The kube-apiserver reserved identity requires us to create an IPCache entry based on the endpoints of the `default/kubernetes` K8s service. This kube-apiserver IPCache entry will overwrite any existing remote host entries. The entry also blocks the creation of remote host entries for the same IP (via `source.AllowsOverwrite`).

When the kube-apiserver entry is created, we check if there is an existing remote node entry, and if so, copy the IPSec keypair information over into the newly created kube-apiserver entry. If, however, the order is reversed (the kube-apiserver entry is created before the remote node is discovered), then the IPSec keypair is never updated. This in turn means that connectivity to any remote nodes which also run the kube-apiserver is broken when node-to-node encryption is enabled.

This commit works around the issue by adding the IPKeyPair before the labels are injected. This allows InjectLabels to look up the pair whenever the entry is created or updated. This work is also needed for the upcoming WireGuard node-to-node encryption.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
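The ordering problem described in this commit message can be illustrated with a small model. This is only a sketch under stated assumptions: `IPCacheModel`, its method names, the source names, and the priority table are hypothetical and do not mirror Cilium's Go types. In particular, re-running label injection when a remote node upsert is rejected is an assumption made for the illustration.

```python
from dataclasses import dataclass, field

# Hypothetical source names with priorities; not Cilium's real source types.
PRIORITY = {"kube-apiserver": 2, "custom-resource": 1}


@dataclass
class IPCacheModel:
    key_pairs: dict = field(default_factory=dict)  # ip -> IPsec key index
    entries: dict = field(default_factory=dict)    # ip -> (source, key index)

    def upsert_remote_node(self, ip, key):
        # Record the key pair first, even if the entry itself is rejected.
        self.key_pairs[ip] = key
        current = self.entries.get(ip)
        if current and PRIORITY[current[0]] > PRIORITY["custom-resource"]:
            # A higher-priority source owns the entry: refresh it instead
            # of overwriting it (assumed stand-in for label injection).
            self.inject_labels(ip)
            return
        self.entries[ip] = ("custom-resource", key)

    def inject_labels(self, ip):
        # Look up the key pair every time the entry is created or updated.
        self.entries[ip] = ("kube-apiserver", self.key_pairs.get(ip, 0))
```

With the key pair stored independently of the entry, both orderings (remote node first, or kube-apiserver entry first) end with the same key index on the kube-apiserver entry.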
This commit sets the EncryptKey index field to a static non-zero value in CiliumEndpoints and CiliumNodes if WireGuard is enabled on the cilium-agent that manages them. This allows us to have fine-grained tracking of encrypted endpoints. A follow-up commit will update the datapath accordingly, to make use of this field and only encrypt connections to endpoints which have a non-zero encryption key index. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
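A minimal sketch of the rule this commit message describes, assuming a placeholder constant. `WG_ENCRYPT_KEY` and both function names are hypothetical; the commit only states that the value is static and non-zero.

```python
# Placeholder: the commit only says "a static non-zero value".
WG_ENCRYPT_KEY = 255


def encrypt_key_for(wireguard_enabled: bool) -> int:
    """Key index an agent would publish for its endpoints and node."""
    return WG_ENCRYPT_KEY if wireguard_enabled else 0


def should_encrypt(dst_encrypt_key: int) -> bool:
    """Encrypt only towards destinations with a non-zero key index."""
    return dst_encrypt_key != 0
```

Under this model, a destination managed by an agent without WireGuard publishes key index 0 and is reached unencrypted.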
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
/test-1.23-net-next
c49786d
to
bc63bd6
Compare
This commit completely changes the WireGuard integration in the datapath to enable host2host encryption (as well as pod2host and host2pod).

Previously, we supported only the pod2pod case. This was implemented by marking a to-be-encrypted packet and then letting the IP rule installed in the host netns forward the packet to the WireGuard tunnel device "cilium_wg0" for encryption, as shown below:

    ┌─────────────────┐
    │   Pod A netns   │
    │    ┌────────┐   │
    │    │  eth0  │   │
    └────┴────┬───┴───┘
         ┌────┴──────────┐
         │ bpf_lxc@veth0 │     (host netns)
         └────┬──────────┘
              │ 1. "from-container" in bpf_lxc sets MARK_MAGIC_ENCRYPT
              │ 2. ip rule matches the mark and routes packet to WG netdev
              │       ┌───────────┐
              └──────►│cilium_wg0 │
                      └────┬──────┘
                           │
                       ┌───▼───┐
                       │ eth0  │
                       └───────┘

This worked fine for the pod2pod case (albeit with the danger that a sysadmin could remove the rule, causing packets to bypass the WG dev). However, with this approach it was not possible to enable the host2host case, as a packet originating from the host netns was never handled by bpf_lxc. Thus, we needed to change the datapath.

To encrypt a host2host packet we need to attach bpf_host to the outgoing device connecting cluster nodes, which in the picture is "eth0". Then the "to-netdev" program from bpf_host can forward the packet to the WG dev. Once encrypted, the packet hits the same bpf_host program again. To avoid the packet looping forever, we can configure the WG netdev to set the skb mark after the encryption. Then, in the program we can skip the redirection to the WG netdev if the mark is set. The flow below shows the new integration.

    ┌─────────────────┐
    │   Pod A netns   │
    │    ┌────────┐   │
    │    │  eth0  │   │
    └────┴────┬───┴───┘
         ┌────┴──────────┐
         │ bpf_lxc@veth0 │     (host netns)
         └────┬──────────┘
              │
         ┌────▼───────────┐ 1. "to-netdev" does redirect     ┌───────────┐
         │ bpf_host@eth0  │─────────────────────────────────►│cilium_wg0 │
         └─┬─────────▲────┘                                  └──────┬────┘
           │         │                                              │
           │         │ 2. encrypt and set MARK_MAGIC_ENCRYPTED      │
           │         └──────────────────────────────────────────────┘
           │
           │ 3. output the encrypted packet
           ▼

The same flow is used for the host2host, host2pod and pod2host cases.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
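The loop-avoidance logic of the new flow can be sketched as a small decision function. This is a Python model rather than the BPF program itself. The `Packet` type, the function names, and the numeric value of `MARK_MAGIC_ENCRYPTED` are assumptions made for illustration; only the mark's name and the device name come from the commit message.

```python
from dataclasses import dataclass

# Placeholder value; only the name is taken from the commit message.
MARK_MAGIC_ENCRYPTED = 0x1


@dataclass
class Packet:
    needs_encryption: bool  # destination is a remote node or remote endpoint
    mark: int = 0           # stands in for the skb mark


def to_netdev(pkt: Packet) -> str:
    """Model of the decision taken by "to-netdev" on the outgoing device."""
    if pkt.mark == MARK_MAGIC_ENCRYPTED:
        # Already encrypted by the WG netdev: do not redirect again.
        return "output"
    if pkt.needs_encryption:
        return "redirect:cilium_wg0"
    return "output"


def wg_encrypt(pkt: Packet) -> Packet:
    """Model of the WG netdev: encrypt, then set the mark on the result."""
    # The outer packet still targets the remote node, so without the mark
    # it would match the redirect condition again and loop.
    return Packet(needs_encryption=pkt.needs_encryption,
                  mark=MARK_MAGIC_ENCRYPTED)
```

A packet first passes `to_netdev` (redirect), then `wg_encrypt`, then `to_netdev` again, where the mark causes it to be output instead of redirected a second time.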
This commit changes the agent code to support the new WireGuard integration described in the previous commit. The most important changes:

1. Configure the WG netdev to add the skb mark.
2. Add NodeIP to allowed-ips when --encrypt-node=true.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
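The second change can be sketched as follows. The function name and the assumption that a peer's allowed-ips otherwise consist of its pod CIDRs are hypothetical; the commit message only states that the node IP is added when --encrypt-node=true.

```python
import ipaddress


def allowed_ips(pod_cidrs, node_ip, encrypt_node):
    """Build a peer's allowed-ips list (hypothetical helper).

    Assumes the list normally holds the peer's pod CIDRs; the node IP is
    appended as a host prefix only when node encryption is enabled.
    """
    ips = [str(ipaddress.ip_network(cidr)) for cidr in pod_cidrs]
    if encrypt_node:
        # ip_network() on a bare address yields a /32 (or /128) prefix.
        ips.append(str(ipaddress.ip_network(node_ip)))
    return ips
```

For example, `allowed_ips(["10.0.1.0/24"], "192.168.1.10", True)` returns `["10.0.1.0/24", "192.168.1.10/32"]`.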
We need to detect a direct routing dev (= one which is used to connect K8s Nodes) in order to attach bpf_host when WG is enabled, as bpf_host is responsible for redirecting packets to the WG netdev for encryption. Signed-off-by: Martynas Pumputis <m@lambda.lt>
There is no longer an skb mark conflict with the L7 proxy, so we can drop the check. This means that the L7 proxy can work together with the WG integration. Signed-off-by: Martynas Pumputis <m@lambda.lt>
The PR [1] added "-Wimplicit-int-conversion", which broke ____revalidate_data_pull(). The latter is used when attaching bpf_host to an L3 netdev. [1]: #18501 Signed-off-by: Martynas Pumputis <m@lambda.lt>
Before the WG integration's behavior was changed, pod2pod traffic to a remote node in tunneling mode bypassed bpf_overlay's tunneling and was encapsulated only once, by the WG tunnel netdev. To be compatible with this pre-v1.12 behavior, this commit adds the redirect to the WG tunnel to the __encap_and_redirect_with_nodeid() function, which is eventually called in the pod2pod packet path. Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit attaches bpf_host's "from-netdev" section to Cilium's WireGuard tunnel netdev ("cilium_wg0"). This is needed to enable encryption of KPR traffic. In particular, we encrypt the N/S KPR requests which are forwarded to a remote node running a selected service endpoint.

IMPORTANT: this encrypts KPR traffic only when running in the non-tunneling mode.

For the request path no changes are required. The existing datapath configuration already handles it, as follows:

1. The "from-netdev" section attached to eth0 is invoked for the NodePort request.
2. A remote service endpoint is selected, and the DNAT and SNAT translations are performed.
3. The translated request is redirected to eth0.
4. The "to-netdev" section on eth0 is invoked. It detects that the packet needs to be encrypted, so it redirects it to cilium_wg0.

For the reply path minimal changes were required. After the WG netdev has decrypted the reply packet, the packet is returned to the networking stack. Because the networking stack is not aware of the connection, the reply packet is dropped. To avoid that, we attach the "from-netdev" section to the WG netdev, so that the following can be performed:

1. Reverse SNAT and DNAT translations are applied to the reply.
2. The reply packet is redirected to the outgoing interface.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
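The request and reply translations listed above can be modeled with a toy NAT table. This is a heavily simplified sketch: addresses stand in for full address and port tuples, and `NatTable` and its method names are hypothetical rather than part of Cilium's datapath.

```python
class NatTable:
    """Toy connection table keyed by the translated (backend, node) pair."""

    def __init__(self):
        self._entries = {}

    def translate_request(self, client, service_addr, backend, node_ip):
        # DNAT: service address -> backend; SNAT: client -> this node.
        self._entries[(backend, node_ip)] = (service_addr, client)
        return (node_ip, backend)  # (src, dst) of the forwarded request

    def reverse_translate_reply(self, src, dst):
        # Runs where "from-netdev" would see the decrypted reply.
        entry = self._entries.get((src, dst))
        if entry is None:
            return None  # unknown connection: nothing to restore
        service_addr, client = entry
        return (service_addr, client)  # (src, dst) of the restored reply
```

The point of the sketch is that only the node holding the table can restore the original addresses, which is why the reply has to pass through a program with access to that state once it leaves the WG netdev.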
bc63bd6
to
6224144
Compare
😿
Labels
dont-merge/needs-release-note-label
The author needs to describe the release impact of these changes.
No description provided.