Skip to content

Conversation

brb
Copy link
Member

@brb brb commented Apr 30, 2020

This PR adds a multi-dev support for NodePort BPF and BPF MASQ.

The initial review was done in #10878. Reviewable per commit.

The follow-up tasks #11018.

Fix #9620.

Allow attaching BPF NodePort and BPF masquerade to multiple devices

@maintainer-s-little-helper
Copy link

Please set the appropriate release note label.

@brb
Copy link
Member Author

brb commented Apr 30, 2020

test-me-please

@coveralls
Copy link

coveralls commented Apr 30, 2020

Coverage Status

Coverage decreased (-2.0%) to 42.537% when pulling 7893a4d on pr/brb/multi-dev into 0687a0e on master.

@brb
Copy link
Member Author

brb commented May 1, 2020

test-me-please

@brb brb added area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. pending-review release-note/major This PR introduces major new functionality to Cilium. labels May 1, 2020
@brb brb requested a review from borkmann May 1, 2020 12:39
@brb brb marked this pull request as ready for review May 1, 2020 12:39
@brb brb requested a review from a team May 1, 2020 12:39
@brb brb requested a review from a team as a code owner May 1, 2020 12:39
@brb brb requested a review from a team May 1, 2020 12:39
@brb brb requested review from a team as code owners May 1, 2020 12:39
@brb brb requested a review from a team May 1, 2020 12:39
@aanm aanm added the kind/feature This introduces new functionality. label May 1, 2020
common/utils.go Outdated
@@ -50,6 +50,51 @@ func C2GoArray(str string) []byte {
return ret
}

// GoArray2C transforms a byte slice into its hexadecimal string representation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to move away of this package cc @joestringer

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #11293

@@ -1205,7 +1205,7 @@ type DaemonConfig struct {
LibDir string // Cilium library files directory
RunDir string // Cilium runtime directory
NAT46Prefix *net.IPNet // NAT46 IPv6 Prefix
Device string // Receive device
Devices []string // bpf_netdev device
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing from 1 device to multiple devices requires a change in the flag name. please bring this up in the next weekly meeting

@aanm aanm added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label May 1, 2020
@brb brb force-pushed the pr/brb/multi-dev branch from ed992d3 to eca9870 Compare May 2, 2020 16:32
@brb brb removed the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label May 2, 2020
@brb
Copy link
Member Author

brb commented May 2, 2020

test-me-please

@brb brb force-pushed the pr/brb/multi-dev branch from eca9870 to b071895 Compare May 3, 2020 07:22
@brb
Copy link
Member Author

brb commented May 3, 2020

test-me-please

1 similar comment
@brb
Copy link
Member Author

brb commented May 3, 2020

test-me-please

@brb
Copy link
Member Author

brb commented May 3, 2020

test-focus K8sDatapath*

@brb brb force-pushed the pr/brb/multi-dev branch from 3a356b1 to 05f1328 Compare May 6, 2020 08:17
@brb
Copy link
Member Author

brb commented May 6, 2020

test-me-please

@brb
Copy link
Member Author

brb commented May 6, 2020

test-focus K8sDatapathConfig*

@brb
Copy link
Member Author

brb commented May 6, 2020

test-me-please

1 similar comment
@brb
Copy link
Member Author

brb commented May 6, 2020

test-me-please

brb added 14 commits May 6, 2020 21:09
The field is no longer used, and we can use it to store ifindex of a
NodePort device. This will be needed when we will run bpf_netdev.o on
multiple devices.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
It's going to be used by multiple packages.

Also, add GoArray2CNoSpaces.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit makes it possible to attach bpf_netdev.o to multiple
network devices. Hence, BPF NodePort, BPF MASQ, etc can be exposed
to multiple devices.

Each device can be specified via cilium-agent's --device which accepts
list of strings (e.g. `--device="eth0,eth1"`). The first device in the
list is used to determine a node IP addrs and pod allocation ranges.

With this change, we are no longer able to set some macros such
IPV{4,6}_NODEPORT, NATIVE_DEV_IFINDEX, NATIVE_DEV_MAC in node_config.h.

The first two we pass via -D to bpf_netdev.c during the compile time.
While the third one is retrieved via the NATIVE_DEV_MAC_BY_IFINDEX() macro
which is generated by cilium-agent and stored in node_config.h.

Finally, NATIVE_DEV_IFINDEX,, when bpf_lxc.c does a tail call for the
NodePort rev-DNAT xlation, is retrieved from a related CT entry.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
In the direct routing mode, when exposing NodePort services on multiple
native devices, we should forward a request to a node running a service
endpoint via a device which is used for direct routing. To make this
happen, we need to know the ifindex of the device (otherwise,
fib_lookup will fail), ipv{4,6} addr of the device for SNAT.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Instead of relying on NATIVE_DEV_IFINDEX from node_config.h which no
longer exists, filter all devices and check whether they have TC
ingress/egress BPF filter which name is bpf_netdev.o.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Check that ifindex of a given device does not overflow uin16, and that
the device name does not contain special symbols used to parse in
bpf/init.sh.

The first restriction will go away once we introduce the surrogate id =>
ifindex BPF map. While the second - when we migrate bpf/init.sh to Go.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
The example of output:

```
$ cilium status | grep KubeProxyReplacement

KubeProxyReplacement:   Strict   (eth0, eth1)   [NodePort (HYBRID, 30000-32767, XDP: NONE), ExternalIPs, HostReachableServices (TCP, UDP)]
```

Signed-off-by: Martynas Pumputis <m@lambda.lt>
In the direct routing mode, in order for DSR to work via non-direct
routing iface, a user should set rp_filter either to 0 or to 2 for the
direct routing iface.

Otherwise, the kernel might drop a forwarded packet due to the strict
source IP addr check.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
The network is reachable via enp0s9 in VMs, and it's going to be used
to test NodePort when bpf_netdev.o is attached to multiple devices.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Also, move echoserver-hostnetns into DaemonSet (previously, in k8s v1.11
the echoserver pod could not be scheduled on the node which didn't run cilium,
regardless that the pod was tolerating all taints).

Moving pod into the DaemonSet fixes the problem.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
The test cases are skipped when running in other CNI integrations
(EKS,GKE) as the secondary device might not be available.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Extend the test function to accept a param which indicates whether
tests from a third (aka node w/o cilium) node should be performed.

As a reminder, the third node is used for NodePort testing to avoid
bpf_sock.c optimizations which happen when a request is sent to
a NodePort svc from a node which runs cilium.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Re-using the same port might lead to curl failing to bind, as the port
might be still not released (exit code 45).

Therefore, allow to specify a custom source port.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb force-pushed the pr/brb/multi-dev branch from 05f1328 to f3f4959 Compare May 6, 2020 19:13
@brb
Copy link
Member Author

brb commented May 6, 2020

test-me-please

@brb
Copy link
Member Author

brb commented May 6, 2020

test-focus K8sDatapathConfig*

In most deployments, the ip-masq-agent config will be a mounted
configmap. K8s does the mounting of such configmap via multiple
non-consecutive symlinks. This might result in the config file still
not reachable after CREATE event for /etc/config/ip-masq-agent has been
received:

    level=debug msg="Received fsnotify event: \"/etc/config/..2020_05_06_09_42_01.360772401\": CREATE" subsys=ipmasq
    level=debug msg="Received fsnotify event: \"/etc/config/..2020_05_06_09_42_01.360772401\": CHMOD" subsys=ipmasq
    level=debug msg="Received fsnotify event: \"/etc/config/ip-masq-agent\": CREATE" subsys=ipmasq
    level=info msg="Config file not found" file-path=/etc/config/ip-masq-agent subsys=ipmasqlevel=debug msg="Received fsnotify event: \"/etc/config/..data_tmp\": CREATE" subsys=ipmasq
    level=debug msg="Received fsnotify event: \"/etc/config/..data_tmp\": RENAME" subsys=ipmasq
    level=debug msg="Received fsnotify event: \"/etc/config/..data\": CREATE" subsys=ipmasq

    $ ls -al /etc/config/
    total 16
    drwxrwxrwx 3 root root 4096 May  6 16:27 .
    drwxr-xr-x 1 root root 4096 May  6 16:27 ..
    drwxr-xr-x 2 root root 4096 May  6 16:27 ..2020_05_06_16_27_43.107743627
    lrwxrwxrwx 1 root root   31 May  6 16:27 ..data -> ..2020_05_06_16_27_43.107743627
    lrwxrwxrwx 1 root root   20 May  6 16:27 ip-masq-agent -> ..data/ip-masq-agent

As a workaround (for now), don't skip handling of config file even if we
received an update for an unrelated file.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb
Copy link
Member Author

brb commented May 6, 2020

test-me-please

@brb
Copy link
Member Author

brb commented May 6, 2020

test-focus K8sDatapathConfig*

@rolinh rolinh merged commit 2ff7193 into master May 7, 2020
@rolinh rolinh deleted the pr/brb/multi-dev branch May 7, 2020 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/feature This introduces new functionality. release-note/major This PR introduces major new functionality to Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support exposing services through multiple devices
6 participants