-
Notifications
You must be signed in to change notification settings - Fork 3.4k
NodePort multi-dev support and BPF masq* tests #11267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Please set the appropriate release note label. |
test-me-please |
test-me-please |
common/utils.go
Outdated
@@ -50,6 +50,51 @@ func C2GoArray(str string) []byte { | |||
return ret | |||
} | |||
|
|||
// GoArray2C transforms a byte slice into its hexadecimal string representation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to move away of this package cc @joestringer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #11293
@@ -1205,7 +1205,7 @@ type DaemonConfig struct { | |||
LibDir string // Cilium library files directory | |||
RunDir string // Cilium runtime directory | |||
NAT46Prefix *net.IPNet // NAT46 IPv6 Prefix | |||
Device string // Receive device | |||
Devices []string // bpf_netdev device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing from 1 device to multiple devices requires a change in the flag name. please bring this up in the next weekly meeting
test-me-please |
test-me-please |
1 similar comment
test-me-please |
test-focus K8sDatapath* |
test-me-please |
test-focus K8sDatapathConfig* |
test-me-please |
1 similar comment
test-me-please |
The field is no longer used, and we can use it to store ifindex of a NodePort device. This will be needed when we will run bpf_netdev.o on multiple devices. Signed-off-by: Martynas Pumputis <m@lambda.lt>
It's going to be used by multiple packages. Also, add GoArray2CNoSpaces. Signed-off-by: Martynas Pumputis <m@lambda.lt>
This commit makes it possible to attach bpf_netdev.o to multiple network devices. Hence, BPF NodePort, BPF MASQ, etc can be exposed to multiple devices. Each device can be specified via cilium-agent's --device which accepts list of strings (e.g. `--device="eth0,eth1"`). The first device in the list is used to determine a node IP addrs and pod allocation ranges. With this change, we are no longer able to set some macros such IPV{4,6}_NODEPORT, NATIVE_DEV_IFINDEX, NATIVE_DEV_MAC in node_config.h. The first two we pass via -D to bpf_netdev.c during the compile time. While the third one is retrieved via the NATIVE_DEV_MAC_BY_IFINDEX() macro which is generated by cilium-agent and stored in node_config.h. Finally, NATIVE_DEV_IFINDEX,, when bpf_lxc.c does a tail call for the NodePort rev-DNAT xlation, is retrieved from a related CT entry. Signed-off-by: Martynas Pumputis <m@lambda.lt>
In the direct routing mode, when exposing NodePort services on multiple native devices, we should forward a request to a node running a service endpoint via a device which is used for direct routing. To make this happen, we need to know the ifindex of the device (otherwise, fib_lookup will fail), ipv{4,6} addr of the device for SNAT. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Instead of relying on NATIVE_DEV_IFINDEX from node_config.h which no longer exists, filter all devices and check whether they have TC ingress/egress BPF filter which name is bpf_netdev.o. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Check that ifindex of a given device does not overflow uin16, and that the device name does not contain special symbols used to parse in bpf/init.sh. The first restriction will go away once we introduce the surrogate id => ifindex BPF map. While the second - when we migrate bpf/init.sh to Go. Signed-off-by: Martynas Pumputis <m@lambda.lt>
The example of output: ``` $ cilium status | grep KubeProxyReplacement KubeProxyReplacement: Strict (eth0, eth1) [NodePort (HYBRID, 30000-32767, XDP: NONE), ExternalIPs, HostReachableServices (TCP, UDP)] ``` Signed-off-by: Martynas Pumputis <m@lambda.lt>
In the direct routing mode, in order for DSR to work via non-direct routing iface, a user should set rp_filter either to 0 or to 2 for the direct routing iface. Otherwise, the kernel might drop a forwarded packet due to the strict source IP addr check. Signed-off-by: Martynas Pumputis <m@lambda.lt>
The network is reachable via enp0s9 in VMs, and it's going to be used to test NodePort when bpf_netdev.o is attached to multiple devices. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Also, move echoserver-hostnetns into DaemonSet (previously, in k8s v1.11 the echoserver pod could not be scheduled on the node which didn't run cilium, regardless that the pod was tolerating all taints). Moving pod into the DaemonSet fixes the problem. Signed-off-by: Martynas Pumputis <m@lambda.lt>
The test cases are skipped when running in other CNI integrations (EKS,GKE) as the secondary device might not be available. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Extend the test function to accept a param which indicates whether tests from a third (aka node w/o cilium) node should be performed. As a reminder, the third node is used for NodePort testing to avoid bpf_sock.c optimizations which happen when a request is sent to a NodePort svc from a node which runs cilium. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Re-using the same port might lead to curl failing to bind, as the port might be still not released (exit code 45). Therefore, allow to specify a custom source port. Signed-off-by: Martynas Pumputis <m@lambda.lt>
Signed-off-by: Martynas Pumputis <m@lambda.lt>
test-me-please |
test-focus K8sDatapathConfig* |
In most deployments, the ip-masq-agent config will be a mounted configmap. K8s does the mounting of such configmap via multiple non-consecutive symlinks. This might result in the config file still not reachable after CREATE event for /etc/config/ip-masq-agent has been received: level=debug msg="Received fsnotify event: \"/etc/config/..2020_05_06_09_42_01.360772401\": CREATE" subsys=ipmasq level=debug msg="Received fsnotify event: \"/etc/config/..2020_05_06_09_42_01.360772401\": CHMOD" subsys=ipmasq level=debug msg="Received fsnotify event: \"/etc/config/ip-masq-agent\": CREATE" subsys=ipmasq level=info msg="Config file not found" file-path=/etc/config/ip-masq-agent subsys=ipmasqlevel=debug msg="Received fsnotify event: \"/etc/config/..data_tmp\": CREATE" subsys=ipmasq level=debug msg="Received fsnotify event: \"/etc/config/..data_tmp\": RENAME" subsys=ipmasq level=debug msg="Received fsnotify event: \"/etc/config/..data\": CREATE" subsys=ipmasq $ ls -al /etc/config/ total 16 drwxrwxrwx 3 root root 4096 May 6 16:27 . drwxr-xr-x 1 root root 4096 May 6 16:27 .. drwxr-xr-x 2 root root 4096 May 6 16:27 ..2020_05_06_16_27_43.107743627 lrwxrwxrwx 1 root root 31 May 6 16:27 ..data -> ..2020_05_06_16_27_43.107743627 lrwxrwxrwx 1 root root 20 May 6 16:27 ip-masq-agent -> ..data/ip-masq-agent As a workaround (for now), don't skip handling of config file even if we received an update for an unrelated file. Signed-off-by: Martynas Pumputis <m@lambda.lt>
test-me-please |
test-focus K8sDatapathConfig* |
This PR adds a multi-dev support for NodePort BPF and BPF MASQ.
The initial review was done in #10878. Reviewable per commit.
The follow-up tasks #11018.
Fix #9620.