-
Notifications
You must be signed in to change notification settings - Fork 3.4k
cilium: encryption + tunnel improvements and fixes #10218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently the tunnel device MTU will not be set. If the underlying device is running some other mtu size this creates a case where mtu on tunnel device is not in-sync. For example on system with 9000 mtu, 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 00:01:02:03:04:05 brd ff:ff:ff:ff:ff:ff 7: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 56:92:81:65:04:b6 brd ff:ff:ff:ff:ff:ff 8: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether f6:9f:7e:fd:be:4a brd ff:ff:ff:ff:ff:ff 9: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/ether de:d3:72:2a:69:8a brd ff:ff:ff:ff:ff:ff Or going the other way with network facing interface <1500 my GKE setup is the following, 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc fq_codel state UP mode DEFAULT group default qlen 1000 link/ether 42:01:0a:8a:00:19 brd ff:ff:ff:ff:ff:ff 13: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 82:80:8c:7a:36:4a brd ff:ff:ff:ff:ff:ff 14: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether ca:98:07:56:a1:2f brd ff:ff:ff:ff:ff:ff 15: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/ether 52:a7:67:e4:53:e4 brd ff:ff:ff:ff:ff:ff Sending 1500 MTU on a 1460 interface doesn't make sense. But the routes configured in the main table will help, $ kubectl exec -ti -n cilium cilium-cwcbc -- ip r s default via 10.138.0.1 dev eth0 proto dhcp src 10.138.0.25 metric 100 10.36.0.0/24 via 10.36.1.3 dev cilium_host src 10.36.1.3 mtu 1333 10.36.1.0/24 via 10.36.1.3 dev cilium_host src 10.36.1.3 mtu 1333 10.36.1.3 dev cilium_host scope link 10.138.0.1 dev eth0 proto dhcp scope link src 10.138.0.25 metric 100 169.254.123.0/24 dev docker0 proto kernel scope link src 169.254.123.1 linkdown Still it would be best if the tunnel interface uses the same mtu size as cilium_{host|net}. Signed-off-by: John Fastabend <john.fastabend@gmail.com>
The encryption table is used after encrypting packet. Here the packet is encrypted but in the tunnel case must still account for encap overhead. To ensure this set the mtu of routes here subtracted by the tunnel overhead. After this patch on a typical deployment with MTU=1500. $ kubectl exec -ti -n kube-system cilium-9k9bk -- ip r s t 200 local 10.26.0.0/16 dev cilium_vxlan proto 50 scope host 10.250.0.0/16 dev cilium_host mtu 1450 Signed-off-by: John Fastabend <john.fastabend@gmail.com>
pkg/mtu/mtu.go
Outdated
@@ -124,6 +124,15 @@ func NewConfiguration(authKeySize int, encryptEnabled bool, encapEnabled bool, m | |||
return conf | |||
} | |||
|
|||
// GetRouteEncryptionMTU return the MTU to be used on the encryption routing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported method Configuration.GetRouteTunnelMTU should be of the form "GetRouteTunnelMTU ..."
Release note label not set, please set the appropriate release note. |
We currently set the output mark only for subnet enabled cases. But, we actually want it in tunnel cases as well. And to simplify design we can enable it in all cases. This will cause the encryption route table to be used after encryption, which now after previous patch will use the correct route MTU, for both tunnel and routing modes. Otherwise encrypted packets could use the main table again which has route MTU set to encryption overhead + tunnel overhead which results in dropping (sending icmp MSS exceeded) packets that are close to the MTU limits before encryption. The result is a smaller MTU than expected. With the output mark set we instead use the routing table which will have the high mtu set. Signed-off-by: John Fastabend <john.fastabend@gmail.com>
c5c863b
to
fdef052
Compare
Release note label not set, please set the appropriate release note. |
test-me-please |
fdef052
to
dbc5123
Compare
test-me-please |
Release note label not set, please set the appropriate release note. |
Release note label not set, please set the appropriate release note. |
Nice! As discussed on slack with @jrfastab we may have MTU issues on kernels that:
|
Fix/cleanup a couple issues related to MTU and encryption+tunnel modes.
When running encryption in tunnel mode we do not account for the tunnel overhead in the encryption route table resulting in a route entry that (due to lack of mtu on the route) suggests a valid packet size would be equal to the mtu of the tunnel. But, actually we need enough room to add tunnel header so similar to main route table deduct tunnel overhead from route entries mtu.
The cilium_vxlan device mtu is not set and will always be default 1500. This should be changed to match mtu of lxc* devices.
Finally, to ensure encryption table is being used correctly and also to unify the code between subnet mode and tunnel mode use output-mark in encryption rules in all cases. Without this its possible we could route a packet multiple times through the normal routing table which has route entries with smaller mtu.
Fixes #10092
This change is