Skip to content

Conversation

jrfastab
Copy link
Contributor

@jrfastab jrfastab commented Feb 17, 2020

Fix/cleanup a couple issues related to MTU and encryption+tunnel modes.

When running encryption in tunnel mode we do not account for the tunnel overhead in the encryption route table resulting in a route entry that (due to lack of mtu on the route) suggests a valid packet size would be equal to the mtu of the tunnel. But, actually we need enough room to add tunnel header so similar to main route table deduct tunnel overhead from route entries mtu.

The cilium_vxlan device mtu is not set and will always be default 1500. This should be changed to match mtu of lxc* devices.

Finally, to ensure encryption table is being used correctly and also to unify the code between subnet mode and tunnel mode use output-mark in encryption rules in all cases. Without this its possible we could route a packet multiple times through the normal routing table which has route entries with smaller mtu.

Fixes #10092

Fix issue (#10092) which incorrectly configured route MTU with encryption and tunnel enabled.

This change is Reviewable

Currently the tunnel device MTU will not be set. If the underlying device
is running some other mtu size this creates a case where mtu on tunnel
device is not in-sync. For example on system with 9000 mtu,

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:01:02:03:04:05 brd ff:ff:ff:ff:ff:ff
7: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 56:92:81:65:04:b6 brd ff:ff:ff:ff:ff:ff
8: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f6:9f:7e:fd:be:4a brd ff:ff:ff:ff:ff:ff
9: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether de:d3:72:2a:69:8a brd ff:ff:ff:ff:ff:ff

Or going the other way with network facing interface <1500 my GKE setup
is the following,

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 42:01:0a:8a:00:19 brd ff:ff:ff:ff:ff:ff
13: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 82:80:8c:7a:36:4a brd ff:ff:ff:ff:ff:ff
14: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ca:98:07:56:a1:2f brd ff:ff:ff:ff:ff:ff
15: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:a7:67:e4:53:e4 brd ff:ff:ff:ff:ff:ff

Sending 1500 MTU on a 1460 interface doesn't make sense. But the routes
configured in the main table will help,

$ kubectl exec -ti -n cilium cilium-cwcbc -- ip r s
default via 10.138.0.1 dev eth0 proto dhcp src 10.138.0.25 metric 100
10.36.0.0/24 via 10.36.1.3 dev cilium_host src 10.36.1.3 mtu 1333
10.36.1.0/24 via 10.36.1.3 dev cilium_host src 10.36.1.3 mtu 1333
10.36.1.3 dev cilium_host scope link
10.138.0.1 dev eth0 proto dhcp scope link src 10.138.0.25 metric 100
169.254.123.0/24 dev docker0 proto kernel scope link src 169.254.123.1 linkdown

Still it would be best if the tunnel interface uses the same mtu size as
cilium_{host|net}.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
The encryption table is used after encrypting packet. Here the packet is
encrypted but in the tunnel case must still account for encap overhead.
To ensure this set the mtu of routes here subtracted by the tunnel
overhead.

After this patch on a typical deployment with MTU=1500.

$ kubectl exec -ti -n kube-system cilium-9k9bk -- ip r s t 200
local 10.26.0.0/16 dev cilium_vxlan proto 50 scope host
10.250.0.0/16 dev cilium_host mtu 1450

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
@jrfastab jrfastab requested a review from a team February 17, 2020 16:23
pkg/mtu/mtu.go Outdated
@@ -124,6 +124,15 @@ func NewConfiguration(authKeySize int, encryptEnabled bool, encapEnabled bool, m
return conf
}

// GetRouteEncryptionMTU return the MTU to be used on the encryption routing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on exported method Configuration.GetRouteTunnelMTU should be of the form "GetRouteTunnelMTU ..."

@jrfastab jrfastab changed the title Fix mtu ipsec cilium: encryption + tunnel improvements and fixes Feb 17, 2020
@jrfastab jrfastab added the wip label Feb 17, 2020
@maintainer-s-little-helper
Copy link

Release note label not set, please set the appropriate release note.

@jrfastab jrfastab added needs-backport/1.6 kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. labels Feb 17, 2020
We currently set the output mark only for subnet enabled cases. But,
we actually want it in tunnel cases as well. And to simplify design
we can enable it in all cases. This will cause the encryption route
table to be used after encryption, which now after previous patch
will use the correct route MTU, for both tunnel and routing modes.

Otherwise encrypted packets could use the main table again which has
route MTU set to encryption overhead + tunnel overhead which results
in dropping (sending icmp MSS exceeded) packets that are close to
the MTU limits before encryption. The result is a smaller MTU than
expected. With the output mark set we instead use the routing table
which will have the high mtu set.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
@maintainer-s-little-helper
Copy link

Release note label not set, please set the appropriate release note.

@jrfastab
Copy link
Contributor Author

test-me-please

@coveralls
Copy link

coveralls commented Feb 17, 2020

Coverage Status

Coverage decreased (-0.004%) to 44.446% when pulling dbc5123 on fix-mtu-ipsec into 7290ad4 on master.

@jrfastab
Copy link
Contributor Author

test-me-please

@jrfastab jrfastab removed wip labels Feb 17, 2020
@maintainer-s-little-helper
Copy link

Release note label not set, please set the appropriate release note.

@maintainer-s-little-helper
Copy link

Release note label not set, please set the appropriate release note.

@jrfastab jrfastab added release-note/bug This PR fixes an issue in a previous release of Cilium. and removed dont-merge/needs-release-note labels Feb 17, 2020
@aanm aanm merged commit b6f2e75 into master Feb 18, 2020
@aanm aanm deleted the fix-mtu-ipsec branch February 18, 2020 09:20
@lbernail
Copy link
Contributor

Nice!

As discussed on slack with @jrfastab we may have MTU issues on kernels that:

  • don't support output mark (pre 4.14)
  • and use the main route table after encryption (and not the encryption table)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cilium: tunnel (vxlan) with encryption on older kernels drops packets (skbs) in stack
10 participants