
v1.6 backports 2020-03-19 #10638


Closed
wants to merge 6 commits into from

Conversation

@jrfastab (Contributor) commented Mar 19, 2020

Note: Only the IPv4 support from #9649 is included here; IPv6 is not. We can debate whether we want the fib IPv6 support backported somewhere else, but strictly speaking it is not a fix. Mixing a fix with a feature in the original PR was really an error on my side.

Note: The scripts broke on this set of backports, so I did these by hand, comparing manually against v1.6.

Once this PR is merged, you can update the PR labels via:

$ for pr in 9649 10218 10268 10551; do contrib/backporting/set-labels.py $pr done 1.6; done

[ upstream commit 4d4fed5 ]

We do not currently remove the 'dir in' encryption rules when encryption
is disabled. The original thinking was that we could simply leave these xfrm
policy/state entries around because they would only ever be triggered if the
datapath marked packets for encryption, which it would not do while encryption
is disabled. Then, if encryption were ever (re)enabled, the rules would
already be there. But there is a subtle detail: if the cilium_host IP changes
after encryption is disabled and encryption is then (re)enabled with a new IP,
the 'dir in' state/policy in the xfrm table will no longer be valid. And
if the new cilium_host IP is in the same CIDR, we can end up with
a collision in the table and possibly use old, out-of-date rules.

Fix by removing any 'dir in' rules when encryption is disabled.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
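
For illustration, a minimal sketch of this kind of cleanup using the vishvananda/netlink package (which Cilium's ipsec code is built on); the function name and the choice to delete every XFRM_DIR_IN policy are assumptions of the sketch, not the exact upstream change:

package ipsec

import (
    "fmt"

    "github.com/vishvananda/netlink"
)

// deleteXfrmDirIn removes every 'dir in' xfrm policy so that re-enabling
// encryption later with a different cilium_host IP cannot collide with
// stale entries left over from the previous configuration. The matching
// 'dir in' states would need similar cleanup, keyed by mark/SPI.
func deleteXfrmDirIn() error {
    policies, err := netlink.XfrmPolicyList(netlink.FAMILY_ALL)
    if err != nil {
        return fmt.Errorf("listing xfrm policies: %w", err)
    }
    for _, p := range policies {
        if p.Dir != netlink.XFRM_DIR_IN {
            continue
        }
        p := p // stable copy before taking a pointer
        if err := netlink.XfrmPolicyDel(&p); err != nil {
            return fmt.Errorf("deleting xfrm policy: %w", err)
        }
    }
    return nil
}
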
[ upstream commit 6ce5562 ]

Currently the tunnel device MTU is not set. If the underlying device
is running some other MTU size, this creates a case where the MTU on the
tunnel device is out of sync. For example, on a system with a 9000 MTU:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:01:02:03:04:05 brd ff:ff:ff:ff:ff:ff
7: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 56:92:81:65:04:b6 brd ff:ff:ff:ff:ff:ff
8: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether f6:9f:7e:fd:be:4a brd ff:ff:ff:ff:ff:ff
9: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether de:d3:72:2a:69:8a brd ff:ff:ff:ff:ff:ff

Or, going the other way with a network-facing interface below 1500, my GKE
setup looks like the following:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1460 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 42:01:0a:8a:00:19 brd ff:ff:ff:ff:ff:ff
13: cilium_net@cilium_host: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 82:80:8c:7a:36:4a brd ff:ff:ff:ff:ff:ff
14: cilium_host@cilium_net: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1460 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ca:98:07:56:a1:2f brd ff:ff:ff:ff:ff:ff
15: cilium_vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:a7:67:e4:53:e4 brd ff:ff:ff:ff:ff:ff

Sending at a 1500 MTU on a 1460 interface doesn't make sense, but the routes
configured in the main table help:

$ kubectl exec -ti -n cilium cilium-cwcbc -- ip r s
default via 10.138.0.1 dev eth0 proto dhcp src 10.138.0.25 metric 100
10.36.0.0/24 via 10.36.1.3 dev cilium_host src 10.36.1.3 mtu 1333
10.36.1.0/24 via 10.36.1.3 dev cilium_host src 10.36.1.3 mtu 1333
10.36.1.3 dev cilium_host scope link
10.138.0.1 dev eth0 proto dhcp scope link src 10.138.0.25 metric 100
169.254.123.0/24 dev docker0 proto kernel scope link src 169.254.123.1 linkdown

Still, it would be best if the tunnel interface used the same MTU as
cilium_{host|net}.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
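
A minimal sketch of syncing the tunnel device MTU with the underlying device, again via vishvananda/netlink; the device names and the idea of copying the underlay MTU verbatim are illustrative assumptions:

package mtu

import (
    "fmt"

    "github.com/vishvananda/netlink"
)

// syncTunnelMTU sets the tunnel device MTU to match the network-facing
// device, so encapsulated traffic is not sized for 1500 when the
// underlay runs 1460 (GKE) or 9001 (jumbo frames).
func syncTunnelMTU(underlay, tunnel string) error {
    u, err := netlink.LinkByName(underlay)
    if err != nil {
        return fmt.Errorf("lookup %s: %w", underlay, err)
    }
    t, err := netlink.LinkByName(tunnel)
    if err != nil {
        return fmt.Errorf("lookup %s: %w", tunnel, err)
    }
    return netlink.LinkSetMTU(t, u.Attrs().MTU)
}

It would be called as, e.g., syncTunnelMTU("eth0", "cilium_vxlan").
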
[ upstream commit 25a890c ]

The encryption table is used after encrypting a packet. At that point the
packet is already encrypted, but in the tunnel case we must still account for
the encap overhead. To ensure this, set the MTU of the routes in this table
to the device MTU minus the tunnel overhead.

After this patch, on a typical deployment with MTU=1500:

$ kubectl exec -ti -n kube-system cilium-9k9bk -- ip r s t 200
local 10.26.0.0/16 dev cilium_vxlan proto 50 scope host
10.250.0.0/16 dev cilium_host mtu 1450

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
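
Roughly, the table-200 route is installed with an explicit MTU; the sketch below assumes a 50-byte vxlan encap overhead and table 200 to match the output above, with the constants and signature being illustrative:

package mtu

import (
    "net"

    "github.com/vishvananda/netlink"
)

const (
    encryptTable  = 200 // routing table shown by `ip r s t 200`
    vxlanOverhead = 50  // assumed outer IPv4 + UDP + VXLAN header size
)

// addEncryptRoute installs the post-encryption route with the tunnel
// overhead already subtracted from the device MTU.
func addEncryptRoute(podCIDR *net.IPNet, dev netlink.Link) error {
    return netlink.RouteAdd(&netlink.Route{
        Dst:       podCIDR,
        LinkIndex: dev.Attrs().Index,
        Table:     encryptTable,
        MTU:       dev.Attrs().MTU - vxlanOverhead,
    })
}
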
[ upstream commit b6f2e75 ]

We currently set the output mark only for the subnet-enabled cases, but we
actually want it in tunnel cases as well, and to simplify the design we can
enable it in all cases. This causes the encryption route table to be used
after encryption, which, after the previous patch, will use the correct route
MTU for both tunnel and routing modes.

Otherwise, encrypted packets could use the main table again, whose route MTU
is reduced by encryption overhead + tunnel overhead. That results in dropping
(sending ICMP MSS exceeded for) packets that are close to the MTU limit
before encryption, i.e. a smaller effective MTU than expected. With the output
mark set, we instead use the encryption routing table, which has the higher
MTU set.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
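
The mechanism behind this is an ip rule that matches the output mark and points at table 200; a rough sketch, where the mark value is hypothetical:

package route

import "github.com/vishvananda/netlink"

// addEncryptRule steers packets carrying the encryption output mark into
// the encryption routing table, whose routes keep the higher MTU.
func addEncryptRule() error {
    rule := netlink.NewRule()
    rule.Mark = 0xe00 // hypothetical output mark stamped on encrypted packets
    rule.Table = 200
    rule.Priority = 1
    return netlink.RuleAdd(rule)
}
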
[ upstream commit 7fd7d42 ]

If an existing encryption (IPsec) policy is in place and that policy does not
have the mark field set in the netlink message, then the policy list API
returns a nil mark pointer.

However, we immediately dereference that pointer to check whether it is a
Cilium-installed policy, without checking for nil first. The result
is the following segfault:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x4 pc=0x1c853df]
goroutine 1 [running]:
github.com/cilium/cilium/pkg/datapath/linux/ipsec.isXfrmPolicyCilium(...)
        /go/src/github.com/cilium/cilium/pkg/datapath/linux/ipsec/ipsec_linux.go:367
github.com/cilium/cilium/pkg/datapath/linux/ipsec.DeleteXfrm()
        /go/src/github.com/cilium/cilium/pkg/datapath/linux/ipsec/ipsec_linux.go:393 +0x10f
github.com/cilium/cilium/pkg/datapath/linux.(*linuxNodeHandler).NodeConfigur

Add nil pointer checks here to resolve this.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
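
The shape of the fix is a nil guard before the dereference; a sketch, where the mark constant is a stand-in rather than Cilium's actual value:

package ipsec

import "github.com/vishvananda/netlink"

// markDecrypt stands in for the mark value Cilium stamps on the xfrm
// policies it owns.
const markDecrypt = 0xd00

// isXfrmPolicyCilium reports whether a policy was installed by Cilium.
// Policies installed by other tooling may carry no mark at all, so the
// nil check must happen before touching p.Mark.Value.
func isXfrmPolicyCilium(p netlink.XfrmPolicy) bool {
    if p.Mark == nil {
        return false
    }
    return p.Mark.Value == markDecrypt
}
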
[ upstream commit f03b69c ]

In the fix "cilium: encryption route table need to account for tunnel
headers" we tried to account for tunnel overhead in the encryption
routing table (ip r s t 200). However, we only fixed the case where the MTU
is the default 1500; for any other MTU we calculate it incorrectly.

The initial reporter had a 1500B MTU, so the fix resolved their issue but
did not fix the general problem. After this patch we account for the
configured MTU as well, and handle the direct-routing case correctly
by setting the MTU to the default route interface's MTU.

Fixes: 25a890c ("cilium: encryption route table need to account for tunnel headers")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
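
In other words, the table-200 route MTU is now derived from the configured device MTU rather than a hard-coded 1500, and in direct-routing mode from the MTU of the interface holding the default route; a small illustrative helper:

package mtu

const tunnelOverhead = 50 // assumed vxlan encap overhead

// encryptRouteMTU returns the MTU to program on the encryption-table
// routes. deviceMTU is the configured MTU (or, for direct routing, the
// MTU of the default-route interface); tunneling says whether encap
// overhead still applies after encryption.
func encryptRouteMTU(deviceMTU int, tunneling bool) int {
    if tunneling {
        return deviceMTU - tunnelOverhead
    }
    return deviceMTU
}
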
@jrfastab requested a review from a team as a code owner on March 19, 2020 17:17
@maintainer-s-little-helper bot added the backport/1.6 and kind/backports labels on Mar 19, 2020
@jrfastab (Contributor, Author)

test-me-please

@joestringer (Member) commented Mar 19, 2020

never-tell-me-the-odds

EDIT: Provisioning failure

@joestringer (Member)

test-me-please

@jrfastab (Contributor, Author)

I'll look into it. Agreed, it looks like a real failure.

@jrfastab (Contributor, Author)

test-me-please

@jrfastab (Contributor, Author)

Provisioning error. I'll try again.

@jrfastab (Contributor, Author)

test-me-please

@aanm (Member) commented Mar 30, 2020

Failed again @jrfastab

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/k8s-1.10-gopath/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:420
Connectivity test between nodes failed
Expected
    <bool>: false
to be true
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/k8s-1.10-gopath/src/github.com/cilium/cilium/test/k8sT/DatapathConfiguration.go:193

https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/18268/testReport/junit/Suite-k8s-1/10/K8sDatapathConfig_Transparent_encryption_DirectRouting_Check_connectivity_with_transparent_encryption_and_direct_routing/

stale bot commented Apr 29, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Apr 29, 2020
@jrfastab (Contributor, Author) commented May 1, 2020

closing, tracking this in new PR #10638

@jrfastab closed this May 1, 2020
@qmonnet (Member) commented May 12, 2020

"closing, tracking this in new PR #10638"

This is the current PR. New PR seems to be #11044.

@tklauser deleted the pr/v1.6-backport-2020-03-19 branch on April 21, 2021 09:13
Labels: kind/backports, stale