-
Notifications
You must be signed in to change notification settings - Fork 3.4k
v1.8 backports 2020-06-03 #11856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.8 backports 2020-06-03 #11856
Conversation
[ upstream commit 5e5c626 ] Signed-off-by: John Watson <johnw@planetscale.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit f719893 ] The test was failing due resulting values being greater then expected, however it is evident that `minAllocate` is a minimum value and should be treated as such. Fixes: #11560 Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 6819f11 ] A following commit will require to wait on the K8sAPIGroupPodV1Core cache Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 05163ed ] Due to a performance optimization, validateEndpoint() was changed to use the pod store instad of contacting the apiserver directly. This required to make the validation async to wait until the pod cache is available. This change however caused the endpoint to be exposed and regenerated before the endpoint was fully validated. This in turn caused the validation to occur while the regeneration was being performed. When the validation then failed, the endpoint was deleted while the endpoint was being regenerated, causing the endpoint to fail with a "JoinEP" message in the logs. This commit requires the endpoint to be validated before exposure and initial regeneration. Fixes: #11645 Fixes: #11076 Fixes: #10856 Fixes: #10667 Fixes: #10278 Fixes: 9975bba ("pkg/endpoint: fetch pod and namespace labels from local stores") Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit b5ed91d ] The code erroneously used `log` which is a global variable, that's unprotected by a mutex. The correct usage is to use `scopedLog` which is passed into ResyncInterfacesAndIPs(), which already has been locked by the caller of ResyncInterfacesAndIPs(). The caller is n.recalculate(). Fixes: #11785 Fixes: 3dfd638 ("ipam: Move iterator logic into generic InstanceMap") Fixes: 9779b09 ("ipam: Support for Azure IPAM") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 5dc904a ] ... which uses jiffies if possible and otherwise falls back to ktime. See also commit cd7921f ("bpf: implement time source switch and use it in CT") for more information wrt ktime vs jiffies. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit e838397 ] For more info see commit cd7921f ("bpf: implement time source switch and use it in CT") for more information wrt ktime vs jiffies. Given the CT maps have been switched, we also should do it for NAT to allow for potential created time correlation of entries. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit a7c6c2b ] If there is no entry in the affinity map, then it's pointless to fetch bpf_mono_now() and prep the match key. Only go for it when really needed. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit a90e313 ] The only thing that lb4_update_affinity_by_netns() is doing for existing entries is updating the last_used timestamp, nothing else. So lets not hit the BPF map update spinlock from all CPUs. Instead use a READ_ONCE()/ WRITE_ONCE() scheme and update the timestamp at the time we do the lookup where we have the val already. Only call the lb4_update_affinity_by_netns() update when the backend_from_affinity is false which indicates that a backend was just newly selected. The lb4_update_affinity_by_netns() is addressed separately to ease review. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 0532f88 ] Using the kernel's fib lookup is too expensive and not strictly needed for the case of having a hairpinned load-balancer as in XDP. Given we already have a faster NODEPORT_NEIGH{4,6} cache that we populate for inbound connections, just look up the backend mac from there. Avoiding the fib lookup in XDP bumps performance by ~1.5Mpps in my environment. We can only make this assumption for the hairpin case where the LB is exposed to a single public dev. We can extend nodeport_lb_hairpin() later on also for the tc case. The assumption on why this is okay is because backend pods have src IP validation in bpf_lx via checking on is_valid_lxc_src_ip{,v4}() which prevents from spoofing (src IP/MAC) in general. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 387364d ] Do the same conversion as done in the DSR case also for SNAT NodePort which similarly helps to improve the performance by avoiding the fib lookup in the kernel. Improving by similar numbers as with DSR case. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 531ff9d ] Given we reuse neigh BPF tables for client and backends, it doesn't make sense to push the HW addresss down only once upon k8s node events like nodeUpdate() and nodeDelete() where we ARP probe and update the fib. Since this is an LRU cache they can get evicted. Therefore update once we did a successful fib lookup so we don't need to perform another one next time and have the guarantee that these continue to be cached once evicted. From control plane, we need to retire old entries however, so deletes are pushed from there. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 24d566e ] Enforce retiring of neigh cache entries either on update or deletion of new nodes. This will enforce the datapath to re-cache the neigh entry from the fib. This is needed in order to not leave stale entries around. Only do this when we have acceleration enabled right now since this is equivalent to what the datapath does as well. We might later extend this for tc as well. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 892316d ] So far eth_store_{s,d}addr() had to fall back to byte-by-byte copies of src/dst mac addresses from stack into XDP buffer since the case of fib_lookup is special in that fib_params.{s,d}mac would cause unaligned access on the BPF stack access when used with our optimized memcpy. For the neigh table cases we do not have this issue and can therefore reduce the number of insns needed by using word and half-word wide copies. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit f61bece ] Small enough that it shouldn't matter much, but add a note here wrt backend selection. Slighly clean up some of the other comments. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit e43d47e ] We only ever need to track neigh entries of clients in case of SNAT when the reply comes back to us or in the case where its local to the node. For DSR and Hyrbid (DSR for TCP) we don't need to track them here since the node only ever sees to original requests which improves scalability for new client mappings as tracking can be avoided altogether. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit b7be1c0 ] The BPF neigh{4,6} maps are decoupled from SNAT NodePort mode these days and are used for i) request IP/Mac mappings for remote backends, ii) for local backends and iii) backend to Mac mappings. Make its size configurable in order to allow for more flexibility. Default to the same as NAT map size. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit a24ba16 ] Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit e8a7de6 ] This commit is to fix the recent changes after rebased. Related PR is #11685. Basically, there was a check if Node is nil after calling mutex lock, which will cause nil pointer dereference. I just refactor the code to make sure underlying node is not nil before obtaining mutex lock. Relates to 5259ff7 Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 3a9a353 ] Before this commit, host policy enforcement was reported as enabled as soon as policies were loaded for the host, even if the host firewall was disabled: ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS ENFORCEMENT ENFORCEMENT 318 Enabled Enabled 1 reserved:host ready 3423 Disabled Disabled 4 reserved:health f00d::a0f:0:0:7ba4 10.16.0.148 ready With this commit, enforcement will remain as disabled as long as the host firewall is disabled: ENDPOINT POLICY (ingress) POLICY (egress) IDENTITY LABELS (source:key[=value]) IPv6 IPv4 STATUS ENFORCEMENT ENFORCEMENT 318 Disabled Disabled 1 reserved:host ready 3423 Disabled Disabled 4 reserved:health f00d::a0f:0:0:7ba4 10.16.0.148 ready Fixes: f9c205d ("pkg/policy: Host network policies") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 1256a13 ] Attach bpf_host to the second host device (i.e., cilium_net) from Golang, as part of the host endpoint's datapath loading, instead of doing it from init.sh. Doing so has the following benefits: - We compile one less time because we only need to patch the object file on the Go side. - It simplifies the weird dance around includes in bpf_host.c. - It moves more code out of init.sh. Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit a4d86b2 ] Test is flaky, let's disable it Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 630df50 ] Due to jenkins bug: https://issues.jenkins-ci.org/browse/JENKINS-51454 pipeline was failing after first ci boot timeout. This change changes timeout mechanism to `timeout` cli call instead Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 7d26df1 ] This commit is an attempt to add retry logic to Helm operations in the Kubernetes test suite. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 1729be2 ] Open() writes to m.fd and thus must hold m.lock while doing so. Require m.lock to be held for writing. This also means that we can remove m.openLock. This simplifies the overall fix as the question of order when acquiring m.lock and m.openLock does not come up. Fixes: #10990 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit e57ae1c ] For the case where neither the kernel config can be determined nor derived through probing, we currently emit a warning to the agent log. To avoid confusion downgrade to just an info message. It's harmless since this just means we're using ktime in our datapath instead of jiffies. Given it's useful for debugging, dump the err to the info message as well. Reported-by: Rene Zbinden <rene.zbinden@postfinance.ch> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 601852b ] On the first initialization, the iptables rules may not exist. Remove the silently and warn on any attempt to remove rules once they are known to exist. Also, don't remove iptables rules on additional calls to Reinitialize() unless iptables is enabled. Fixes: #11809 Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 6fe8d09 ] - ensure `global.identityAllocationMode=kvstore` is set; the default is `crd` which is not what etcd test was meant to be testing - it looks like there is still a deadlock occuring when default `crd` modes is used Fixes: #11690 Signed-off-by: Ilya Dmitrichenko <errordeveloper@gmail.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 3245687 ] Given that the connectivity testing is referenced in the troubleshooting docs, it's worth deploying them to an empty namespace, `cilium-test`. h/t @joe Also included an initial effort to inlcude a table describing the coverage that each testing type provides. Fixes: #11581 Signed-off-by: Jed Salazar <jed@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit f00dc2a ] We have been running into cpu frequency issues in our ci, this change will allow us to check whether the build was affected by the problem. Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit d4a968e ] This way you can specify it on the commandline to generate docs in the way that RTD would by passing the environment variable, eg: $ READTHEDOCS_VERSION="v1.8.0-rc2" make render-docs Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 6bc5c74 ] As mentioned in #10991, successDuration was not safely access from eventqueue Add RWLock to make this struct thread-safe Using pointer for spanstat to avoid copylocks issue highlighted by govet Closes #10991 Signed-off-by: Tam Mach <sayboras@yahoo.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 0b49599 ] Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit c4553a8 ] The forward chain rules have been depended on the local delivery interface which depending on the setting of enable-endpoint-routes is either `cilium_host` or `lxc+`. This is sufficient for all regular traffic. For proxy redirection traffic, all traffic still passes through cilium_host regardless of the value of enable-endpoint-routes. Eample of existing rules: ``` -A CILIUM_FORWARD -o lxc+ -m comment --comment "cilium: any->cluster on lxc+ forward accept" -j ACCEPT -A CILIUM_FORWARD -i lxc+ -m comment --comment "cilium: cluster->any on lxc+ forward accept (nodeport)" -j ACCEPT -A CILIUM_FORWARD -i lxc+ -m comment --comment "cilium: cluster->any on lxc+ forward accept" -j ACCEPT ``` This problem was masked because Kubernetes would install these wide-reaching rules: ``` -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -A KUBE-FORWARD -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT ``` However, more recent versions of Kubernetes would install more fine-grained rules when the PodCIDR is known too the host: ``` -A KUBE-FORWARD -s 10.10.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT ``` This will still mask the problem if Cilium uses the PodCIDR for IP allocation. However, in case Cilium does not use the announced PodCIDR then these rules would no longer allow the proxy redirection traffic causes proxy redirection to break. Fixes: #11235 Fixes: #11807 Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
…n GKE [ upstream commit 7e0469d ] Signed-off-by: Thomas Graf <thomas@cilium.io> Signed-off-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit fbd7653 ] GetK8sMetadata was supposed to be used for internal usage of the package, making this function public requires protection against parallel access by its internal mutex. Fixes: 4699840 ("ipcache: Export GetK8sMetadata") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[ upstream commit 40b6f58 ] The node name in the pkg/node is set at `init()` time so there is not point on setting this package variable all over the code. Fixes: 3533fd3 ("pkg/node: give priority to nodeName from K8S_NODE_NAME env variable") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
never-tell-me-the-odds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for my changes. Thanks @tklauser!
test-backport-1.8 |
My commit looks good! |
test-backport-1.8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for my changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were already backported before (see commits 4fa26a4 and 1b95d35 which are in branch |
Not sure why Otherwise this should be good to merge. |
#11858 wasn't marked for backport, this is what caused |
Once this PR is merged, you can update the PR labels via: