v1.10 backports 2021-09-29 #17495

jibi · 2021-09-29T10:21:00Z

~~#14254 -- operator: add identity GC metrics (@ArthurChiao)~~
- Skipped due to some non-trivial conflicts
#17382 -- docs: Fix up broken minikube link (@joestringer)
#17288 -- docs: Fix version sorting for CRD schema docs (@joestringer)
#17329 -- Remove CiliumNode deletion logic from CiliumNode watcher and guarantee CiliumNode's OwnerReference is always set (@christarazi)
#17359 -- operator: Improve identity GC efficiency (@christarazi)
#17378 -- bugtool: Include listing of egress gateway map (@pchaigno)
#16264 -- docs: Fix command for overwriting iptables on kube-proxy replacement install (@Stijn98s)
#17230 -- daemon: Add --derive-masquerade-ip-addr-from-device opt (@brb)
- Minor conflict: moved the c.DeriveMasqIPAddrFromDevice assignment inside populateMasqueradingSettings()
#17424 -- contrib/backporting: add environment variables to set ORG and REPO (@aanm)
#17411 -- hubble: Display proxy redirects in policy verdict events (@pchaigno)
- Minor conflict: rebuilt the hubble API protobufs with make generate-hubble-api
#17417 -- pkg/k8s: fix User-Agent for kubernetes client (@aanm)
#17439 -- helm: upgrade envoy to v1.18.4 for hubble-ui (@geakstr)
- Minor conflict: in values.yaml I kept the pullPolicy as IfNotPresent (as opposed to Always, which is what we have in master); running make -C install/kubernetes also updated certgen to v0.1.5
#17432 -- Fix FQDN memory leak (@aanm)
#16585 -- pkg/identity: Add missing labels to well-known identities (@mauriciovasquezbernal)
#17477 -- helm: set correct versions of docker images in Makefile (@aanm)
#17418 -- Fix bug where IP addresses of devices in unknown state are resolved as remote-node (@jibi)

Once this PR is merged, you can update the PR labels via:

$ for pr in 17382 17288 17329 17359 17378 16264 17230 17424 17411 17417 17439 17432 16585 17477 17418; do contrib/backporting/set-labels.py $pr done 1.10; done

[ upstream commit 9e740b1 ] The section that this guide refers to is now its own dedicated page guide, and users can use any environment to test it out. Fix the redirect. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 98a995c ] Use "sort -V" (versions) rather than "sort -n" (numeric) so that the docs list the minor versions in chronological order. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 71a65cb ]

We don't need to implement this logic for two reasons:

1) We rely on CiliumNode resources to be deleted / cleaned up by attaching the corresponding K8s Node as an `ownerReference` in the CiliumNode.
2) It is redundant to delete the CiliumNode in response to an event...of the CiliumNode deletion itself.

In very rare cases, this logic can actually delete a newly created CiliumNode by accident (see example below). Instead, keep all deletion logic besides the actual K8s API calls (DELETE) and perform a Get() to ensure that it's been deleted. Otherwise, log to the user that the resource may still exist.

Example: Say an existing node was deleted and then recreated in quick succession with the same name. When the node is recreated, the agent will be scheduled on it. During bootstrap it'll create a corresponding CiliumNode resource. Given that only one Operator is operational at any time in a cluster, it is already running on another node in the cluster. The node-delete event will first delete the K8s node and then trigger a CN-delete via reason 1 from above. It is possible for the CN-delete event to be delayed such that it is received after the node-create event (the recreate). When the CN-delete event is received by the already-running Operator, the CiliumNode watcher logic will then trigger (erroneously) another CN-delete, thereby deleting the CiliumNode resource while the K8s node is still alive.

Fixes: 6d44f4c ("operator: sync cilium nodes to kvstore instead of k8s nodes") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

…mNode [ upstream commit b0c3393 ] It is impossible to set the OwnerReference if we fail to fetch the corresponding Kubernetes Node and the existing CiliumNode resource doesn't already have it set. We can rely on the OwnerReference being set because this logic was added in v1.6, which is a sufficiently early version of Cilium. [1] The reason for doing this is to ensure that the OwnerReference can always be set. If we cannot, this should be treated as an error and we shouldn't proceed. Cilium should not run in an environment where the Kubernetes Node resource is missing. [1]: 5c365f2 ("ipam: Automatically create CiliumNode resource on startup") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 2b44dcb ] This is useful in warning or error level messages to help nudge the user in the right direction when troubleshooting. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit ede69e8 ]

With this commit, the identity GC rate limit (--identity-gc-rate-interval) becomes the effective rate at which identities are garbage collected. Previously, the identity GC interval (--identity-gc-interval) would cause the Operator to GC for that much time, then sleep for that much time, rinse and repeat, effectively halving the rate.

To use concrete numbers for an example, let's say our interval is 5m and our GC rate interval is 1000 per minute. It would mean that previously, we would GC 5000 identities at a maximum for 10m (assuming that deletion takes 0s). How was that calculated? Each minute, we GC 1000 identities. After 5m, we have GC'd 5000 identities. But now we have to sleep for 5m because that's our GC interval. Hence our effective GC rate was 500 per minute (instead of 1000/m).

Now, we compute the time taken to perform the actual GC and subtract that from the interval. So in our above example, we eliminate the dead time of 5m and avoid slashing our effective GC rate in half. This change allows the Operator to keep up with the demand more efficiently. The Operator will warn if the GC duration took longer than the interval and set the sleep duration to 0.

Suggested-by: Joe Stringer <joe@cilium.io> Suggested-by: Dan Wendlandt <dan@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 3441acc ] Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 27fd5cc ] Signed-off-by: Stijn Smits <stijn@stijn98s.nl> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit d204d78 ] The new option is used to specify a device whose globally scoped IP addr should be used for BPF-based masquerading. This is a workaround for an environment which uses ECMP for outgoing traffic via multiple devices and has a dedicated device whose IP addr should be used for the masquerading. The workaround is relevant until #17158 has been resolved (thus, we hide the flag). Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 83d30de ] Having these environment variables allows the cherry-pick script to be used on other projects that are not Cilium. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit c40ed79 ]

Before this commit, Hubble was ignoring proxy redirection information from the policy-verdict events it received from the datapath. For example, a cilium monitor event such as:

    Policy verdict log: flow 0x0 local EP ID 1531, remote ID 35429, proto 17, egress, action redirect, match L3-L4, 10.240.0.62:37282 -> 10.240.0.63:53 udp

would be displayed in hubble observe as:

    Sep 15 17:23:11.960: cilium-test/client-6488dcf5d4-f9kfl:37282 -> kube-system/coredns-d4866bcb7-zh5jv:53 L3-L4 FORWARDED (UDP)

This commit adds a new verdict REDIRECTED to signal such an event. Such events now appear as:

    default/pod-to-external-fqdn-allow-google-cnp-5ff4986c89-n87h2:58314 -> kube-system/coredns-755cd654d4-j4vzh:53 UNKNOWN 5 (UDP)

A subsequent patch to the Hubble command line will display value 5 as "REDIRECTED".

Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
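Why an older client prints "UNKNOWN 5" can be sketched in Go as a name lookup that does not yet know the new value. This is an illustration only, not the Hubble CLI code: `verdictName` and both tables are hypothetical, and the only value taken from the commit message is that 5 is to be displayed as "REDIRECTED"; the other numbers are assumptions.

```go
package main

import "fmt"

// oldNames models a client that predates the new verdict.
// The numeric values here are assumptions made for this sketch.
var oldNames = map[int32]string{
	1: "FORWARDED",
	2: "DROPPED",
}

// newNames models a client that has learned that 5 means REDIRECTED.
var newNames = map[int32]string{
	1: "FORWARDED",
	2: "DROPPED",
	5: "REDIRECTED",
}

// verdictName returns the display name for a verdict value, falling back
// to "UNKNOWN <value>" when the table has no entry for it.
func verdictName(v int32, names map[int32]string) string {
	if name, ok := names[v]; ok {
		return name
	}
	return fmt.Sprintf("UNKNOWN %d", v)
}

func main() {
	fmt.Println(verdictName(5, oldNames)) // client without the new verdict
	fmt.Println(verdictName(5, newNames)) // client with the new verdict
}
```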

[ upstream commit 9e4d84b ] The Kubernetes client's User-Agent was never set, so it always fell back to the default value. This commit fixes the issue, and all Cilium components now correctly present their User-Agent. Fixes: b31ed33 ("Add k8s client qps and burst as cli flags for the operator") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 09f3c81 ] Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 9008255 ] The public function ForceExpiredByNames is not executed from anywhere so this function can be safely removed. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 8983227 ] In the FQDN architecture there's a DNS Cache per endpoint, used to track which domain names each endpoint makes DNS requests for, and a global DNS Cache whose main purpose is to help track which api.FQDNSelector present in the policy applies to locally running endpoints. The latter, as opposed to the former, didn't have any cleanup mechanism for the map that tracked which entries should be garbage collected, causing the global DNS Cache to grow. This commit prevents those entries from being tracked for Garbage Collection in the global DNS Cache. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit b281dd7 ] Kubernetes 1.21 automatically adds a new label to all namespaces when the NamespaceDefaultLabelName feature gate is enabled. (https://kubernetes.io/docs/concepts/overview/_print/#automatic-labelling) This commit adds an additional entry for all well-known identities adding that label. Signed-off-by: Mauricio Vásquez <mauricio@accuknox.com> Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit 5d37a2f ] The Makefile contains all component versions which are then used to generate the helm charts. This commit fixes some of those versions that got out-of-sync with the right versions. Fixes: 206105f ("helm: use 'quay.io/cilium/certgen:v0.1.5'") Fixes: 09f3c81 ("helm: upgrade envoy to v1.18.4 for hubble-ui") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

[ upstream commit c4773d8 ] As image versions are supposed to be set in the Makefile, we should add a step on the GH workflow to verify the correctness of those versions. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

pchaigno

My PRs look good 👍

[ upstream commit 6dbabed ] In initExcludedIPs() we build a list of IPs that Cilium needs to exclude to operate. One check to determine if an IP should be excluded is based on the state of the net device: if the device is not up, then its IPs are excluded. Unfortunately, this check is not enough, as it's possible to have a device reporting an unknown state (because its driver is missing the operstate handling, e.g. a dummy device) while still being operational. This commit changes the logic in initExcludedIPs() to not exclude IPs of devices reporting an unknown state. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>

jibi · 2021-09-29T14:16:15Z

test-backport-1.10

joestringer

For docs & GH workflows changes:

aanm

LGTM for my commits. Thanks!

joestringer and others added 12 commits September 29, 2021 11:57

logfields: Add Hint field

38c5fdf


bugtool: Include listing of egress gateway map

f6e7f77


Fix overwriting iptables for kube-proxy free installation

21f7ee1


jibi added kind/backports (This PR provides functionality previously merged into master.) and backport/1.10 labels Sep 29, 2021

jibi requested review from a team as code owners September 29, 2021 10:21

jibi requested review from nathanjsweet, nebril and joestringer September 29, 2021 10:21

maintainer-s-little-helper bot assigned nathanjsweet, nebril and joestringer and unassigned nebril Sep 29, 2021

jibi force-pushed the pr/v1.10-backport-2021-09-29 branch from df8d246 to 386b917 Compare September 29, 2021 10:27

geakstr and others added 4 commits September 29, 2021 12:34

helm: upgrade envoy to v1.18.4 for hubble-ui

cf03102


pkg/fqdn: clean unused code

dbaa193


aanm added 2 commits September 29, 2021 12:34

jibi force-pushed the pr/v1.10-backport-2021-09-29 branch from 386b917 to 0560a80 Compare September 29, 2021 10:34

pchaigno approved these changes Sep 29, 2021


jibi closed this Sep 29, 2021

jibi reopened this Sep 29, 2021

joestringer approved these changes Sep 29, 2021


maintainer-s-little-helper bot unassigned joestringer Sep 29, 2021

aanm approved these changes Sep 29, 2021


nebril approved these changes Oct 1, 2021


maintainer-s-little-helper bot unassigned nebril Oct 1, 2021

nathanjsweet approved these changes Oct 1, 2021


maintainer-s-little-helper bot unassigned nathanjsweet Oct 1, 2021

aanm merged commit 4204f66 into v1.10 Oct 2, 2021

aanm deleted the pr/v1.10-backport-2021-09-29 branch October 2, 2021 12:52

joestringer mentioned this pull request Oct 13, 2021

Prepare for release v1.10.5 #17606

Merged
