v1.10 backports 2021-09-29 #17495
[ upstream commit 9e740b1 ] The section that this guide refers to is now its own dedicated page guide, and users can use any environment to test it out. Fix the redirect. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 98a995c ] Use "sort -V" (versions) rather than "sort -n" (numeric) so that the docs list the minor versions in chronological order. Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
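The difference between the two orderings can be sketched in Go. This is only an illustration of the idea, not the docs build script: `lessVersion` and `lessNumeric` are hypothetical helpers, and the version strings are simplified to plain `x.y` form.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// lessVersion compares dotted versions field by field as integers,
// a simplified stand-in for what "sort -V" does on plain x.y versions.
func lessVersion(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		x, _ := strconv.Atoi(as[i])
		y, _ := strconv.Atoi(bs[i])
		if x != y {
			return x < y
		}
	}
	return len(as) < len(bs)
}

// lessNumeric parses the whole string as a decimal number, the way a
// numeric sort would, so "1.10" is read as 1.1 and lands before "1.8".
func lessNumeric(a, b string) bool {
	x, _ := strconv.ParseFloat(a, 64)
	y, _ := strconv.ParseFloat(b, 64)
	return x < y
}

// sortedBy returns a sorted copy of in, leaving the input untouched.
func sortedBy(in []string, less func(a, b string) bool) []string {
	out := append([]string(nil), in...)
	sort.SliceStable(out, func(i, j int) bool { return less(out[i], out[j]) })
	return out
}

func main() {
	versions := []string{"1.10", "1.8", "1.9"}
	fmt.Println(sortedBy(versions, lessNumeric)) // [1.10 1.8 1.9]
	fmt.Println(sortedBy(versions, lessVersion)) // [1.8 1.9 1.10]
}
```

With a numeric comparison the newest minor version sorts first, which is the misordering the commit removes.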
[ upstream commit 71a65cb ]

We don't need to implement this logic for two reasons:

1) We rely on CiliumNode resources to be deleted / cleaned up by attaching the corresponding K8s Node as an `ownerReference` in the CiliumNode.
2) It is redundant to delete the CiliumNode in response to an event...of the CiliumNode deletion itself.

In very rare cases, this logic can actually delete a newly created CiliumNode by accident (see example below).

Instead, keep all deletion logic besides the actual K8s API calls (DELETE) and perform a Get() to ensure that it's been deleted. Otherwise, log to the user that the resource may still exist.

Example: Say an existing node was deleted and then recreated in quick succession with the same name. When the node is recreated, the agent will be scheduled on it. During bootstrap it'll create a corresponding CiliumNode resource. Given that only one Operator is operational at any time in a cluster, it is already running on another node in the cluster. The node-delete event will first delete the K8s node and then trigger a CN-delete via reason 1 from above. It is possible for the CN-delete event to be delayed such that it is received after the node-create event (the recreate). When the CN-delete event is received by the already-running Operator, the CiliumNode watcher logic will then trigger (erroneously) another CN-delete, thereby deleting the CiliumNode resource while the K8s node is still alive.

Fixes: 6d44f4c ("operator: sync cilium nodes to kvstore instead of k8s nodes")

Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
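A minimal sketch of the "Get instead of DELETE" handling described in this commit, under stated assumptions: `getter`, `fakeAPI`, `errNotFound`, and `onCiliumNodeDelete` are hypothetical names invented for illustration, and the real Operator uses the Kubernetes client rather than an in-memory map.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a "not found" API error.
var errNotFound = errors.New("not found")

// getter is a hypothetical stand-in for the Get call of an API client.
type getter interface {
	Get(name string) error
}

// fakeAPI records which CiliumNode names currently exist.
type fakeAPI map[string]bool

func (f fakeAPI) Get(name string) error {
	if f[name] {
		return nil
	}
	return errNotFound
}

// onCiliumNodeDelete handles a CiliumNode delete event without issuing
// a DELETE itself. It only verifies the deletion with Get and returns a
// message to log when the resource may still exist ("" means confirmed).
func onCiliumNodeDelete(api getter, name string) string {
	err := api.Get(name)
	switch {
	case errors.Is(err, errNotFound):
		return "" // deletion confirmed, nothing to log
	case err != nil:
		return fmt.Sprintf("unable to verify deletion of CiliumNode %q: %v", name, err)
	default:
		return fmt.Sprintf("CiliumNode %q may still exist", name)
	}
}

func main() {
	// The node was recreated before the delayed delete event arrived,
	// so a CiliumNode with the same name exists again.
	api := fakeAPI{"node-1": true}
	fmt.Println(onCiliumNodeDelete(api, "node-1"))
}
```

Because the handler never deletes anything, a delayed delete event for a recreated node produces only a log message and leaves the new resource in place.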
…mNode [ upstream commit b0c3393 ] It is impossible to set the OwnerReference if we fail to fetch the corresponding Kubernetes Node and the existing CiliumNode resource doesn't already have it set. We can rely on the OwnerReference being set because this logic was added in v1.6, which is a sufficiently early version of Cilium. [1] The reason for doing this is to ensure that the OwnerReference can always be set. If we cannot, this should be treated as an error and we shouldn't proceed. Cilium should not run in an environment where the Kubernetes Node resource is missing. [1]: 5c365f2 ("ipam: Automatically create CiliumNode resource on startup") Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 2b44dcb ] This is useful in warning or error level messages to help nudge the user in the right direction when troubleshooting. Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit ede69e8 ]

With this commit, the identity GC rate limit (--identity-gc-rate-interval) becomes the effective rate at which identities are garbage collected. Previously, the identity GC interval (--identity-gc-interval) would cause the Operator to GC for that much time, then sleep for that much time, rinse and repeat, effectively halving the rate.

To use concrete numbers for an example, let's say our interval is 5m and our GC rate interval is 1000 per minute. It would mean that previously, we would GC 5000 identities at a maximum for 10m (assuming that deletion takes 0s). How was that calculated? Each minute, we GC 1000 identities. After 5m, we have GC'd 5000 identities. But now we have to sleep for 5m because that's our GC interval. Hence making our effective GC rate 500 per minute (instead of being 1000/m).

Now, we compute the time taken to perform the actual GC and subtract that from the interval. So in our above example, we eliminate the dead time of 5m and avoid slashing our effective GC rate in half.

This change allows the Operator to keep up with the demand more efficiently. The Operator will warn if the GC duration took longer than the interval and set the sleep duration to 0.

Suggested-by: Joe Stringer <joe@cilium.io> Suggested-by: Dan Wendlandt <dan@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 3441acc ] Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 27fd5cc ] Signed-off-by: Stijn Smits <stijn@stijn98s.nl> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit d204d78 ] The new option specifies a device whose globally scoped IP address should be used for BPF-based masquerading. This is a workaround for environments that use ECMP for outgoing traffic via multiple devices and have a dedicated device whose IP address should be used for the masquerading. The workaround is relevant until #17158 has been resolved (thus, we hide the flag). Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 83d30de ] Having these environment variables allows the cherry-pick script to be used on other projects that are not Cilium. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit c40ed79 ]

Before this commit, Hubble was ignoring proxy redirection information from the policy-verdict events it received from the datapath. For example, a cilium monitor event such as:

    Policy verdict log: flow 0x0 local EP ID 1531, remote ID 35429, proto 17, egress, action redirect, match L3-L4, 10.240.0.62:37282 -> 10.240.0.63:53 udp

would be displayed in hubble observe as:

    Sep 15 17:23:11.960: cilium-test/client-6488dcf5d4-f9kfl:37282 -> kube-system/coredns-d4866bcb7-zh5jv:53 L3-L4 FORWARDED (UDP)

This commit adds a new verdict REDIRECTED to signal such event. Such events now appear as:

    default/pod-to-external-fqdn-allow-google-cnp-5ff4986c89-n87h2:58314 -> kube-system/coredns-755cd654d4-j4vzh:53 UNKNOWN 5 (UDP)

A subsequent patch to the Hubble command line will display value 5 as "REDIRECTED".

Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
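Why a client that predates the new verdict prints "UNKNOWN 5" can be illustrated with a small Go sketch. The name tables and the `display` helper are hypothetical and do not mirror the Hubble code; only the value 5 for REDIRECTED comes from the commit message, and the value 1 for FORWARDED is an assumption made for the example.

```go
package main

import "fmt"

// oldNames is a hypothetical name table of a client built before the
// REDIRECTED verdict existed. The value 1 for FORWARDED is assumed.
var oldNames = map[int32]string{
	1: "FORWARDED",
}

// newNames is the same table after the client learns about value 5.
var newNames = map[int32]string{
	1: "FORWARDED",
	5: "REDIRECTED",
}

// display renders a verdict value, falling back to "UNKNOWN <n>" when
// the table has no name for it.
func display(names map[int32]string, verdict int32) string {
	if name, ok := names[verdict]; ok {
		return name
	}
	return fmt.Sprintf("UNKNOWN %d", verdict)
}

func main() {
	fmt.Println(display(oldNames, 5)) // UNKNOWN 5
	fmt.Println(display(newNames, 5)) // REDIRECTED
}
```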
[ upstream commit 9e4d84b ] The Kubernetes client's User-Agent was never set, so it would always fall back to the default value. This commit fixes this issue and now all Cilium components will correctly present their User-Agent. Fixes: b31ed33 ("Add k8s client qps and burst as cli flags for the operator") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
df8d246 to 386b917
[ upstream commit 09f3c81 ] Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 9008255 ] The public function ForceExpiredByNames is not executed from anywhere so this function can be safely removed. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 8983227 ] In the FQDN architecture there is a DNS Cache per endpoint, used to track the domain names for which each endpoint makes DNS requests, and a global DNS Cache whose main function is to help track which api.FQDNSelector present in the policy applies to locally running endpoints. The latter, as opposed to the former, didn't have any cleanup mechanism for the map that tracked which entries should be garbage collected, causing the global DNS Cache to grow. This commit prevents those entries from being tracked for Garbage Collection in the global DNS Cache. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit b281dd7 ] Kubernetes 1.21 automatically adds a new label to all namespaces when the NamespaceDefaultLabelName feature gate is enabled. (https://kubernetes.io/docs/concepts/overview/_print/#automatic-labelling) This commit adds an additional entry for all well-known identities adding that label. Signed-off-by: Mauricio Vásquez <mauricio@accuknox.com> Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit 5d37a2f ] The Makefile contains all component versions which are then used to generate the helm charts. This commit fixes some of those versions that got out-of-sync with the right versions. Fixes: 206105f ("helm: use 'quay.io/cilium/certgen:v0.1.5'") Fixes: 09f3c81 ("helm: upgrade envoy to v1.18.4 for hubble-ui") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
[ upstream commit c4773d8 ] As image versions are supposed to be set in the Makefile, we should add a step on the GH workflow to verify the correctness of those versions. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
386b917 to 0560a80
My PRs look good 👍
[ upstream commit 6dbabed ] In initExcludedIPs() we build a list of IPs that Cilium needs to exclude to operate. One check to determine if an IP should be excluded is based on the state of the net device: if the device is not up, then its IPs are excluded. Unfortunately, this check is not enough, as it's possible to have a device reporting an unknown state (because its driver is missing the operstate handling, e.g. a dummy device) while still being operational. This commit changes the logic in initExcludedIPs() to not exclude IPs of devices reporting an unknown state. Signed-off-by: Gilberto Bertin <gilberto@isovalent.com> Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
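The changed check can be sketched as follows. This is a minimal illustration, not the body of initExcludedIPs(): the `OperState` type, its constants, and `shouldExcludeIPs` are hypothetical names standing in for the device state reported by the kernel.

```go
package main

import "fmt"

// OperState is a hypothetical stand-in for a device's operational state.
type OperState int

const (
	OperUnknown OperState = iota // driver does not report operstate (e.g. a dummy device)
	OperDown
	OperUp
)

// shouldExcludeIPs reports whether a device's IPs should be excluded.
// Before the change, anything that was not up was excluded. After it,
// devices reporting an unknown state are treated as possibly
// operational and their IPs are kept.
func shouldExcludeIPs(state OperState) bool {
	return state != OperUp && state != OperUnknown
}

func main() {
	for _, s := range []OperState{OperUp, OperUnknown, OperDown} {
		fmt.Println(s, shouldExcludeIPs(s))
	}
}
```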
test-backport-1.10
For docs & GH workflows changes:
LGTM for my commits. Thanks!
operator: add identity GC metrics #14254 -- operator: add identity GC metrics (@ArthurChiao)
`c.DeriveMasqIPAddrFromDevice` assignment inside `populateMasqueradingSettings()`
`make generate-hubble-api`
`values.yaml`
I kept the `pullPolicy` to `IfNotPresent` (as opposed to what we have in master, i.e. `Always`) and running `make -C install/kubernetes` updated also `certgen` to `v0.1.5`
Once this PR is merged, you can update the PR labels via: