v1.11 backports 2022-05-17 #19858
Conversation
/test-backport-1.11
Thanks and looks good for my changes.
Looks good for my commits. Thanks Chris.
My change looks good. Thanks!
My changes LGTM, thanks.
e42aa04 to 48ed237 (compare)
[ upstream commit 6b57b52 ] This commit fixes the error wrapping inside the dnsproxy package. When a DNS response is being processed by the DNS proxy inside NotifyOnDNSMsg(), we check ProxyRequestContext. If the request timed out, we annotate the metrics specifically to indicate the timeout. This relies on the errors being properly wrapped. To do this, errors.As() is used and all errors are wrapped with '%w'. Signed-off-by: Chris Tarazi <chris@isovalent.com>
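As a rough illustration of the pattern described above (not Cilium's actual dnsproxy code; the timeoutErr type and processRequest function are hypothetical stand-ins), the sketch below shows how wrapping with %w lets a caller detect a timeout with errors.As:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// timeoutErr is a hypothetical error type standing in for the timeout
// error that the DNS proxy can report for a request.
type timeoutErr struct{ after time.Duration }

func (e timeoutErr) Error() string { return fmt.Sprintf("timed out after %s", e.after) }

// processRequest wraps the underlying error with %w so callers can
// inspect it with errors.As / errors.Is.
func processRequest() error {
	err := timeoutErr{after: 100 * time.Millisecond}
	return fmt.Errorf("failed to process DNS request: %w", err)
}

func main() {
	err := processRequest()
	var te timeoutErr
	if errors.As(err, &te) {
		// A timeout can be annotated in metrics differently from other errors.
		fmt.Fprintln(os.Stderr, "request timed out:", err)
		return
	}
	fmt.Fprintln(os.Stderr, "other error:", err)
}
```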
[ upstream commit 81b1a62 ] Signed-off-by: Darren Foo <stonith@users.noreply.github.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 35f2ff3 ] Before this patch, the hubble-peer Service would be deployed during the preflight check, which in turn prevented Cilium from being installed because the installation would attempt to create it again. Suggested-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Alexandre Perrin <alex@kaworu.ch> Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit f481981 ] This commit does the following:
* Adds a configuration option for controlling the concurrency of the DNS proxy
* Adds a configuration option for the semaphore (from above) timeout
* Exposes an additional metric for the time taken to perform the policy check on a DNS request within the DNS proxy
The concurrency limitation is done by introducing a semaphore to the DNS proxy. By default, no such limit is imposed. Users are advised to take into account the number of DNS requests[1] and how many CPUs are available on each node in their cluster in order to come up with an appropriate concurrency limit. In addition, we expose the semaphore grace period as a configurable option. Choosing the "right" value for this timeout is a tradeoff that we shouldn't make on the user's behalf. The semaphore grace period prevents the situation where Cilium deadlocks, or where a consistently high rate of DNS traffic leaves Cilium unable to keep up. See cilium#19543 (comment) by <joe@cilium.io>. The user can take into account the rate at which they expect DNS requests to be flowing into Cilium and how many of those requests should be processed without retrying. If retrying isn't an issue, then keeping the grace period at 0 (default) will immediately free the goroutine handling the DNS request if the semaphore acquire fails. Conversely, if a backlog of "unproductive" goroutines is acceptable (and DNS request retries are not), then setting the grace period is advisable. This gives the goroutines some time to acquire the semaphore. Goroutines could pile up if the grace period is too high and there's a consistently high rate of DNS requests. It's worth noting that blindly increasing the concurrency limit will not linearly improve performance. It might actually degrade performance instead due to internal downstream lock contention (as seen by the recent commits moving Endpoint-related functions to use read-locks). Ultimately, it becomes a tradeoff between a high number of semaphore timeouts (dropped DNS requests that must be retried) and a high number of (unproductive) goroutines, which can consume system resources.
[1]: The metric to monitor is ``` cilium_policy_l7_total{rule="received"} ```
Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Maciej Kwiek <maciej@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
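The general mechanism is a semaphore that must be acquired within the grace period before a DNS request is processed. The sketch below, assuming golang.org/x/sync/semaphore, is illustrative only: the constants and the handleDNSRequest function are hypothetical stand-ins, not Cilium's actual flags or code.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

// Hypothetical values for the concurrency limit and the semaphore grace
// period; the real option names and defaults are defined by the Cilium
// agent, not by this sketch.
const (
	concurrencyLimit = 4
	gracePeriod      = 500 * time.Millisecond
)

var sem = semaphore.NewWeighted(concurrencyLimit)

// handleDNSRequest tries to acquire the semaphore within the grace period.
// If acquisition fails, the request is dropped and the client must retry;
// a zero grace period would drop immediately whenever the proxy is at capacity.
func handleDNSRequest(process func()) error {
	ctx, cancel := context.WithTimeout(context.Background(), gracePeriod)
	defer cancel()
	if err := sem.Acquire(ctx, 1); err != nil {
		return fmt.Errorf("dropping DNS request, proxy is at capacity: %w", err)
	}
	defer sem.Release(1)
	process()
	return nil
}

func main() {
	_ = handleDNSRequest(func() { fmt.Println("processed DNS request") })
}
```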
[ upstream commit c0e20dc ] The encryption-interface flag doesn't exist. It is called encrypt-interface. Fixes: 3662560f ("Add new unified Helm guide") Signed-off-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit fa5488b ] This value will be used during initialization of the LRU in case we don't have the value from the user. Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 8c48bc7 ] This new package will be used throughout Cilium to fetch compiled regex objects when handling toFQDN and L7 DNS rule names or patterns. The goal of this package is to provide a seamless, transparent interface to compile regexes or fetch them from the LRU. Upcoming commits will utilize this new package for all potential regex compilations in order to reduce CPU and memory consumption when compiling regexes constantly. Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 8761498 ] This reduces the potential CPU and memory spikes that were observed with constantly compiling toFQDN and L7 DNS rules, namely MatchName and MatchPattern. The spikes were observed in a high-churn environment where many FQDN-related policies were being added and deleted; many of the added rules had the same MatchName and MatchPattern fields. Manual testing (by taking a pprof) showed that with 1024 regexes present, the cache only consumed tens of MB. This commit ensures that we are always caching the compiled regex objects, both for policy validation and when processing the policy / selectors. Signed-off-by: Chris Tarazi <chris@isovalent.com>
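A minimal sketch of the idea, assuming github.com/hashicorp/golang-lru as the cache backend; the package name, DefaultCacheSize constant, and CompileRegex function are illustrative, not the actual Cilium package API:

```go
// Package recache is a hypothetical LRU-backed regex compilation cache.
package recache

import (
	"regexp"

	lru "github.com/hashicorp/golang-lru"
)

// DefaultCacheSize is a hypothetical default used when the user does not
// configure a cache size.
const DefaultCacheSize = 1024

var cache, _ = lru.New(DefaultCacheSize)

// CompileRegex returns a cached *regexp.Regexp for the given pattern,
// compiling and storing it on a cache miss. Reusing compiled objects avoids
// repeatedly compiling identical MatchName/MatchPattern rules.
func CompileRegex(pattern string) (*regexp.Regexp, error) {
	if v, ok := cache.Get(pattern); ok {
		return v.(*regexp.Regexp), nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	cache.Add(pattern, re)
	return re, nil
}
```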
[ upstream commit eaa7141 ] This commit is to allow empty value, as well as @ character in value for key value pair validation. Fixes: cilium#19793 Signed-off-by: Tam Mach <tam.mach@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com>
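For illustration only, a hedged example of a key-value validation pattern that accepts an empty value and the '@' character in the value; the regex and the sample pairs are hypothetical, not the exact expression used by Cilium:

```go
package main

import (
	"fmt"
	"regexp"
)

// kvPairRegex is a hypothetical "key=value" validation pattern that, like
// the fix above, permits an empty value and the '@' character in the value.
var kvPairRegex = regexp.MustCompile(`^[-._a-zA-Z0-9]+=[-._@a-zA-Z0-9]*$`)

func main() {
	for _, pair := range []string{"owner=team@example.com", "debug=", "bad key=1"} {
		fmt.Printf("%-25q valid=%v\n", pair, kvPairRegex.MatchString(pair))
	}
}
```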
[ upstream commit 422a9df ] Ref. https://github.com/cilium/cilium/blob/dfa6b157e8e9484c65fd938b2b45a4f5f50c61f9/daemon/cmd/agenthealth.go#L25-L31 Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 22cd47e ] The default value 9876 for the agent health port introduced in commit efffbdb ("daemon: expose HTTP endpoint on localhost for health checks") conflicts with Istio's ControlZ port. The latter has been in use for longer. Because the port is only exposed on localhost for use by the liveness/readiness probes, we can change the default value without breaking users. Thus, change it to 9879, for which there doesn't seem to be any documented use. Suggested-by: Joe Stringer <joe@cilium.io> Signed-off-by: Tobias Klauser <tobias@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit e2adb8a ] Wait for pod termination before removing Cilium. The premature removal of Cilium might cause the removal of any test pods to fail. For example, the following CI flake: "Pods are still terminating: [echo-694c58bbf4-896gh echo-694c58bbf4-fr4ck]" This is due to the missing CNI plugin. From the kubelet logs: failed to "KillPodSandbox" for "..." with KillPodSandboxError: "rpc error: code = Unknown desc = networkPlugin cni failed to teardown pod \"echo-694c58bbf4-fr4ck_default\" network: failed to find plugin \"cilium-cni\" in path [/opt/cni/bin]" The proposed change is not ideal, as the ExpectAllPodsInNsTerminated() function is racy. If the pods have not entered the terminating state yet, the function will return too early (without waiting for the termination). The proper solution would be to use the deployment manager used by K8sDatapathConfig. However, the manager would require significant changes. Considering that we are planning to completely change the integration suite, the proper solution is not worth the time. Signed-off-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Chris Tarazi <chris@isovalent.com>
48ed237 to 94af85b (compare)
/test-backport-1.11
/ci-l4lb-1.11
Once this PR is merged, you can update the PR labels via: