v1.10 backports 2022-05-17 #19859
Conversation
Thanks, and looks good for my changes.
Looks good for my commits. Thanks!
/test-backport-1.10
My change looks good. Thanks!
Force-pushed from 9bbd095 to d82683e
Force-pushed from 80d26b9 to 3f06205
Force-pushed from 02b9da8 to 33fc56d
Force-pushed from a3174c3 to 725df42
Force-pushed from e373fe4 to cf8f80a
Force-pushed from 4bd7744 to 095c4ea
[ upstream commit 6b57b52 ]

This commit fixes the error wrapping inside the dnsproxy package. When a DNS response is being processed by the DNS proxy inside NotifyOnDNSMsg(), we check ProxyRequestContext. If the request's response timed out, we annotate the metrics specifically to indicate that it timed out. This relies on the errors being properly wrapped. In order to do this, errors.As() is used and all errors are properly wrapped with '%w'.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
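As a rough illustration of the wrapping pattern this commit relies on (the error type and function names below are hypothetical stand-ins, not Cilium's actual dnsproxy code): wrapping with `%w` keeps the original error reachable through `errors.As()`.

```go
package main

import (
	"errors"
	"fmt"
)

// timeoutError is a hypothetical error type standing in for the proxy's
// timeout error; Cilium's actual error types differ.
type timeoutError struct{ msg string }

func (e *timeoutError) Error() string { return e.msg }

// processRequest stands in for a call that annotates a lower-level error.
func processRequest() error {
	// Wrap with %w so callers can still inspect the underlying error.
	return fmt.Errorf("proxying DNS request: %w", &timeoutError{msg: "request timed out"})
}

func main() {
	err := processRequest()

	// errors.As walks the wrapped chain, so the annotation added with %w
	// does not hide the original timeout error from the metrics code.
	var tErr *timeoutError
	if errors.As(err, &tErr) {
		fmt.Println("timeout detected:", tErr.msg)
	}
}
```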
[ upstream commit 35f2ff3 ]

Before this patch, the hubble-peer Service would be deployed during the preflight check, which would in turn prevent Cilium from being installed, as the installation would attempt to deploy the Service again.

Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit f481981 ]

This commit does the following:

* Adds a configuration option for controlling the concurrency of the DNS proxy
* Adds a configuration option for the semaphore (from above) timeout
* Exposes an additional metric for the time taken to perform the policy check on a DNS request within the DNS proxy

The concurrency limitation is done by introducing a semaphore to the DNS proxy. By default, no such limit is imposed. Users are advised to take into account the number of DNS requests[1] and the number of CPUs on each node in their cluster in order to come up with an appropriate concurrency limit.

In addition, we expose the semaphore grace period as a configurable option. Assuming the "right" value for this timeout is a tradeoff that we shouldn't make for the user. The semaphore grace period guards against the situation where Cilium deadlocks, or where a consistently high rate of DNS traffic leaves Cilium unable to keep up. See cilium#19543 (comment) by <joe@cilium.io>.

The user can take into account the rate at which they expect DNS requests to be flowing into Cilium and how many of those requests should be processed without retrying. If retrying isn't an issue, then keeping the grace period at 0 (the default) will immediately free the goroutine handling the DNS request if the semaphore acquire fails. Conversely, if a backlog of "unproductive" goroutines is acceptable (and DNS request retries are not), then setting the grace period is advisable. This gives the goroutines some time to acquire the semaphore. Goroutines could pile up if the grace period is too high and there is a consistently high rate of DNS requests.

It's worth noting that blindly increasing the concurrency limit will not linearly improve performance. It might actually degrade performance instead, due to internal downstream lock contention (as seen by the recent commits moving Endpoint-related functions to use read-locks). Ultimately, it becomes a tradeoff between a high number of semaphore timeouts (dropped DNS requests that must be retried) and a high number of (unproductive) goroutines, which can consume system resources.

[1]: The metric to monitor is

```
cilium_policy_l7_total{rule="received"}
```

Co-authored-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
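For readers unfamiliar with the semaphore-with-grace-period idea, here is a minimal sketch using `golang.org/x/sync/semaphore`. The constants and handler shape are illustrative assumptions, not the actual Cilium DNS proxy implementation or its option names.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

const (
	concurrencyLimit = 4                      // illustrative; Cilium imposes no limit by default
	gracePeriod      = 100 * time.Millisecond // illustrative grace period for acquiring a slot
)

var sem = semaphore.NewWeighted(concurrencyLimit)

// handleDNSRequest stands in for the proxy's per-request handler.
func handleDNSRequest(id int) {
	// Give the goroutine up to gracePeriod to acquire a slot before giving up:
	// this is the "semaphore grace period" tradeoff described in the commit.
	ctx, cancel := context.WithTimeout(context.Background(), gracePeriod)
	defer cancel()

	if err := sem.Acquire(ctx, 1); err != nil {
		// Timed out waiting: the request is dropped and the client must retry.
		fmt.Printf("request %d: semaphore timeout, dropping\n", id)
		return
	}
	defer sem.Release(1)

	// ... policy check and forwarding of the DNS request would happen here ...
	time.Sleep(50 * time.Millisecond)
	fmt.Printf("request %d: processed\n", id)
}

func main() {
	for i := 0; i < 10; i++ {
		go handleDNSRequest(i)
	}
	time.Sleep(time.Second)
}
```

Raising the grace period in this sketch trades dropped (retried) requests for a backlog of goroutines waiting on the semaphore, which is exactly the tradeoff the commit message describes.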
[ upstream commit c0e20dc ]

The encryption-interface flag doesn't exist. It is called encrypt-interface.

Fixes: 3662560f ("Add new unified Helm guide")

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit eaa7141 ]

This commit allows an empty value, as well as the '@' character in values, in key-value pair validation.

Fixes: cilium#19793

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
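To illustrate the kind of validation being relaxed, here is a small sketch; the regular expression below is a hypothetical stand-in, not Cilium's actual key-value validation rule.

```go
package main

import (
	"fmt"
	"regexp"
)

// kvPattern is an illustrative key=value pattern where the value may be
// empty and may contain '@'; the real validation logic differs.
var kvPattern = regexp.MustCompile(`^[-A-Za-z0-9_./]+=[-A-Za-z0-9_.@]*$`)

func main() {
	for _, pair := range []string{
		"owner=alice@example.com", // '@' in the value is accepted
		"team=",                   // empty value is accepted
		"=missing-key",            // still rejected: key must be non-empty
	} {
		fmt.Printf("%-25s valid=%v\n", pair, kvPattern.MatchString(pair))
	}
}
```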
[ upstream commit 422a9df ]

Ref. https://github.com/cilium/cilium/blob/dfa6b157e8e9484c65fd938b2b45a4f5f50c61f9/daemon/cmd/agenthealth.go#L25-L31

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 22cd47e ]

The default value 9876 for the agent health port introduced in commit efffbdb ("daemon: expose HTTP endpoint on localhost for health checks") conflicts with Istio's ControlZ port, which has been in use for longer. Because the port is only exposed on localhost for use by the liveness/readiness probes, we can change the default value without breaking users. Thus, change it to 9879, for which there doesn't seem to be any documented use.

Suggested-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
The agent health check port will be changed for v1.10.12, ref. cilium/cilium-cli#869. Set the base version explicitly so that PRs in preparation for v1.10 already use the correct port.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Force-pushed from 095c4ea to f69a73e
[ upstream commit 6260073 ]

The general idea is to remove the need for our permanent pool of GKE clusters + management cluster (that manages the pool via Config Connector). Instead, we switch to ad-hoc clusters as we do on CI 3.0. This should:

- Remove the upper limit on the number of concurrent Jenkins GKE jobs.
- Remove the need for permanent clusters (reduce CI costs).
- Have no effect on the setup time required before the tests actually start running on GKE clusters.
- Improve control over GKE features (e.g. `DenyServiceExternalIPs` admission controller) that cannot be controlled via CNRM / Config Connector.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit b42e5a0 ]

New GKE clusters have the automatic labelling feature gate enabled by default, so the labels used in the `Identity CLI testing` `K8sCLI` test need to be updated with the additional `k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name` automatic label.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
My commits LGTM.
/test-1.19-5.4
/test-gke
CI is green, besides
Once this PR is merged, you can update the PR labels via: