Conversation

Member

@christarazi christarazi commented May 18, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 19702 19759 19592 19767 19794 19830 19918; do contrib/backporting/set-labels.py $pr done 1.10; done

@christarazi christarazi requested a review from a team as a code owner May 18, 2022 06:24
@christarazi christarazi added backport/1.10 kind/backports This PR provides functionality previously merged into master. labels May 18, 2022
Member

@sayboras sayboras left a comment


Thanks, and looks good for my changes.

Member

@tklauser tklauser left a comment


Looks good for my commits. Thanks!

@christarazi
Member Author

/test-backport-1.10

Member

@pchaigno pchaigno left a comment


My change looks good. Thanks!

@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch from 9bbd095 to d82683e Compare May 23, 2022 07:21
@jibi jibi closed this May 23, 2022
@jibi jibi reopened this May 23, 2022
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch 2 times, most recently from 80d26b9 to 3f06205 Compare May 24, 2022 13:49
@jibi jibi closed this May 24, 2022
@jibi jibi reopened this May 24, 2022
@christarazi christarazi requested review from a team as code owners May 24, 2022 14:38
@jibi jibi closed this May 24, 2022
@jibi jibi reopened this May 24, 2022
@michi-covalent michi-covalent force-pushed the pr/v1.10-backport-2022-05-17 branch 4 times, most recently from 02b9da8 to 33fc56d Compare May 24, 2022 23:40
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch from a3174c3 to 725df42 Compare May 27, 2022 11:56
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 27, 2022
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch from e373fe4 to cf8f80a Compare May 27, 2022 12:18
@cilium cilium deleted a comment from maintainer-s-little-helper bot May 27, 2022
@cilium cilium deleted a comment from maintainer-s-little-helper bot May 27, 2022
@jibi jibi removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 27, 2022
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch 3 times, most recently from 4bd7744 to 095c4ea Compare May 30, 2022 10:29
@jibi jibi marked this pull request as ready for review May 30, 2022 10:29
christarazi and others added 8 commits May 30, 2022 16:27
[ upstream commit 6b57b52 ]

This commit fixes the error wrapping inside the dnsproxy package.

When a DNS response is being processed by the DNS proxy inside
NotifyOnDNSMsg(), we check ProxyRequestContext. If the request timed
out, we annotate the metrics specifically to indicate the timeout. This
relies on the errors being properly wrapped.

In order to do this, errors.As() is used and all errors are properly
wrapped with '%w'.
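
For illustration, a minimal Go sketch of the wrap-then-inspect pattern
described above (the error type and function names here are
hypothetical, not the actual dnsproxy identifiers):

```go
package main

import (
	"errors"
	"fmt"
)

// timeoutError is a hypothetical error type standing in for the
// proxy's timeout error; it is not the actual dnsproxy type.
type timeoutError struct{ op string }

func (e *timeoutError) Error() string { return e.op + " timed out" }

func processRequest() error {
	// Wrapping with %w preserves the concrete error type through
	// the annotation, so callers can still inspect it.
	return fmt.Errorf("failed to process DNS request: %w",
		&timeoutError{op: "policy check"})
}

func main() {
	err := processRequest()
	var t *timeoutError
	// errors.As walks the wrapped chain looking for a *timeoutError.
	if errors.As(err, &t) {
		fmt.Println("annotating metrics: timeout during", t.op)
	}
}
```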

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 35f2ff3 ]

Before this patch, the hubble-peer Service would be deployed during the
preflight check, which would in turn prevent Cilium from being
installed, as it would attempt to install the Service again.

Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit f481981 ]

This commit does the following:

* Adds a configuration option for controlling the concurrency of the DNS
  proxy
* Adds a configuration option for the semaphore (from above) timeout
* Exposes an additional metric for the time taken to perform the policy
  check on a DNS request within the DNS proxy

The concurrency limitation is implemented by introducing a semaphore to
the DNS proxy. By default, no such limit is imposed.

Users are advised to take into account the number of DNS requests[1]
and the number of CPUs on each node in their cluster in order to come
up with an appropriate concurrency limit.

In addition, we expose the semaphore grace period as a configurable
option, since assuming the "right" value for this timeout is a tradeoff
that we shouldn't make on the user's behalf.

The semaphore grace period exists to prevent situations where Cilium
deadlocks, or where a consistently high rate of DNS traffic leaves
Cilium unable to keep up.

See cilium#19543 (comment) by
<joe@cilium.io>.

The user can take into account the rate at which they expect DNS
requests to flow into Cilium and how many of those requests should be
processed without retrying. If retrying isn't an issue, then keeping the
grace period at 0 (default) will immediately free the goroutine handling
the DNS request if the semaphore acquire fails. Conversely, if a backlog
of "unproductive" goroutines is acceptable (and DNS request retries are
not), then setting the grace period is advisable. This gives the
goroutines some time to acquire the semaphore. Goroutines could pile up
if the grace period is too high and there's a consistently high rate of
DNS requests.

It's worth noting that blindly increasing the concurrency limit will
not linearly improve performance; it might actually degrade it due to
internal downstream lock contention (as seen in the recent commits that
moved Endpoint-related functions to read-locks).

Ultimately, it becomes a tradeoff between a high number of semaphore
timeouts (dropped DNS requests that must be retried) and a high number
of (unproductive) goroutines, which can consume system resources.

[1]: The metric to monitor is

```
cilium_policy_l7_total{rule="received"}
```
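
As a rough sketch of the mechanism (using golang.org/x/sync/semaphore;
the option names, values, and handler function are illustrative, not
Cilium's actual implementation):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

// Hypothetical values standing in for the two new configuration options.
const (
	concurrencyLimit = 100                    // max concurrent DNS messages
	gracePeriod      = 250 * time.Millisecond // wait this long for a slot
)

var sem = semaphore.NewWeighted(concurrencyLimit)

func handleDNSRequest() error {
	// Bound how long this goroutine may wait for a semaphore slot.
	// With a grace period of 0, the goroutine is freed immediately
	// when no slot is available.
	ctx, cancel := context.WithTimeout(context.Background(), gracePeriod)
	defer cancel()
	if err := sem.Acquire(ctx, 1); err != nil {
		// Semaphore timeout: the DNS request is dropped and the
		// client has to retry.
		return fmt.Errorf("timed out waiting for DNS proxy slot: %w", err)
	}
	defer sem.Release(1)
	// ... perform the policy check and forward the request ...
	return nil
}

func main() {
	fmt.Println(handleDNSRequest())
}
```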

Co-authored-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit c0e20dc ]

The `encryption-interface` flag doesn't exist. It is called
`encrypt-interface`.

Fixes: 3662560f ("Add new unified Helm guide")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit eaa7141 ]

This commit allows an empty value, as well as the @ character in
values, in key-value pair validation.
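
As an illustration only (a hypothetical pattern, not the exact regex
changed by this commit), a validation accepting an empty value and the
@ character could look like:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical key=value pattern: the value part may be empty
// ("owner=") and may contain the @ character ("owner=user@example.com").
var kvRegexp = regexp.MustCompile(`^([A-Za-z0-9_./-]+)=([A-Za-z0-9_.@/-]*)$`)

func main() {
	for _, s := range []string{"owner=", "owner=user@example.com"} {
		fmt.Printf("%q valid: %v\n", s, kvRegexp.MatchString(s))
	}
}
```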

Fixes: cilium#19793
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 422a9df ]

Ref. https://github.com/cilium/cilium/blob/dfa6b157e8e9484c65fd938b2b45a4f5f50c61f9/daemon/cmd/agenthealth.go#L25-L31

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 22cd47e ]

The default value 9876 for the agent health port introduced in commit
efffbdb ("daemon: expose HTTP endpoint on localhost for health
checks") conflicts with Istio's ControlZ port. The latter has been in
use for longer. Because the port is only exposed on localhost for use
by liveness/readiness probes, we can change the default value without
breaking users. Thus, change it to 9879, for which there doesn't seem
to be any documented use.
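
For context, a minimal sketch of the pattern (not Cilium's actual
agenthealth code): an HTTP health endpoint bound to localhost on the
new default port:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	// Binding to 127.0.0.1 keeps the endpoint reachable only from the
	// node itself (e.g. by kubelet liveness/readiness probes), which
	// is why the default port can change without breaking users.
	log.Fatal(http.ListenAndServe("127.0.0.1:9879", mux))
}
```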

Suggested-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
The agent health check port will be changed for v1.10.12, ref.
cilium/cilium-cli#869. Set the base version explicitly so that PRs
already use the correct port in preparation for v1.10.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch from 095c4ea to f69a73e Compare May 30, 2022 14:27
[ upstream commit 6260073 ]

The general idea is to remove the need for our permanent pool of GKE
clusters + management cluster (that manages the pool via Config
Connector).

Instead, we switch to ad-hoc clusters as we do on CI 3.0. This should:

- Remove the upper limit on the number of concurrent Jenkins GKE jobs.
- Remove the need for permanent clusters (reduce CI costs).
- Have no effect on the setup time required before the tests actually
  start running on GKE clusters.
- Improve control over GKE features (e.g. `DenyServiceExternalIPs`
  admission controller) that cannot be controlled via CNRM /
  Config Connector.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit b42e5a0 ]

New GKE clusters have the automatic labelling feature gate enabled by
default, so the labels used in the `Identity CLI testing` `K8sCLI` test
need to be updated with the additional
`k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name`
automatic label.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
@jibi jibi requested a review from nbusseneau May 30, 2022 16:06
Member

@nbusseneau nbusseneau left a comment


My commits LGTM.

@jibi
Member

jibi commented May 30, 2022

/test-1.19-5.4

@jibi
Member

jibi commented May 30, 2022

/test-gke

@jibi
Member

jibi commented May 30, 2022

CI is green, aside from ci-gke-1.10 failing to clean up the cluster:

⌛ Waiting for Cilium to be uninstalled...
⌛ Waiting for cilium-test namespace to be terminated...
Error: The operation was canceled.

@jibi jibi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 30, 2022
@jibi jibi merged commit 3d481ff into cilium:v1.10 May 30, 2022
@christarazi christarazi deleted the pr/v1.10-backport-2022-05-17 branch May 30, 2022 21:50