Conversation

Member

@christarazi christarazi commented May 18, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 19702 19759 19592 19767 19794 19830 19918; do contrib/backporting/set-labels.py $pr done 1.10; done

@christarazi christarazi requested a review from a team as a code owner May 18, 2022 06:24
@christarazi christarazi added backport/1.10 kind/backports This PR provides functionality previously merged into master. labels May 18, 2022
Member

@sayboras sayboras left a comment


Thanks, and looks good for my changes.

Member

@tklauser tklauser left a comment


Looks good for my commits. Thanks!

@christarazi
Member Author

/test-backport-1.10

Member

@pchaigno pchaigno left a comment


My change looks good. Thanks!

@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch from 9bbd095 to d82683e Compare May 23, 2022 07:21
@jibi jibi closed this May 23, 2022
@jibi jibi reopened this May 23, 2022
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch 2 times, most recently from 80d26b9 to 3f06205 Compare May 24, 2022 13:49
@jibi jibi closed this May 24, 2022
@jibi jibi reopened this May 24, 2022
@christarazi christarazi requested review from a team as code owners May 24, 2022 14:38
@jibi jibi closed this May 24, 2022
@jibi jibi reopened this May 24, 2022
@michi-covalent michi-covalent force-pushed the pr/v1.10-backport-2022-05-17 branch 4 times, most recently from 02b9da8 to 33fc56d Compare May 24, 2022 23:40
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch from a3174c3 to 725df42 Compare May 27, 2022 11:56
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 27, 2022
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch from e373fe4 to cf8f80a Compare May 27, 2022 12:18
@cilium cilium deleted a comment from maintainer-s-little-helper bot May 27, 2022
@cilium cilium deleted a comment from maintainer-s-little-helper bot May 27, 2022
@jibi jibi removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 27, 2022
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch 3 times, most recently from 4bd7744 to 095c4ea Compare May 30, 2022 10:29
@jibi jibi marked this pull request as ready for review May 30, 2022 10:29
christarazi and others added 8 commits May 30, 2022 16:27
[ upstream commit 6b57b52 ]

This commit fixes the error wrapping inside the dnsproxy package.

When a DNS response is being processed by the DNS proxy inside
NotifyOnDNSMsg(), we check ProxyRequestContext. If the request timed
out, we annotate the metrics specifically to indicate the timeout. This
relies on the errors being properly wrapped.

In order to do this, errors.As() is used and all errors are properly
wrapped with '%w'.
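
For illustration, a minimal Go sketch of the wrap-then-inspect pattern
described above (the error type and function names here are
hypothetical, not the actual dnsproxy identifiers):

```go
package main

import (
	"errors"
	"fmt"
)

// timeoutError is a hypothetical error type standing in for the
// proxy's timeout error; it is not the actual dnsproxy type.
type timeoutError struct{ op string }

func (e *timeoutError) Error() string { return e.op + " timed out" }

func processRequest() error {
	// Wrapping with %w preserves the concrete error type through
	// the annotation, so callers can still inspect it.
	return fmt.Errorf("failed to process DNS request: %w",
		&timeoutError{op: "policy check"})
}

func main() {
	err := processRequest()
	var t *timeoutError
	// errors.As walks the wrapped chain looking for a *timeoutError.
	if errors.As(err, &t) {
		fmt.Println("annotating metrics: timeout during", t.op)
	}
}
```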

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 35f2ff3 ]

Before this patch, the hubble-peer Service would be deployed during the
preflight check, which would in turn prevent Cilium from being
installed, as it would attempt to install the Service again.

Suggested-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit f481981 ]

This commit does the following:

* Adds a configuration option for controlling the concurrency of the DNS
  proxy
* Adds a configuration option for the semaphore (from above) timeout
* Exposes an additional metric for the time taken to perform the policy
  check on a DNS request within the DNS proxy

The concurrency limitation is implemented by introducing a semaphore to
the DNS proxy. By default, no such limit is imposed.

Users are advised to take into account the number of DNS requests[1]
and the number of CPUs on each node in their cluster in order to come
up with an appropriate concurrency limit.

In addition, we expose the semaphore grace period as a configurable
option, since assuming the "right" value for this timeout is a tradeoff
that we shouldn't make on the user's behalf.

The semaphore grace period exists to prevent situations where Cilium
deadlocks, or where a consistently high rate of DNS traffic leaves
Cilium unable to keep up.

See cilium#19543 (comment) by
<joe@cilium.io>.

The user can take into account the rate at which they expect DNS
requests to flow into Cilium and how many of those requests should be
processed without retrying. If retrying isn't an issue, then keeping the
grace period at 0 (default) will immediately free the goroutine handling
the DNS request if the semaphore acquire fails. Conversely, if a backlog
of "unproductive" goroutines is acceptable (and DNS request retries are
not), then setting the grace period is advisable. This gives the
goroutines some time to acquire the semaphore. Goroutines could pile up
if the grace period is too high and there's a consistently high rate of
DNS requests.

It's worth noting that blindly increasing the concurrency limit will
not linearly improve performance; it might actually degrade it due to
internal downstream lock contention (as seen in the recent commits that
moved Endpoint-related functions to read-locks).

Ultimately, it becomes a tradeoff between a high number of semaphore
timeouts (dropped DNS requests that must be retried) and a high number
of (unproductive) goroutines, which can consume system resources.

[1]: The metric to monitor is

```
cilium_policy_l7_total{rule="received"}
```
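
As a rough sketch of the mechanism (using golang.org/x/sync/semaphore;
the option names, values, and handler function are illustrative, not
Cilium's actual implementation):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

// Hypothetical values standing in for the two new configuration options.
const (
	concurrencyLimit = 100                    // max concurrent DNS messages
	gracePeriod      = 250 * time.Millisecond // wait this long for a slot
)

var sem = semaphore.NewWeighted(concurrencyLimit)

func handleDNSRequest() error {
	// Bound how long this goroutine may wait for a semaphore slot.
	// With a grace period of 0, the goroutine is freed immediately
	// when no slot is available.
	ctx, cancel := context.WithTimeout(context.Background(), gracePeriod)
	defer cancel()
	if err := sem.Acquire(ctx, 1); err != nil {
		// Semaphore timeout: the DNS request is dropped and the
		// client has to retry.
		return fmt.Errorf("timed out waiting for DNS proxy slot: %w", err)
	}
	defer sem.Release(1)
	// ... perform the policy check and forward the request ...
	return nil
}

func main() {
	fmt.Println(handleDNSRequest())
}
```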

Co-authored-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Maciej Kwiek <maciej@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit c0e20dc ]

The `encryption-interface` flag doesn't exist. It is called
`encrypt-interface`.

Fixes: 3662560f ("Add new unified Helm guide")
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit eaa7141 ]

This commit allows an empty value, as well as the @ character in
values, in key-value pair validation.
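
As an illustration only (a hypothetical pattern, not the exact regex
changed by this commit), a validation accepting an empty value and the
@ character could look like:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical key=value pattern: the value part may be empty
// ("owner=") and may contain the @ character ("owner=user@example.com").
var kvRegexp = regexp.MustCompile(`^([A-Za-z0-9_./-]+)=([A-Za-z0-9_.@/-]*)$`)

func main() {
	for _, s := range []string{"owner=", "owner=user@example.com"} {
		fmt.Printf("%q valid: %v\n", s, kvRegexp.MatchString(s))
	}
}
```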

Fixes: cilium#19793
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 422a9df ]

Ref. https://github.com/cilium/cilium/blob/dfa6b157e8e9484c65fd938b2b45a4f5f50c61f9/daemon/cmd/agenthealth.go#L25-L31

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 22cd47e ]

The default value 9876 for the agent health port introduced in commit
efffbdb ("daemon: expose HTTP endpoint on localhost for health
checks") conflicts with Istio's ControlZ port. The latter has been in
use for longer. Because the port is only exposed on localhost for use
by liveness/readiness probes, we can change the default value without
breaking users. Thus, change it to 9879, for which there doesn't seem
to be any documented use.
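
For context, a minimal sketch of the pattern (not Cilium's actual
agenthealth code): an HTTP health endpoint bound to localhost on the
new default port:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	// Binding to 127.0.0.1 keeps the endpoint reachable only from the
	// node itself (e.g. by kubelet liveness/readiness probes), which
	// is why the default port can change without breaking users.
	log.Fatal(http.ListenAndServe("127.0.0.1:9879", mux))
}
```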

Suggested-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
The agent health check port will be changed for v1.10.12, ref.
cilium/cilium-cli#869. Set the base version explicitly so that PRs
already use the correct port in preparation for v1.10.

Signed-off-by: Tobias Klauser <tobias@cilium.io>
@jibi jibi force-pushed the pr/v1.10-backport-2022-05-17 branch from 095c4ea to f69a73e Compare May 30, 2022 14:27
[ upstream commit 6260073 ]

The general idea is to remove the need for our permanent pool of GKE
clusters + management cluster (that manages the pool via Config
Connector).

Instead, we switch to ad-hoc clusters as we do on CI 3.0. This should:

- Remove the upper limit on the number of concurrent Jenkins GKE jobs.
- Remove the need for permanent clusters (reduce CI costs).
- Have no effect on the setup time required before the tests actually
  start running on GKE clusters.
- Improve control over GKE features (e.g. `DenyServiceExternalIPs`
  admission controller) that cannot be controlled via CNRM /
  Config Connector.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit b42e5a0 ]

New GKE clusters have the automatic labelling feature gate enabled by
default, so the labels used in the `Identity CLI testing` `K8sCLI` test
need to be updated with the additional
`k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name`
automatic label.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
Signed-off-by: Gilberto Bertin <jibi@cilium.io>
@jibi jibi requested a review from nbusseneau May 30, 2022 16:06
Member

@nbusseneau nbusseneau left a comment


My commits LGTM.

@jibi
Member

jibi commented May 30, 2022

/test-1.19-5.4

@jibi
Member

jibi commented May 30, 2022

/test-gke

@jibi
Member

jibi commented May 30, 2022

CI is green, aside from ci-gke-1.10 failing to clean up the cluster:

⌛ Waiting for Cilium to be uninstalled...
⌛ Waiting for cilium-test namespace to be terminated...
Error: The operation was canceled.

@jibi jibi added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 30, 2022
@jibi jibi merged commit 3d481ff into cilium:v1.10 May 30, 2022
@christarazi christarazi deleted the pr/v1.10-backport-2022-05-17 branch May 30, 2022 21:50