Skip to content

v1.11 backports 2022-06-08 #20111

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jun 9, 2022
Merged

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Jun 8, 2022

Once this PR is merged, you can update the PR labels via:

$ for pr in 18667 19809 19857 19955 19875 19565 19987 19933 20047 19843 19925 19959 19999 19968 20011 19998 19923; do contrib/backporting/set-labels.py $pr done 1.11; done

or with

$ make add-label BRANCH=v1.11 ISSUES=18667,19809,19857,19955,19875,19565,19987,19933,20047,19843,19925,19959,19999,19968,20011,19998,19923

sdake and others added 23 commits June 8, 2022 12:38
[ upstream commit b2e3a25 ]

An annotation, deprecated in Kubernetes 1.16 and no longer functional,
was used to ensure priority scheduling.

Signed-off-by: Steven Dake <steven.dake@gmail.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 2926892 ]

Signed-off-by: Vlad Ungureanu <ungureanuvladvictor@gmil.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 95362de ]

This commit adds a new convenience functions to get a count of
 * how many entries are inside the DNS cache (IPs)
 * and how many FQDNs are inside the DNS cache
It will be used by upcoming commits to expose these values as a metrics.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit d9ae1c4 ]

This commit contains no functional changes and is only cosmetic to ease
future commits when adding new FQDN-related metrics.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 47870fe ]

This commit exposes new metrics that show the number of active names and
IPs in the DNS cache, and number of alive FQDN connections that have
expired (aka zombies) per endpoint. This is useful to track the
endpoint's DNS cache and DNS zombie cache sizes over time.

Note that these metrics are only updated during the FQDN GC which is
currently invoked every minute.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 070ded0 ]

The previous PR cilium#18478 wraps existing viper GetStringMapString function
to get around upstream bugs, however, it's unintentionally restricted a
few formats, which supported before in cilium, such as:

- --aws-instance-limit-mapping=c6a.2xlarge=4,15,15,m4.xlarge=1,5,10
- --api-rate-limit=endpoint-create=rate-limit:10/s,rate-burst:10,parallel-requests:10,auto-adjust:true

For complicated attribute, we are allowing comma character in value part
of key value pair. As golang didn't support look-ahead functionalities in
built-in regex library, this commit is to replace string.Split function
by custom implementation to handle such scenario.

Relates: cilium#18478
Fixes: cilium#18973
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 2e0d591 ]

Fixes: 8761498 ("fqdn: Use new regex LRU package everywhere")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit cb6c554 ]

Previously we used envoy proxy container to convert grpc-web traffic to
true grpc traffic, so ui backend accepts real grpc traffic. There is
an alternative approach to use special wrapper
for grpc service on backend:
https://github.com/improbable-eng/grpc-web/tree/master/go/grpcweb

Related code changes on hubble-ui itself was implemented in this PR:
cilium/hubble-ui#226

Signed-off-by: Dmitry Kharitonov <dmitry@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 044681a ]

This commit is to bump prometheus client library to the latest (e.g. v1.12.2)
which will have a better support for go collector of different go versions,
and fix NaN value.

Fixes: cilium#19985

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 7c86fe7 ]

When the service port for the peer service is not specified, it is
automatically assigned port 80 or port 443 (when TLS is enabled). Using
these ports make it easy to understand whether TLS is enabled for the
service or not. Moreover, it makes the behavior consistent with the
Hubble Relay service.

Signed-off-by: Robin Hahling <robin.hahling@gw-computing.net>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 9580ea3 ]

Signed-off-by: Alexandre Perrin <alex@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 9242f05 ]

This will come in handy for upcoming commits that will try to improve
the performance of these functions.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 189d4d3 ]

Use strings.Builder instead of fmt.Sprintf() which is known to be not
performant.

This optimization was identified by inspecting a pprof (memory profile)
of a cluster with a wildcard L7 DNS rule policy (matchPattern: '*') and
a workload selected by the policy making a large amount of unique DNS
requests.

```
$ go test -v -run '^$' -bench 'Benchmark_maskedIPNetToLabelString' -benchtime 50000x -benchmem ./pkg/labels/cidr > old.txt
$ go test -v -run '^$' -bench 'Benchmark_maskedIPNetToLabelString' -benchtime 50000x -benchmem ./pkg/labels/cidr > new.txt
$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat:
https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                                old ns/op     new ns/op     delta
Benchmark_maskedIPNetToLabelString-8     8122          4450          -45.21%

benchmark                                old allocs     new allocs     delta
Benchmark_maskedIPNetToLabelString-8     39             32             -17.95%

benchmark                                old bytes     new bytes     delta
Benchmark_maskedIPNetToLabelString-8     488           368           -24.59%
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 46e3d07 ]

Avoid using fmt.Sprintf() when a simple string concatentation does the
same job, and preallocate slices where possible.

These optimizations were identified by inspecting a pprof (memory
profile) of a cluster with a wildcard L7 DNS rule policy (matchPattern:
'*') and a workload selected by the policy making a large amount of
unique DNS requests.

Without the previous commit that optimizes the maskedIPToLabelString():

```
$ go test -v -run '^$' -bench 'Benchmark_GetCIDRLabels' -benchtime 5000x -benchmem ./pkg/labels/cidr > old.txt
$ go test -v -run '^$' -bench 'Benchmark_GetCIDRLabels' -benchtime 5000x -benchmem ./pkg/labels/cidr > new.txt
$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                     old ns/op     new ns/op     delta
Benchmark_GetCIDRLabels-8     391590        390163        -0.36%

benchmark                     old allocs     new allocs     delta
Benchmark_GetCIDRLabels-8     1455           1424           -2.13%

benchmark                     old bytes     new bytes     delta
Benchmark_GetCIDRLabels-8     64143         63189         -1.49%
```

With previous commit:

```
benchmark                     old ns/op     new ns/op     delta
Benchmark_GetCIDRLabels-8     390163        295259        -24.32%

benchmark                     old allocs     new allocs     delta
Benchmark_GetCIDRLabels-8     1424           1131           -20.58%

benchmark                     old bytes     new bytes     delta
Benchmark_GetCIDRLabels-8     63189         58707         -7.09%
```

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit f439177 ]

This commit fixes a bug where the keys of the forward map inside the DNS
cache were never removed, causing the map to grow forever. By contrast,
the reverse map keys were being deleted.

For both the forward and reverse maps (which are both maps whose values
are another map), the inner map keys were being deleted.

In other words, the delete on the outer map key was missing for the
forward map.

In addition to fixing the bug, this commit expands the unit test
coverage to assert after any deletes (entries expiring or GC) that the
forward and reverse maps contain what we expect.

Particularly, in an environment where there are many unique DNS lookups
(unique FQDNs) being done, this forward map could grow quite large over
time, especially for a long-lived workload (endpoint). This fixes this
memory-leak-like bug.

Fixes: cf387ce ("fqdn: Introduce TTL-aware cache for DNS retention")
Fixes: f6ce522 ("FQDN: Added garbage collector functions.")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 3500290 ]

This commit is to add ownerReferences for CiliumNodes created by
CiliumExternalWorkload, so that we don't unintentionally GC invalid CN.

Thanks to @nathanejohnson for reporting this issue.

Fixes: cilium#19907
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit f9863ec ]

Signed-off-by: Vlad Ungureanu <ungureanuvladvictor@gmail.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit aa7572b ]

Signed-off-by: Vlad Ungureanu <ungureanuvladvictor@gmail.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 2d35b95 ]

Out of all the flags having string map type, only these two flags are
StringToStringVar type, which failed validation if the passed value is
json format.

```
2022-05-26T12:03:33.817769447Z level=fatal msg="Incorrect config-map flag subnet-tags-filter" error="{\"type\":\"private\"} must be formatted as key=value" subsys=config
```

Relates: cilium#16014
Fixes: cilium#19961
Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit c6af580 ]

This commit adds the `-o json` output to `cilium node list` and
`cilium-health status`, as the text version of both does not contain all
details.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 1873163 ]

When the host firewall is enabled, we start tracking (and potentially
enforcing policies) on all connections to and from the host IP addresses.
Thus, we also need to avoid GCing the host IPs. If we don't it can cause
established connections to be broken on agent restart.

Reported-by: Andrey Klimentyev <andrey.klimentyev@flant.com>
Reported-by: Jarno Rajahalme <jarno@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 6a1f175 ]

Currently, when the cilium-operator attaches new ENIs to a node, we
update the corresponding CiliumNode in two steps: first the .Status,
then the .Spec [1]. That can result in an inconsistent state, where the
CiliumNode .Spec.IPAM.Pool contains new IP addresses associated with the
new ENI, while .Status.ENI.ENIs is still missing the ENI. This
inconsistency manisfests as a fatal:

    level=fatal msg="Error while creating daemon" error="Unable to allocate router IP for family ipv4: failed to associate IP 10.12.14.5 inside CiliumNode: unable to find ENI eni-9ab538c64feb9f59e" subsys=daemon

This inconsistency occurs because the following can happen:
1. cilium-operator attaches a new ENI to the CiliumNode.
2. Still at cilium-operator, .Spec is synced with kube-apiserver. The IP
   pool is updated with a new set of IP addresses and the new ENI.
3. The agent receives this half-updated CiliumNode.
4. It allocates an IP address for the router from the pool of IPs
   attached to the new ENI, using .Spec.IPAM.Pool.
5. It fails because the new ENI is not listed in the .Status.ENI.ENIs of
   the CiliumNode object.
6. At cilium-operator, .Status is updated with the new ENI.

But wait, you said .Status is updated before .Spec in the function you
linked? Yes, but we read the state to populate CiliumNode from two
separate places (n.ops.manager.instances and n.available) in the
syncToAPIServer function and we don't have anything to prevent having a
half updated (one place only) state in the middle of the update function.
We lock twice, once for each place, instead of once for the while
CiliumNode update. So having a half updated state in the middle of the
function would technically be the same as updating .Spec first and
.Status second.

We can fix this by first creating a snapshot of the pool first, then
write the .Status metadata (which may be more recent than the pool
snapshot, which is safe, see comment in the source code of this patch),
and then write the pool to .Spec. This ensures that the .Status is
always updated before .Spec, but at the same time also ensures that
.Status is still more recent than .Spec.

1 - https://github.com/cilium/cilium/blob/v1.12.0-rc2/pkg/ipam/node.go#L966-L1012

Co-authored-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit eac0dee ]

The `node.Spec.IPAM.Pool` value is always overwritten after the removed
`if` statement, so there is no need to initialize it.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki requested a review from a team June 8, 2022 13:03
@joamaki joamaki requested review from a team as code owners June 8, 2022 13:03
@maintainer-s-little-helper maintainer-s-little-helper bot added backport/1.11 kind/backports This PR provides functionality previously merged into master. labels Jun 8, 2022
Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for my commit, also I have checked this one f0c54cf as well.

Thanks 💯

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commits looks good, thanks!

Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My patch looks good, thanks!

@joamaki
Copy link
Contributor Author

joamaki commented Jun 8, 2022

/test-backport-1.11

Copy link
Contributor

@geakstr geakstr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, my commit looks good!

Copy link
Member

@ungureanuvladvictor ungureanuvladvictor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commits look good, thanks for handling this!

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@nebril
Copy link
Member

nebril commented Jun 9, 2022

/ci-aks-1.11

@nebril
Copy link
Member

nebril commented Jun 9, 2022

Looks like GKE nodes got preemted, retrying

@nebril
Copy link
Member

nebril commented Jun 9, 2022

/test-gke

@nebril nebril merged commit 76f9b4c into cilium:v1.11 Jun 9, 2022
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PRs look good. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.