Skip to content

v1.13 Backports 2023-10-30 #28877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Nov 2, 2023

Conversation

pippolo84
Copy link
Member

@pippolo84 pippolo84 commented Oct 30, 2023

  • ⚠️ conflicts due to cidr.go moved in main branch
  • ⚠️ conflicts due to cidr.go moved in main branch
  • ⚠️ minor conflicts due to nat.IPv4 constant missing and different return types for natMap.OpenOrCreate used in tests

Once this PR is merged, a GitHub action will update the labels of these PRs:

 28465 27923 28767 28790 28788 28857

@pippolo84 pippolo84 added kind/backports This PR provides functionality previously merged into master. backport/1.13 labels Oct 30, 2023
@pippolo84 pippolo84 force-pushed the pr/v1.13-backport-2023-10-30 branch from 41bfd38 to 1323e9c Compare October 30, 2023 18:01
@pippolo84
Copy link
Member Author

  • @julianwiedmann please review your commits very carefully, I hit some conflicts during the backporting.
  • @learnitall I wasn't able to complete the backporting, the main reason is that in v1.13 there isn't a separate file for ingress_class.go and I didn't feel comfortable solving all of the conflicts.

Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My commit looks good. Thanks!

@learnitall
Copy link
Contributor

Ah @pippolo84, my apologies, my PR shouldn't be backported to 1.13. The commit introducing IngressClass support wasn't backported to 1.13 so my PR is inapplicable. Sorry for the trouble!

Copy link
Member

@mhofstetter mhofstetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @pippolo84 !

joamaki and others added 19 commits October 31, 2023 10:43
[ upstream commit e0f6c47 ]

Cache the computation of intermediate CIDR labels to speed up
GetCIDRLabels and reduce memory usage by deduplicating CIDR strings.
Even though now most of the cost is in building up the resulting
"labels.Labels", it is not memoized yet as it is mutable and mutated by
e.g. MergeLabels.

Before:
goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/labels/cidr
cpu: AMD Ryzen 9 5950X 16-Core Processor
BenchmarkGetCIDRLabels/0.0.0.0/0                 6005072               199.4 ns/op           640 B/op          3 allocs/op
BenchmarkGetCIDRLabels/10.16.0.0/16               402415              2876 ns/op            3748 B/op         38 allocs/op
BenchmarkGetCIDRLabels/192.0.2.3/32               216280              5457 ns/op            8032 B/op         70 allocs/op
BenchmarkGetCIDRLabels/192.0.2.3/24               285751              4113 ns/op            5056 B/op         54 allocs/op
BenchmarkGetCIDRLabels/192.0.2.0/24               286141              4116 ns/op            5055 B/op         54 allocs/op
BenchmarkGetCIDRLabels/::/0                      6016551               199.6 ns/op           640 B/op          3 allocs/op
BenchmarkGetCIDRLabels/fdff::ff/128                37502             31938 ns/op           30786 B/op        450 allocs/op
BenchmarkGetCIDRLabels/f00d:42::ff/128             35725             33607 ns/op           33658 B/op        450 allocs/op
BenchmarkGetCIDRLabels/f00d:42::ff/96              50270             23798 ns/op           20231 B/op        297 allocs/op

After:
goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/labels/cidr
cpu: AMD Ryzen 9 5950X 16-Core Processor
BenchmarkGetCIDRLabels/0.0.0.0/0                 7320565               164.0 ns/op           624 B/op          2 allocs/op
BenchmarkGetCIDRLabels/10.16.0.0/16              1000000              1083 ns/op            2396 B/op          2 allocs/op
BenchmarkGetCIDRLabels/192.0.2.3/32               593683              1948 ns/op            5008 B/op          2 allocs/op
BenchmarkGetCIDRLabels/192.0.2.3/24               337100              3498 ns/op            7728 B/op          3 allocs/op
BenchmarkGetCIDRLabels/192.0.2.0/24               793645              1427 ns/op            2767 B/op          2 allocs/op
BenchmarkGetCIDRLabels/::/0                      7213646               166.1 ns/op           624 B/op          2 allocs/op
BenchmarkGetCIDRLabels/fdff::ff/128               168543              7064 ns/op           18515 B/op          3 allocs/op
BenchmarkGetCIDRLabels/f00d:42::ff/128            165129              7184 ns/op           18516 B/op          3 allocs/op
BenchmarkGetCIDRLabels/f00d:42::ff/96              91777             13056 ns/op           29283 B/op          6 allocs/op

Signed-off-by: Jussi Maki <jussi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 3debf6d ]

To avoid excessive heap usage, limit the number of cached labels using a
LRU map. Maximum cache size is empirically set to 16384 and a lock is
used to serialize concurrent accesses to the cache.

Note that, being a LRU cache, the Get operations modify the cache
internal status, so a classic mutex has been used instead of a rwmutex.

Benchmark results to compare LRU-based memoization against the
non-memoized version:

name                             old time/op    new time/op    delta
GetCIDRLabels/0.0.0.0/0-8           204ns ± 4%     351ns ± 2%  +71.85%  (p=0.000 n=8+10)
GetCIDRLabels/10.16.0.0/16-8       4.20µs ±10%    1.73µs ± 3%  -58.84%  (p=0.000 n=10+9)
GetCIDRLabels/192.0.2.3/32-8       8.02µs ± 4%    3.27µs ± 2%  -59.25%  (p=0.000 n=9+9)
GetCIDRLabels/192.0.2.3/24-8       6.65µs ±11%    3.26µs ± 2%  -51.02%  (p=0.000 n=10+8)
GetCIDRLabels/192.0.2.0/24-8       6.52µs ±10%    2.86µs ± 1%  -56.10%  (p=0.000 n=10+9)
GetCIDRLabels/::/0-8                330ns ± 2%     354ns ± 5%   +7.28%  (p=0.000 n=9+10)
GetCIDRLabels/fdff::ff/128-8       52.9µs ± 6%    12.4µs ± 6%  -76.48%  (p=0.000 n=10+10)
GetCIDRLabels/f00d:42::ff/128-8    55.3µs ± 5%    12.6µs ± 5%  -77.27%  (p=0.000 n=10+10)
GetCIDRLabels/f00d:42::ff/96-8     41.8µs ± 7%    12.4µs ± 2%  -70.20%  (p=0.000 n=10+9)

name                             old alloc/op   new alloc/op   delta
GetCIDRLabels/0.0.0.0/0-8            656B ± 0%      624B ± 0%   -4.88%  (p=0.000 n=10+10)
GetCIDRLabels/10.16.0.0/16-8       3.17kB ± 0%    2.40kB ± 0%  -24.46%  (p=0.000 n=10+10)
GetCIDRLabels/192.0.2.3/32-8       6.88kB ± 0%    5.01kB ± 0%  -27.21%  (p=0.000 n=8+8)
GetCIDRLabels/192.0.2.3/24-8       6.32kB ± 0%    5.01kB ± 0%  -20.73%  (p=0.000 n=9+9)
GetCIDRLabels/192.0.2.0/24-8       6.32kB ± 0%    4.93kB ± 0%  -22.03%  (p=0.000 n=10+10)
GetCIDRLabels/::/0-8                 656B ± 0%      624B ± 0%   -4.88%  (p=0.000 n=10+10)
GetCIDRLabels/fdff::ff/128-8       25.9kB ± 0%    18.5kB ± 0%  -28.58%  (p=0.000 n=10+10)
GetCIDRLabels/f00d:42::ff/128-8    28.8kB ± 0%    18.5kB ± 0%  -35.70%  (p=0.000 n=10+10)
GetCIDRLabels/f00d:42::ff/96-8     25.1kB ± 0%    18.5kB ± 0%  -26.20%  (p=0.000 n=10+8)

name                             old allocs/op  new allocs/op  delta
GetCIDRLabels/0.0.0.0/0-8            3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)
GetCIDRLabels/10.16.0.0/16-8         37.0 ± 0%       2.0 ± 0%  -94.59%  (p=0.000 n=10+10)
GetCIDRLabels/192.0.2.3/32-8         69.0 ± 0%       2.0 ± 0%  -97.10%  (p=0.000 n=10+10)
GetCIDRLabels/192.0.2.3/24-8         53.0 ± 0%       2.0 ± 0%  -96.23%  (p=0.000 n=10+10)
GetCIDRLabels/192.0.2.0/24-8         53.0 ± 0%       2.0 ± 0%  -96.23%  (p=0.000 n=10+10)
GetCIDRLabels/::/0-8                 3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)
GetCIDRLabels/fdff::ff/128-8          449 ± 0%         3 ± 0%  -99.33%  (p=0.000 n=10+10)
GetCIDRLabels/f00d:42::ff/128-8       449 ± 0%         3 ± 0%  -99.33%  (p=0.000 n=10+10)
GetCIDRLabels/f00d:42::ff/96-8        295 ± 0%         3 ± 0%  -98.98%  (p=0.000 n=10+10)

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit ebfaa30 ]

Add a benchmark to estimate the heap usage of a full CIDR labels LRU
cache.

Results show that heap usage is less than ~5 MiB:

$ go test ./pkg/labels/cidr/... -run=^$ -bench="BenchmarkCIDRLabelsCacheHeapUsage" -benchtime=1x

--- BENCH: BenchmarkCIDRLabelsCacheHeapUsageIPv4-8
    cidr_test.go:396: Memoization map heap usage: 4146.02 KiB

--- BENCH: BenchmarkCIDRLabelsCacheHeapUsageIPv6-8
    cidr_test.go:438: Memoization map heap usage: 4571.74 KiB

The benchmark must be called with `-benchtime=1x` to get meaningful
values from `runtime.ReadMemStats`. For that reason, it is skipped by
default.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 85aef68 ]

Adding a LRU cache to memoize GetCIDRLabels means sharing state between
different goroutines concurrently executing GetCIDRLabels.

To measure the scalability of the LRU cache + mutex approach against the
non-cached previous version, a benchmark with increasing number of
goroutines is added.

Running the benchmark against the current version and the one without
labels memoization shows that the change gives a performance improvement
even when up to 48 goroutines compete for the exclusive access to the
LRU cache:

name                          old time/op  new time/op  delta
GetCIDRLabelsConcurrent/1-8    493µs ±31%   259µs ± 4%  -47.54%  (p=0.000 n=20+9)
GetCIDRLabelsConcurrent/2-8    889µs ±10%   474µs ± 5%  -46.69%  (p=0.000 n=19+10)
GetCIDRLabelsConcurrent/4-8   1.74ms ± 3%  0.89ms ± 4%  -49.04%  (p=0.000 n=20+10)
GetCIDRLabelsConcurrent/16-8  7.14ms ± 6%  3.77ms ± 8%  -47.20%  (p=0.000 n=18+10)
GetCIDRLabelsConcurrent/32-8  14.3ms ± 6%   7.3ms ± 2%  -48.69%  (p=0.000 n=20+9)
GetCIDRLabelsConcurrent/48-8  21.9ms ± 5%  11.1ms ± 3%  -49.24%  (p=0.000 n=19+10)

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 5be0299 ]

This commit extracts the logic that creates Envoy endpoints
(ClusterLoadAssignments) for LoadBalancing Backends into its
function.

In addition some unit tests were added.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 77e09f5 ]

Switch one of the matrix entries currently configuring vxlan tunneling
to geneve, so that we appropriately cover both protocols in combination
with clustermesh.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit c1803ba ]

This commit changes the bugtool report to collect the XFRM error
counters (i.e., /proc/net/xfrm_stat) twice instead of only once. We will
collect at the beginning and end of the bugtool collection. In that way,
there will be around 5-6 seconds between the two collections and we may
see if any counter is currently increasing.

    $ diff cilium-bugtool-cilium-7d54p-20231025-115151/cmd/cat*--proc-net-xfrm_stat.md
    5c5
    < XfrmInStateProtoError   	4
    ---
    > XfrmInStateProtoError   	6

In this example, we can easily see that the XfrmInStateProtoError is
increasing. That suggests a key rotation issue is currently ongoing (cf.
IPsec troubleshooting docs).

I tried other approaches to collect over a longer timespan. That may
allow us to see slower increases. They all end up being more complex or
messier (we'd need to collect at beginning and end of the sysdump
instead). In the end, considering this is already a fallback plan for
when customers don't collect Prometheus metrics, I think the current,
simpler approach is good enough.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 55517ea ]

The previous version of the implementation was actually computing the
labels starting from broader prefixes to narrower ones (first "/0", then
"/1" and so on).  As soon as we had a cache hit, the recursion stopped
without calculating the remaining labels for the CIDRs up to "ones".
This produced an incorrect (shorter) set of labels for a CIDR.

Also, netip.PrefixFrom(...) does not mask the internally stored address,
thus lowering the cache hit ratio even if two different CIDRs, used as
keys in the LRU cache, are equal in terms of masked address. (e.g:
"1.1.1.1/16" and "1.1.0.0/16").

So, netip.Addr.Prefix(...) is used instead.

After the fix, the performance are roughly equal (but with an increased
chance of having a cache hit). Instead, the maximum heap usage in the
worst case (LRU cache filled up with IPv6 labels) is increased 10x.

BenchmarkCIDRLabelsCacheHeapUsageIPv4
    cidr_test.go:628: Memoization map heap usage: 5483.24 KiB
BenchmarkCIDRLabelsCacheHeapUsageIPv6
    cidr_test.go:670: Memoization map heap usage: 54721.70 KiB

name                             old time/op    new time/op    delta
GetCIDRLabels/0.0.0.0/0-8           256ns ±39%     218ns ±46%     ~     (p=0.393 n=10+10)
GetCIDRLabels/10.16.0.0/16-8       1.35µs ± 3%    1.39µs ± 5%   +2.66%  (p=0.012 n=9+10)
GetCIDRLabels/192.0.2.3/32-8       2.52µs ± 2%    2.58µs ± 2%   +2.58%  (p=0.001 n=10+9)
GetCIDRLabels/192.0.2.3/24-8       2.57µs ± 1%    2.24µs ± 3%  -12.69%  (p=0.000 n=8+10)
GetCIDRLabels/192.0.2.0/24-8       2.27µs ± 4%    2.26µs ± 3%     ~     (p=0.690 n=9+8)
GetCIDRLabels/::/0-8                277ns ± 2%     278ns ± 3%     ~     (p=0.796 n=9+9)
GetCIDRLabels/fdff::ff/128-8       9.42µs ± 1%    9.34µs ± 6%     ~     (p=0.484 n=9+10)
GetCIDRLabels/f00d:42::ff/128-8    9.58µs ± 2%    9.62µs ± 7%     ~     (p=0.905 n=10+9)
GetCIDRLabels/f00d:42::ff/96-8     9.63µs ± 1%    8.45µs ± 3%  -12.27%  (p=0.000 n=8+9)
GetCIDRLabelsConcurrent/1-8         205µs ± 3%     207µs ± 3%     ~     (p=0.356 n=9+10)
GetCIDRLabelsConcurrent/2-8         385µs ± 5%     386µs ± 7%     ~     (p=0.631 n=10+10)
GetCIDRLabelsConcurrent/4-8         784µs ± 5%     780µs ± 1%     ~     (p=0.156 n=10+9)
GetCIDRLabelsConcurrent/16-8       3.24ms ± 1%    3.25ms ± 2%     ~     (p=0.529 n=10+10)
GetCIDRLabelsConcurrent/32-8       6.40ms ± 1%    6.39ms ± 3%     ~     (p=0.497 n=9+10)
GetCIDRLabelsConcurrent/48-8       9.69ms ± 1%   10.09ms ± 6%   +4.09%  (p=0.008 n=8+9)

name                             old alloc/op   new alloc/op   delta
GetCIDRLabels/0.0.0.0/0-8            624B ± 0%      624B ± 0%     ~     (all equal)
GetCIDRLabels/10.16.0.0/16-8       2.40kB ± 0%    2.40kB ± 0%     ~     (all equal)
GetCIDRLabels/192.0.2.3/32-8       5.01kB ± 0%    5.01kB ± 0%     ~     (all equal)
GetCIDRLabels/192.0.2.3/24-8       5.01kB ± 0%    4.93kB ± 0%   -1.64%  (p=0.002 n=8+10)
GetCIDRLabels/192.0.2.0/24-8       4.93kB ± 0%    4.93kB ± 0%     ~     (all equal)
GetCIDRLabels/::/0-8                 624B ± 0%      624B ± 0%     ~     (all equal)
GetCIDRLabels/fdff::ff/128-8       18.5kB ± 0%    18.5kB ± 0%     ~     (all equal)
GetCIDRLabels/f00d:42::ff/128-8    18.5kB ± 0%    18.5kB ± 0%     ~     (p=0.108 n=9+10)
GetCIDRLabels/f00d:42::ff/96-8     18.5kB ± 0%    18.5kB ± 0%   -0.06%  (p=0.000 n=10+10)
GetCIDRLabelsConcurrent/1-8         321kB ± 0%     321kB ± 0%     ~     (p=0.127 n=10+8)
GetCIDRLabelsConcurrent/2-8         641kB ± 0%     641kB ± 0%     ~     (p=0.928 n=10+10)
GetCIDRLabelsConcurrent/4-8        1.28MB ± 0%    1.28MB ± 0%     ~     (p=0.853 n=10+10)
GetCIDRLabelsConcurrent/16-8       5.13MB ± 0%    5.13MB ± 0%     ~     (p=0.739 n=10+10)
GetCIDRLabelsConcurrent/32-8       10.3MB ± 0%    10.3MB ± 0%     ~     (p=0.218 n=10+10)
GetCIDRLabelsConcurrent/48-8       15.4MB ± 0%    15.4MB ± 0%     ~     (p=0.218 n=10+10)

name                             old allocs/op  new allocs/op  delta
GetCIDRLabels/0.0.0.0/0-8            2.00 ± 0%      2.00 ± 0%     ~     (all equal)
GetCIDRLabels/10.16.0.0/16-8         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
GetCIDRLabels/192.0.2.3/32-8         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
GetCIDRLabels/192.0.2.3/24-8         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
GetCIDRLabels/192.0.2.0/24-8         2.00 ± 0%      2.00 ± 0%     ~     (all equal)
GetCIDRLabels/::/0-8                 2.00 ± 0%      2.00 ± 0%     ~     (all equal)
GetCIDRLabels/fdff::ff/128-8         3.00 ± 0%      3.00 ± 0%     ~     (all equal)
GetCIDRLabels/f00d:42::ff/128-8      3.00 ± 0%      3.00 ± 0%     ~     (all equal)
GetCIDRLabels/f00d:42::ff/96-8       3.00 ± 0%      3.00 ± 0%     ~     (all equal)
GetCIDRLabelsConcurrent/1-8           138 ± 0%       138 ± 0%     ~     (all equal)
GetCIDRLabelsConcurrent/2-8           277 ± 0%       277 ± 0%     ~     (all equal)
GetCIDRLabelsConcurrent/4-8           555 ± 0%       555 ± 0%     ~     (p=0.248 n=10+9)
GetCIDRLabelsConcurrent/16-8        2.22k ± 0%     2.22k ± 0%     ~     (p=0.353 n=7+10)
GetCIDRLabelsConcurrent/32-8        4.44k ± 0%     4.44k ± 0%     ~     (p=0.723 n=10+10)
GetCIDRLabelsConcurrent/48-8        6.66k ± 0%     6.66k ± 0%     ~     (p=0.090 n=10+9)

Fixes: e0f6c47 ("labels/cidr: Cache GetCIDRLabels computation")

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 1b9b3fc ]

After the introduction of a LRU cache in GetCIDRLabels, the tests should
verify the labels computation both when the cache is cold but also when
it is hot.

Thus, the tests are refactored to check the returned labels twice.

Also, an additional test is added to verify that the labels stay
consistent when we call GetCIDRLabels with the following sequences of
prefixes:

1) "xxx/32", "xxx/31", ..., "xxx/0", "xxx/1", ..., "xxx/32"

2) "xxx/0", "xxx/1", ..., "xxx/32", "xxx/31", ..., "xxx/0"

Finally, InCluster tests are removed since cluster identity does not
exist anymore.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 9f2034e ]

Migrate remaining tests relying on checker to the testing package from
Go standard library.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 71b7ad5 ]

TestCIDRLabelsCacheHeapUsageIP{v4,v6} are meant to estimate the maximum
heap usage when filling up the CIDR labels LRU cache with labels derived
only from IPv4 and labels derived only from IPv6.

Since they give meaningful results only when running them with
benchtime=1x, thery are refactored to be just tests with a t.Log() to
output the heap usage statistics.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 6f1253e ]

After fixing the GetCIDRLabels implementation to generate all the labels
required for a CIDR, the heap usage of the LRU cache increased 10x in
the worst case (all IPv6 labels).  To reduce heap usage, the cache size
is halved, resulting in ~25 MiB in the IPv6 only case with roughly the
same performance.

=== RUN   TestCIDRLabelsCacheHeapUsageIPv4
    cidr_test.go:527: Memoization map heap usage: 1714.41 KiB
--- PASS: TestCIDRLabelsCacheHeapUsageIPv4 (0.67s)
=== RUN   TestCIDRLabelsCacheHeapUsageIPv6
    cidr_test.go:571: Memoization map heap usage: 26527.13 KiB
--- PASS: TestCIDRLabelsCacheHeapUsageIPv6 (0.71s)

name                             old time/op    new time/op    delta
GetCIDRLabels/0.0.0.0/0-8           198ns ±40%     238ns ±34%    ~     (p=0.325 n=10+10)
GetCIDRLabels/10.16.0.0/16-8       1.32µs ± 8%    1.33µs ± 8%    ~     (p=0.812 n=10+10)
GetCIDRLabels/192.0.2.3/32-8       2.41µs ± 3%    2.39µs ± 5%    ~     (p=0.278 n=10+9)
GetCIDRLabels/192.0.2.3/24-8       2.05µs ± 2%    2.05µs ± 1%    ~     (p=0.948 n=9+9)
GetCIDRLabels/192.0.2.0/24-8       2.05µs ± 2%    2.04µs ± 1%    ~     (p=0.797 n=9+8)
GetCIDRLabels/::/0-8                277ns ±31%     257ns ± 1%    ~     (p=0.349 n=10+8)
GetCIDRLabels/fdff::ff/128-8       9.02µs ± 6%    8.80µs ± 3%    ~     (p=0.077 n=9+9)
GetCIDRLabels/f00d:42::ff/128-8    9.40µs ± 6%    9.01µs ± 5%  -4.12%  (p=0.035 n=10+10)
GetCIDRLabels/f00d:42::ff/96-8     7.78µs ± 4%    7.58µs ± 1%  -2.59%  (p=0.011 n=8+9)
GetCIDRLabelsConcurrent/1-8         189µs ± 8%     173µs ± 3%  -8.85%  (p=0.000 n=10+9)
GetCIDRLabelsConcurrent/2-8         350µs ± 2%     346µs ± 1%  -1.05%  (p=0.001 n=8+8)
GetCIDRLabelsConcurrent/4-8         703µs ± 1%     692µs ± 1%  -1.59%  (p=0.000 n=9+9)
GetCIDRLabelsConcurrent/16-8       2.97ms ± 7%    2.91ms ± 1%    ~     (p=0.122 n=10+8)
GetCIDRLabelsConcurrent/32-8       5.81ms ± 1%    5.77ms ± 1%  -0.57%  (p=0.011 n=8+9)
GetCIDRLabelsConcurrent/48-8       8.87ms ± 6%    8.71ms ± 1%    ~     (p=0.139 n=9+8)

name                             old alloc/op   new alloc/op   delta
GetCIDRLabels/0.0.0.0/0-8            624B ± 0%      624B ± 0%    ~     (all equal)
GetCIDRLabels/10.16.0.0/16-8       2.40kB ± 0%    2.40kB ± 0%    ~     (all equal)
GetCIDRLabels/192.0.2.3/32-8       5.01kB ± 0%    5.01kB ± 0%    ~     (all equal)
GetCIDRLabels/192.0.2.3/24-8       4.93kB ± 0%    4.93kB ± 0%    ~     (all equal)
GetCIDRLabels/192.0.2.0/24-8       4.93kB ± 0%    4.93kB ± 0%    ~     (all equal)
GetCIDRLabels/::/0-8                 624B ± 0%      624B ± 0%    ~     (all equal)
GetCIDRLabels/fdff::ff/128-8       18.5kB ± 0%    18.5kB ± 0%    ~     (all equal)
GetCIDRLabels/f00d:42::ff/128-8    18.5kB ± 0%    18.5kB ± 0%    ~     (all equal)
GetCIDRLabels/f00d:42::ff/96-8     18.5kB ± 0%    18.5kB ± 0%    ~     (all equal)
GetCIDRLabelsConcurrent/1-8         321kB ± 0%     321kB ± 0%    ~     (p=0.645 n=10+10)
GetCIDRLabelsConcurrent/2-8         641kB ± 0%     641kB ± 0%    ~     (p=0.796 n=10+10)
GetCIDRLabelsConcurrent/4-8        1.28MB ± 0%    1.28MB ± 0%    ~     (p=0.353 n=10+10)
GetCIDRLabelsConcurrent/16-8       5.13MB ± 0%    5.13MB ± 0%    ~     (p=0.083 n=10+8)
GetCIDRLabelsConcurrent/32-8       10.3MB ± 0%    10.3MB ± 0%    ~     (p=0.481 n=10+10)
GetCIDRLabelsConcurrent/48-8       15.4MB ± 0%    15.4MB ± 0%    ~     (p=0.796 n=10+10)

name                             old allocs/op  new allocs/op  delta
GetCIDRLabels/0.0.0.0/0-8            2.00 ± 0%      2.00 ± 0%    ~     (all equal)
GetCIDRLabels/10.16.0.0/16-8         2.00 ± 0%      2.00 ± 0%    ~     (all equal)
GetCIDRLabels/192.0.2.3/32-8         2.00 ± 0%      2.00 ± 0%    ~     (all equal)
GetCIDRLabels/192.0.2.3/24-8         2.00 ± 0%      2.00 ± 0%    ~     (all equal)
GetCIDRLabels/192.0.2.0/24-8         2.00 ± 0%      2.00 ± 0%    ~     (all equal)
GetCIDRLabels/::/0-8                 2.00 ± 0%      2.00 ± 0%    ~     (all equal)
GetCIDRLabels/fdff::ff/128-8         3.00 ± 0%      3.00 ± 0%    ~     (all equal)
GetCIDRLabels/f00d:42::ff/128-8      3.00 ± 0%      3.00 ± 0%    ~     (all equal)
GetCIDRLabels/f00d:42::ff/96-8       3.00 ± 0%      3.00 ± 0%    ~     (all equal)
GetCIDRLabelsConcurrent/1-8           138 ± 0%       138 ± 0%    ~     (all equal)
GetCIDRLabelsConcurrent/2-8           277 ± 0%       277 ± 0%    ~     (all equal)
GetCIDRLabelsConcurrent/4-8           555 ± 0%       555 ± 0%    ~     (all equal)
GetCIDRLabelsConcurrent/16-8        2.22k ± 0%     2.22k ± 0%    ~     (p=0.176 n=10+7)
GetCIDRLabelsConcurrent/32-8        4.44k ± 0%     4.44k ± 0%    ~     (p=0.867 n=10+10)
GetCIDRLabelsConcurrent/48-8        6.66k ± 0%     6.66k ± 0%    ~     (p=0.682 n=8+10)

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 7f3888f ]

De-obfuscate some of the flags to improve readability.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit c19e447 ]

Test that when the CT entry for 192.168.61.11:38193 -> 192.168.61.12:80
is removed, then the related IN and OUT NAT entries are purged along with
it.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 5b22423 ]

DSR uses a OUT NAT entry for RevDNAT of backend replies. Prior to the
changes in cilium#22978, this NAT entry was
protected by the CT_INGRESS entry which bpf_lxc creates for the backend
connection.

Test that GC of the NAT entry works when the CT entry is removed.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 6b9e351 ]

Point out that PurgeOrphanNATEntries() is only a fallback, to purge NAT
entries that are unexpectedly no longer backed by any CT entry.

During normal operations NAT entries should get purged as part of the GC
for their specific CT entry.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 74b3f56 ]

CT entries that get created for a DSR connection by the datapath will have
the `dsr` flag set. Reflect this in the CT entries that we use for tests.

The flag currently doesn't make a difference for the GC logic, but let's
still be a bit more accurate.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit e743901 ]

We want to add more advanced handling into the GC logic, which requires
information from the actual CT entry. Let's consolidate all of the
decision making in the purgeCtEntry*() functions, so that the NAT code
doesn't need to understand all the details of how CT and NAT interact.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit c1a2d1f ]

Clarify which CT entries potentially require purging of a DSR-related NAT
entry. This reduces the risk of accidentally purging unrelated NAT entries,
and allows the GC logic to do less work.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
[ upstream commit 21072cd ]

With cilium#22978 we changed how DSR NAT
entries are managed. Instead of associating the NAT entry's lifetime with
bpf_lxc's CT_INGRESS entry, the nodeport code on the backend now creates
its own CT_EGRESS entry. When such a CT_EGRESS entry is GC'ed, we should
therefore also purge the related DSR NAT entry.

Also add a test for this case.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
@pippolo84 pippolo84 force-pushed the pr/v1.13-backport-2023-10-30 branch from 1323e9c to df18ac0 Compare October 31, 2023 09:43
@pippolo84
Copy link
Member Author

Force-pushed to apply the changes suggested for v1.14 here.

@pippolo84
Copy link
Member Author

pippolo84 commented Oct 31, 2023

/test-backport-1.13

Job 'Cilium-PR-K8s-1.20-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig MonitorAggregation Checks that monitor aggregation restricts notifications

Failure Output

FAIL: Error creating resource /home/jenkins/workspace/Cilium-PR-K8s-1.20-kernel-4.19/src/github.com/cilium/cilium/test/k8s/manifests/l3-policy-demo.yaml: Cannot retrieve cilium pod cilium-d6qzp policy revision: cannot get the revision 

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.20-kernel-4.19/210/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.20-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

Job 'Cilium-PR-K8s-1.19-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathConfig MonitorAggregation Checks that monitor aggregation restricts notifications

Failure Output

FAIL: Error creating resource /home/jenkins/workspace/Cilium-PR-K8s-1.19-kernel-4.19/src/github.com/cilium/cilium/test/k8s/manifests/l3-policy-demo.yaml: Cannot retrieve cilium pod cilium-ssl8d policy revision: cannot get the revision 

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.19-kernel-4.19/213/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-4.19 so I can create one.

Then please upload the Jenkins artifacts to that issue.

@pippolo84 pippolo84 marked this pull request as ready for review October 31, 2023 09:49
@pippolo84 pippolo84 requested review from a team as code owners October 31, 2023 09:49
Copy link
Member

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My PR looks good. Thanks Fabio!

Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically we document the conflict resolution if the context is important to save in the commit msgs by having a section in the commit body like [ Backporter's notes: resolved blah blah ].

@pippolo84
Copy link
Member Author

/test-1.18-4.19

@pippolo84
Copy link
Member Author

/test-1.19-4.19

@pippolo84
Copy link
Member Author

/test-1.20-4.19

@aditighag aditighag merged commit f9d6823 into cilium:v1.13 Nov 2, 2023
@aditighag
Copy link
Member

@pippolo84 Looks like the label updater job failed, so you would have to update the labels manually - https://github.com/cilium/cilium/actions/runs/6734638321/job/18305980125. I checked one of the PRs, and the backport labels are indeed not updated.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 2, 2023
@pippolo84
Copy link
Member Author

Looks like the label updater job failed, so you would have to update the labels manually

Updated the labels manually, I'm investigating the workflow failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

10 participants