Contributor

@rastislavs rastislavs commented Jul 31, 2025

@rastislavs rastislavs added kind/backports This PR provides functionality previously merged into master. backport/1.18 This PR represents a backport for Cilium 1.18.x of a PR that was merged to main. labels Jul 31, 2025
@github-actions github-actions bot added the sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. label Jul 31, 2025
@rastislavs rastislavs marked this pull request as ready for review July 31, 2025 13:45
@rastislavs rastislavs requested review from a team as code owners July 31, 2025 13:45
@rastislavs
Contributor Author

/test

Member

@mhofstetter mhofstetter left a comment

Thanks!

Member

@giorio94 giorio94 left a comment

My commit looks good, thanks!

@joestringer joestringer mentioned this pull request Aug 1, 2025
Member

@joestringer joestringer left a comment

.github/ and tools/ LGTM.

@aanm aanm added this pull request to the merge queue Aug 4, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Aug 4, 2025
marseel and others added 13 commits August 4, 2025 11:39
[ upstream commit 3356222 ]

In the past, CRDs were created and updated in the agent, and this was
done in parallel to speed up bootstrap. Related to #12719

However, since CRDs are now created and updated in the operator, the
parallelism is no longer necessary and can actually cause problems in
large clusters: #39267

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit 8e53cf5 ]

[ backporter's notes: skipped changes in .github/workflows/net-perf-gke.yaml
  as the file is not present in the target branch ]

Signed-off-by: Ashwin Pillai <pillaiashwin96@gmail.com>
[ upstream commit dda3976 ]

Signed-off-by: Ashwin Pillai <pillaiashwin96@gmail.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit 32034ef ]

Currently, missing test owners on a particular package lead to an error
of the form

    ERROR Failed to locate owner for package path=github.com/cilium/cilium/pkg/foo error="no owners defined"

potentially being reported multiple times[^1]. De-duplicate the reported
error log lines to de-clutter the logs in that case.

[^1]: https://github.com/cilium/cilium/actions/runs/16586570193/job/46912960231#step:15:779
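
The de-duplication described above can be sketched in Go as follows. This is only an illustration of the idea, not the code from the upstream commit: the `ownerErrorReporter` type and its `report` method are hypothetical names, and the log line is modeled on the example error shown in the commit message.

```go
package main

import "fmt"

// ownerErrorReporter is a hypothetical helper that reports a missing-owner
// error at most once per package path.
type ownerErrorReporter struct {
	seen map[string]struct{}
}

func newOwnerErrorReporter() *ownerErrorReporter {
	return &ownerErrorReporter{seen: make(map[string]struct{})}
}

// report logs the error for pkg and returns true the first time it is
// called for that package. Later calls for the same package are
// suppressed and return false.
func (r *ownerErrorReporter) report(pkg string) bool {
	if _, ok := r.seen[pkg]; ok {
		return false
	}
	r.seen[pkg] = struct{}{}
	fmt.Printf("ERROR Failed to locate owner for package path=%s error=%q\n",
		pkg, "no owners defined")
	return true
}

func main() {
	r := newOwnerErrorReporter()
	for _, pkg := range []string{
		"github.com/cilium/cilium/pkg/foo",
		"github.com/cilium/cilium/pkg/foo", // duplicate, not logged again
		"github.com/cilium/cilium/pkg/bar",
	} {
		r.report(pkg)
	}
}
```

Keying the set on the package path keeps one line per affected package, so distinct packages with missing owners are still all visible in the log.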

Signed-off-by: Tobias Klauser <tobias@cilium.io>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit 32f7fa6 ]

The new Cilium LB controlplane introduced a new property `Unhealthy` on the
backend params which allows for healthchecker extensions to report a backend
as unhealthy. The LB backend selection respects the `State` & `Unhealthy`
properties of the backend.

While introducing the new property, there was an oversight of `cilium-dbg service list`
which still shows the backend state as `active` even though the backend is reported as
`unhealthy`.

Therefore, this commit changes the LB REST API implementation to report the state
of a backend as `quarantined` if `Unhealthy==true`.
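
A minimal Go sketch of the mapping described above, under stated assumptions: `backendParams` and `reportedState` are hypothetical names rather than the actual Cilium types, and the state strings are illustrative. The commit message does not say how `Unhealthy` interacts with states other than `active`, so this sketch simply lets it take precedence.

```go
package main

import "fmt"

// backendParams is a hypothetical, simplified stand-in for the backend
// parameters mentioned in the commit message.
type backendParams struct {
	State     string // for example "active"
	Unhealthy bool   // set by a health checker extension
}

// reportedState returns the state that an API would show for a backend.
// Assumption: an unhealthy backend is always reported as "quarantined".
func reportedState(b backendParams) string {
	if b.Unhealthy {
		return "quarantined"
	}
	return b.State
}

func main() {
	fmt.Println(reportedState(backendParams{State: "active"}))
	fmt.Println(reportedState(backendParams{State: "active", Unhealthy: true}))
}
```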

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit 8d1207d ]

It is possible to use Geneve for dsrDispatch (in both native and encapsulation mode)

Signed-off-by: Alexis La Goutte <alexis.lagoutte@gmail.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit 2f90a8f ]

Currently, if a policy is applied before Cilium is installed (e.g. through a directory), the agent fails to sync the policy for the endpoint. This change fixes the issue.
Fixes: GH-37724

Signed-off-by: Anubhab Majumdar <anmajumdar@microsoft.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit 676d346 ]

[ backporter's notes: resolved conflicts in pkg/loadbalancer/reconciler/termination.go:
  - in socketTerminationLoop(), only UDP protocol backend is skipped in the target branch
  version, while also TCP is skipped in the main version
  - in terminateUDPConnectionsToBackend(), UDP protocol is assigned to protocol
  in case of lb.UDP or lb.ANY in the target branch version, only in case of lb.UDP in main ]

The L3n4Addr is used in a lot of places and is compared and hashed a lot.
Reduce memory usage and speed up comparisons by making L3n4Addr a unique.Handle.

pkg/loadbalancer/benchmark before:
Min: Allocated 563043kB in total, 1837855 objects / 118158kB still reachable (per service:  36 objs, 11531B alloc,  2419B in-use)
Avg: Allocated 583710kB in total, 2174642 objects / 154810kB still reachable (per service:  43 objs, 11954B alloc,  3170B in-use)
Max: Allocated 643000kB in total, 3089878 objects / 263505kB still reachable (per service:  61 objs, 13168B alloc,  5396B in-use)

After:
Min: Allocated 510398kB in total, 1734286 objects /  83716kB still reachable (per service:  34 objs, 10452B alloc,  1714B in-use)
Avg: Allocated 523617kB in total, 2188965 objects / 125627kB still reachable (per service:  43 objs, 10723B alloc,  2572B in-use)
Max: Allocated 564751kB in total, 3478097 objects / 229608kB still reachable (per service:  69 objs, 11566B alloc,  4702B in-use)

Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 0cd21f9 ]

As we need the /mnt directory to be empty, we might as well delete all
files that are there instead of assuming that only a couple of files
take up most of the space.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit fe38bdf ]

Use find to ignore swapfile and avoid error in case of permission
denied.

Fixes: 0cd21f9 (".github: remove all contents of /mnt in build images CI")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit 1373e01 ]

Ensure that the identity allocator is synchronized before starting the
actual tests, to prevent flakes caused by the goroutine started by
[(*CachingIdentityAllocator).InitIdentityAllocator] still lingering
around when the Hive gets stopped. The proper fix would be converting
the identity allocator to a cell, but that is a significantly larger
amount of work, so let's go for the easy fix for the moment.

Example failure that could otherwise occur:

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃   PANIC  package: github.com/cilium/cilium/daemon/cmd • TestEndpointAddReservedLabelEtcd   ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1d0 pc=0x1811979]

goroutine 589 [running]:
github.com/cilium/cilium/pkg/node.(*LocalNodeStore).Get(_, {_, _})
	/home/runner/work/cilium/cilium/pkg/node/local_node_store.go:195 +0x79
github.com/cilium/cilium/pkg/node.getLocalNode(_)
	/home/runner/work/cilium/cilium/pkg/node/address.go:46 +0x8d
github.com/cilium/cilium/pkg/node.GetIPv4(0x2?)
	/home/runner/work/cilium/cilium/pkg/node/address.go:232 +0x25
github.com/cilium/cilium/pkg/identity/cache/cell.(*identityAllocatorOwner).GetNodeSuffix(0xc002b90cc0)
	/home/runner/work/cilium/cilium/pkg/identity/cache/cell/cell.go:162 +0x89
github.com/cilium/cilium/pkg/identity/cache.(*CachingIdentityAllocator).InitIdentityAllocator.func1({0x5b360f8, 0xc002b90cc0}, 0xc00048c5b0, 0x100, 0xffff)
	/home/runner/work/cilium/cilium/pkg/identity/cache/allocator.go:253 +0x1e8
created by github.com/cilium/cilium/pkg/identity/cache.(*CachingIdentityAllocator).InitIdentityAllocator in goroutine 232
	/home/runner/work/cilium/cilium/pkg/identity/cache/allocator.go:237 +0x42b
FAIL	github.com/cilium/cilium/daemon/cmd	1.294s

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit 80c5b87 ]

In the event that a regeneration failed, we re-populated the MapState
but forgot to set the rule origin. Unfortunately, the zero value for
Handle is not equivalent to nil; we must always set it explicitly. Use
the constructor instead.
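
The pitfall can be reproduced with the standard library `unique.Handle` (Go 1.23 or later), assuming the `Handle` mentioned above behaves the same way. In this sketch, `origin`, `newOrigin` and `nilOrigin` are hypothetical stand-ins for the rule origin type and its constructor.

```go
package main

import (
	"fmt"
	"unique"
)

// origin is a hypothetical stand-in for a rule origin backed by a handle.
type origin struct {
	h unique.Handle[string]
}

// newOrigin is the constructor: it always produces an interned handle.
func newOrigin(s string) origin {
	return origin{h: unique.Make(s)}
}

// nilOrigin is the canonical "no origin" value, built via the constructor.
var nilOrigin = newOrigin("")

func main() {
	var zero origin // zero value: the handle was never produced by unique.Make

	fmt.Println(zero == nilOrigin)          // false
	fmt.Println(newOrigin("") == nilOrigin) // true
}
```

The zero `origin` compares unequal to `nilOrigin` even though both look empty, which is why going through the constructor matters.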

Fixes: fe165aa

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
[ upstream commit a692a1b ]

If this check is not present, then Cilium considers a global identity to
be node-local if there is a label called "ingress" in the label set.

For example, a workload pod could have the label set:

```
k8s:foo=bar
k8s:ingress=allowed
```

and Cilium will incorrectly assign it a node-local identity because the
"ingress" label is present, without checking the source. Hence we need
to add a check for the reserved source.

A unit test is added to validate this behavior.
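
The difference between the two checks can be sketched in Go. The `label` type and both functions are hypothetical simplifications for illustration, not Cilium's actual label API.

```go
package main

import "fmt"

// label is a hypothetical, simplified label with an explicit source.
type label struct {
	Source string // for example "k8s" or "reserved"
	Key    string
	Value  string
}

// hasIngressKey mirrors the faulty check: it looks only at the key.
func hasIngressKey(lbls []label) bool {
	for _, l := range lbls {
		if l.Key == "ingress" {
			return true
		}
	}
	return false
}

// hasReservedIngress mirrors the fixed check: the source must be reserved.
func hasReservedIngress(lbls []label) bool {
	for _, l := range lbls {
		if l.Source == "reserved" && l.Key == "ingress" {
			return true
		}
	}
	return false
}

func main() {
	workload := []label{
		{Source: "k8s", Key: "foo", Value: "bar"},
		{Source: "k8s", Key: "ingress", Value: "allowed"},
	}
	fmt.Println(hasIngressKey(workload))      // true: misclassified
	fmt.Println(hasReservedIngress(workload)) // false: correctly global
}
```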

Fixes: 226a978 ("identity: Allow local identity for ingress label")
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Rastislav Szabo <rastislav.szabo@isovalent.com>
@rastislavs rastislavs force-pushed the pr/v1.18-backport-2025-07-31-02-45 branch from 17ba8bd to 71957ab Compare August 4, 2025 09:40
@rastislavs
Contributor Author

rastislavs commented Aug 4, 2025

(rebased to pull in the TESTOWNERS file change)

@rastislavs
Contributor Author

rastislavs commented Aug 4, 2025

Looks like the BPF unit/integration tests started failing with the following error (even though they passed on this PR previously?).

invalid argument(s): /go/src/github.com/cilium/cilium/TESTOWNERS -code-owners-prefix github.com/cilium/cilium/ -out ../../test/bpf_tests.xml
go-junit-report does not accept positional arguments

@joestringer might this be related to your change?

I tried rebasing the PR to pull in the recently added TESTOWNERS file, but it did not help.

@joestringer
Member

@rastislavs let's just drop #40776 to unblock. I can follow up separately.

@joestringer
Member

Actually, I found a simple fix so I'll just push that fix on top (and I'll submit that separately for main).

@joestringer
Member

/test

@joestringer joestringer enabled auto-merge August 4, 2025 17:17
[ upstream commit f283301 ]

Previously when multiple code owners were specified, the extra files
would be passed as separate arguments, but the subsequent commands
expect them to be specified in the form "file1,file2". Emit them as
comma-separated values instead of space-separated.
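
As an illustration of the difference, the following Go sketch builds both forms from the same list of files. It is not the code from the commit: `ownersArg` is a hypothetical helper, and the file names are examples only.

```go
package main

import (
	"fmt"
	"strings"
)

// ownersArg joins the owner files into a single "file1,file2" argument,
// the form the consuming command is described as expecting.
func ownersArg(files []string) string {
	return strings.Join(files, ",")
}

func main() {
	files := []string{"CODEOWNERS", "TESTOWNERS"} // example file names

	// Space-separated: the second file becomes a separate positional
	// argument, which a strict flag parser may reject.
	fmt.Println(strings.Join(files, " "))

	// Comma-separated: a single value for one flag.
	fmt.Println(ownersArg(files))
}
```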

Fixes: dda3976 ("Fix code owner attribution for test failures on stable branches")
Signed-off-by: Joe Stringer <joe@cilium.io>
@joestringer joestringer force-pushed the pr/v1.18-backport-2025-07-31-02-45 branch from 5b8a9ab to 644c1ab Compare August 4, 2025 19:25
@joestringer
Member

/test

@joestringer
Member

ci-e2e-upgrade run hit #38643, which is only currently fixed on main and not yet backported to v1.18 so it's somewhat expected. I'll retrigger the test run.

@joestringer joestringer added this pull request to the merge queue Aug 4, 2025
Merged via the queue into v1.18 with commit 12bce59 Aug 4, 2025
359 of 360 checks passed
@joestringer joestringer deleted the pr/v1.18-backport-2025-07-31-02-45 branch August 4, 2025 21:29