Skip to content

v1.16 Backports 2025-06-17 #40093

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 19, 2025
Merged

v1.16 Backports 2025-06-17 #40093

merged 5 commits into from
Jun 19, 2025

Conversation

joamaki
Copy link
Contributor

@joamaki joamaki commented Jun 17, 2025

aanm and others added 4 commits June 17, 2025 13:24
[ upstream commit 2b24f33 ]

If the ciliumbot has approved a PR from renovate, we can safely remove
the other reviewers from the list. This will decrease the noise towards
those reviewers.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 6d2000c ]

Whenever a pool is deleted, the operator deletes all LB ranges of that
pool and then strip off each address allocation related to the removed
ranges. While doing so, it attempts to immediately reconcile each
service that lost an address, trying to allocating another address from
an alive range. The reason to do so instead of running a full
reconciliation is to improve efficiency: since each k8s object
describing a service that lost an allocation should be patched, a later
change to reallocate another address would lead to an additional update
immediately after.

However, trying to immediately reconcile a service while deleting the
ranges might lead to issues in the allocation. In the case of a
multi-range IP pool, if we try to reconcile a service while deleting the
first range, we might end up allocating an IP from a range that belongs
to the same pool, leading to a further removal and reconciliation in the
next iteration. Moreover, the internal LBIPAM manager status is not
consistent during the loop: while the pool has already been deleted, the
ranges from that pool are being deleting one at a time. Therefore, a
reconciliation attempt in the middle of the ranges deletion, might
allocate an IP from a range with a non-existent parent pool, leading to
spuriuos warnings like the following:

`Bad state detected, store contains lbRange for pool 'pool-a' but
missing the pool`

To solve this the commit breaks the current deleteRangeAllocations in
two parts.  In the first one, we loop over the ranges to remove each of
them alongside any allocation for an address contained in the range.
Each modified service is just collected and returned.  Then, after
deleting the removed pool from the LBIPAM internal status, we finally
reconcile each modified service, trying to reallocate an address
immediately to avoid the double update.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit f4b9b76 ]

When handling a pool deletion event, the LBIPAM manager loops over the
removed pool ranges to strip off each address allocation with an IP
contained in one of the ranges. While doing that, it also updates the
poolToRanges internal map removing the range it is iterating on from the
slice keyed by the deleted pool name.

Removing items from a slice while iterating on them with a for range
loop is incorrect, since the length of the slice is evaluated just once
at the loop start. In case of a pool with multiple ranges, this may lead
to the following panic:

```
$ go test -trimpath -v ./operator/pkg/lbipam/... -run="TestPoolDelete"
=== RUN   TestPoolDelete
--- FAIL: TestPoolDelete (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x2db94b8]

goroutine 98 [running]:
testing.tRunner.func1.2({0x3203820, 0x61279f0})
	testing/testing.go:1734 +0x21c
testing.tRunner.func1()
	testing/testing.go:1737 +0x35e
panic({0x3203820?, 0x61279f0?})
	runtime/panic.go:792 +0x132
github.com/cilium/cilium/operator/pkg/lbipam.(*rangesStore).Delete(0xc0003b0bf8, 0x0)
	github.com/cilium/cilium/operator/pkg/lbipam/range_store.go:34 +0x1d8
github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolDeleted(0xc0003b0b60, {0x3d66d08, 0xc000653540}, 0xc0006e6b00)
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:1763 +0x1df
github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).poolOnDelete(0xc0003b0b60, {0x3d66d08, 0xc000653540}, 0xc0006e6b00)
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:345 +0xe7
github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolEvent(0xc0003b0b60, {0x3d66d08?, 0xc000653540?}, {{0x38bfe05, 0x6}, {{0xc00051e6d8, 0x6}, {0x0, 0x0}}, 0xc0006e6b00, ...})
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:262 +0x87
github.com/cilium/cilium/operator/pkg/lbipam.(*newFixture).DeletePool(0xc00008bef0, 0xc0006836c0, 0xc0006e6b00)
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam_fixture_test.go:192 +0x1a5
github.com/cilium/cilium/operator/pkg/lbipam.TestPoolDelete(0xc0006836c0)
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam_test.go:1814 +0x385
testing.tRunner(0xc0006836c0, 0x3a35298)
	testing/testing.go:1792 +0xf4
created by testing.(*T).Run in goroutine 1
	testing/testing.go:1851 +0x413
FAIL	github.com/cilium/cilium/operator/pkg/lbipam	0.031s
FAIL
```

The commit extends the TestDisablePool unit test to have multiple ranges
in the fixture pools and then fixes the issue shallow copying the ranges
slice before starting the iteration.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit d56ae81 ]

This step can be parallelized so that this step runs faster to remove
these directories.

Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki added kind/backports This PR provides functionality previously merged into master. backport/1.16 This PR represents a backport for Cilium 1.16.x of a PR that was merged to main. labels Jun 17, 2025
@joamaki joamaki requested review from aanm and pippolo84 June 17, 2025 11:27
@joamaki joamaki added the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jun 17, 2025
@joamaki
Copy link
Contributor Author

joamaki commented Jun 17, 2025

/test

@joamaki joamaki marked this pull request as ready for review June 17, 2025 11:44
@joamaki joamaki requested review from a team as code owners June 17, 2025 11:44
@joamaki joamaki requested a review from christarazi June 17, 2025 11:44
[ upstream commit 1a2faea ]

If the command doesn't have the repository as part of the command, then
gh will try to derive it from the .git directory. Since this directory
doesn't exist, the command fails.

Fixes: 2b24f33 (".github/workflows: remove reviewers if ciliumbot approved PR")
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/v1.16-backport-2025-06-17-01-24 branch from 477f925 to cec6b34 Compare June 18, 2025 08:53
Copy link
Member

@pippolo84 pippolo84 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@joamaki
Copy link
Contributor Author

joamaki commented Jun 18, 2025

/test

@joamaki joamaki removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jun 19, 2025
@joamaki joamaki added this pull request to the merge queue Jun 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 19, 2025
@joamaki joamaki added this pull request to the merge queue Jun 19, 2025
Merged via the queue into v1.16 with commit 579314a Jun 19, 2025
335 of 342 checks passed
@joamaki joamaki deleted the pr/v1.16-backport-2025-06-17-01-24 branch June 19, 2025 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.16 This PR represents a backport for Cilium 1.16.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants