v1.17 Backports 2025-06-17 #40094
Merged
Conversation
[ upstream commit 20d040d ] Add a Delete method that can be called with the map lock acquired. This allows synchronizing access to the map attributes in DumpReliablyWithCallback where the callback invokes a map key delete operation. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
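A minimal sketch of the idea, assuming a simplified `Map` guarded by an RWMutex; the field and helper names below are illustrative only and the real pkg/bpf implementation differs in detail:

```go
// Hypothetical sketch only: names and structure are simplified.
package bpf

import "sync"

type Map struct {
	lock sync.RWMutex
	// underlying eBPF map handle elided
}

// Delete acquires the map lock before removing the key.
func (m *Map) Delete(key interface{}) error {
	m.lock.Lock()
	defer m.lock.Unlock()
	return m.deleteLocked(key)
}

// DeleteLocked removes the key and assumes the caller already holds the
// map lock, e.g. from within a DumpReliablyWithCallback callback.
func (m *Map) DeleteLocked(key interface{}) error {
	return m.deleteLocked(key)
}

// deleteLocked would perform the actual bpf map delete operation.
func (m *Map) deleteLocked(key interface{}) error {
	_ = key // placeholder: call into cilium/ebpf here
	return nil
}
```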
[ upstream commit 2973257 ] PurgeOrphanNATEntries previously iterated the nat map via DumpReliablyWithCallback, and provided a callback function that also deleted entries in the nat map. The previous commit synchronized access to the Map construct by acquiring the Map lock, which would lead to a deadlock in PurgeOrphanNATEntries. Hence, move the orphan entries deletion logic outside the nat map iteration. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
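A self-contained sketch of the restructuring, using toy in-memory stand-ins for the nat map types (the real eBPF-backed map and orphan check are elided): the dump callback only collects candidate keys, and the deletions happen after the iteration, so the callback never needs to delete from the map it is iterating over.

```go
// Hypothetical sketch only: toy in-memory map instead of the real nat map.
package natgc

type natKey struct{ id uint32 }
type natValue struct{ orphan bool }

type natMap struct {
	entries map[natKey]natValue
}

// dumpWithCallback invokes cb for every entry while the (implicit) map
// lock is held for the duration of the iteration.
func (m *natMap) dumpWithCallback(cb func(k natKey, v natValue)) {
	for k, v := range m.entries {
		cb(k, v)
	}
}

func (m *natMap) delete(k natKey) {
	delete(m.entries, k)
}

// purgeOrphanEntries collects orphan keys during the dump and deletes them
// only after the iteration, mirroring the restructuring described above.
func (m *natMap) purgeOrphanEntries() int {
	var orphans []natKey
	m.dumpWithCallback(func(k natKey, v natValue) {
		if v.orphan { // placeholder for the real orphan check
			orphans = append(orphans, k)
		}
	})
	for _, k := range orphans {
		m.delete(k)
	}
	return len(orphans)
}
```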
/test
[ upstream commit bfba0eb ] The method previously accessed the map construct directly, which could result in race conditions with other operations on the map, such as in the `Close` method. One such data race was reported in the ctmap, where the map was closed by a periodically running controller while it was simultaneously accessed in `DumpReliablyWithCallback`. Currently, callers can invoke map operations in the callback that acquire the map (write) lock. To allow such callbacks without causing a deadlock, acquire the (write) lock instead of RLock. (snippet)

```
2025-01-30T21:27:22.005325276Z Read at 0x00c0099535a8 by goroutine 6096710:
2025-01-30T21:27:22.005373476Z github.com/cilium/ebpf/internal/sys.(*FD).Uint()
2025-01-30T21:27:22.005379677Z   /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/sys/fd.go:71 +0x144
2025-01-30T21:27:22.005446912Z github.com/cilium/ebpf.(*Map).nextKey()
2025-01-30T21:27:22.005454006Z   /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/map.go:993 +0x112
2025-01-30T21:27:22.005553101Z github.com/cilium/ebpf.(*Map).NextKey()
2025-01-30T21:27:22.005560745Z   /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/map.go:950 +0xae
2025-01-30T21:27:22.005640643Z github.com/cilium/cilium/pkg/bpf.(*Map).NextKey()
2025-01-30T21:27:22.005647146Z   /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:606 +0x11e
2025-01-30T21:27:22.005712027Z github.com/cilium/cilium/pkg/bpf.(*Map).DumpReliablyWithCallback()
2025-01-30T21:27:22.005718629Z   /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:768 +0x71b
2025-01-30T21:27:22.005815760Z github.com/cilium/cilium/pkg/bpf.(*Map).GetModel()
2025-01-30T21:27:22.005822322Z   /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:1413 +0x511
2025-01-30T21:27:22.005880481Z github.com/cilium/cilium/pkg/bpf.GetOpenMaps()
2025-01-30T21:27:22.005886152Z   /go/src/github.com/cilium/cilium/pkg/bpf/map_register_linux.go:64 +0x2ad
2025-01-30T21:27:22.006024930Z github.com/cilium/cilium/pkg/maps.(*getMapHandler).Handle()
2025-01-30T21:27:22.006034067Z   /go/src/github.com/cilium/cilium/pkg/maps/api.go:146 +0x24
```

```
2025-01-30T21:27:22.006410985Z Previous write at 0x00c0099535a8 by goroutine 4182:
2025-01-30T21:27:22.006418068Z github.com/cilium/ebpf/internal/sys.(*FD).Disown()
2025-01-30T21:27:22.006423969Z   /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/sys/fd.go:93 +0x8e
2025-01-30T21:27:22.006429549Z github.com/cilium/ebpf/internal/sys.(*FD).Close()
2025-01-30T21:27:22.006435180Z   /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/sys/fd.go:84 +0x47
2025-01-30T21:27:22.006870411Z github.com/cilium/ebpf.(*Map).Close()
2025-01-30T21:27:22.006879989Z   /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/map.go:1371 +0x1e6
2025-01-30T21:27:22.006884858Z github.com/cilium/cilium/pkg/bpf.(*Map).Close()
2025-01-30T21:27:22.006890338Z   /go/src/github.com/cilium/cilium/pkg/bpf/map_linux.go:591 +0x1ac
2025-01-30T21:27:22.006895778Z github.com/cilium/cilium/pkg/maps/ctmap.CalculateCTMapPressure.func1.deferwrap1()
2025-01-30T21:27:22.006901959Z   /go/src/github.com/cilium/cilium/pkg/maps/ctmap/ctmap.go:858 +0x33
2025-01-30T21:27:22.006907119Z runtime.deferreturn()
2025-01-30T21:27:22.006957073Z   /usr/local/go/src/runtime/panic.go:605 +0x5d
2025-01-30T21:27:22.006962683Z github.com/cilium/cilium/pkg/controller.(*controller).runController()
2025-01-30T21:27:22.007064754Z   /go/src/github.com/cilium/cilium/pkg/controller/controller.go:251 +0xa2
2025-01-30T21:27:22.007072437Z github.com/cilium/cilium/pkg/controller.(*Manager).createControllerLocked.gowrap1()
2025-01-30T21:27:22.007077677Z   /go/src/github.com/cilium/cilium/pkg/controller/manager.go:111 +0xbc
```

Consequently, call DeleteLocked in callbacks passed to DumpReliablyWithCallback. Signed-off-by: Aditi Ghag <aditi@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
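Continuing the hypothetical `Map` sketch from the first commit above, the locking change and the resulting callback pattern could look roughly like this (the key iteration is a placeholder, not the real NextKey-based dump):

```go
// Continuing the hypothetical Map sketch above (same illustrative package).

// DumpReliablyWithCallback iterates the map while holding the write lock,
// so callbacks may safely call DeleteLocked without deadlocking.
func (m *Map) DumpReliablyWithCallback(cb func(key interface{})) error {
	m.lock.Lock() // previously an RLock; the write lock allows mutating callbacks
	defer m.lock.Unlock()

	for _, key := range m.keysLocked() {
		cb(key)
	}
	return nil
}

// keysLocked stands in for the real NextKey-based iteration and assumes the
// caller already holds the map lock.
func (m *Map) keysLocked() []interface{} {
	return nil // placeholder
}

// Example garbage-collection callback: delete entries via DeleteLocked, as
// the commit describes for callbacks passed to DumpReliablyWithCallback.
func gcAll(m *Map) error {
	return m.DumpReliablyWithCallback(func(key interface{}) {
		_ = m.DeleteLocked(key) // safe: the dump already holds the lock
	})
}
```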
[ upstream commit 2b24f33 ] If the ciliumbot has approved a PR from renovate, we can safely remove the other reviewers from the list. This will decrease the noise towards those reviewers. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
[ upstream commit 6d2000c ] Whenever a pool is deleted, the operator deletes all LB ranges of that pool and then strips off each address allocation related to the removed ranges. While doing so, it attempts to immediately reconcile each service that lost an address, trying to allocate another address from a live range. The reason to do so instead of running a full reconciliation is to improve efficiency: since each k8s object describing a service that lost an allocation should be patched, a later change to reallocate another address would lead to an additional update immediately after. However, trying to immediately reconcile a service while deleting the ranges might lead to issues in the allocation. In the case of a multi-range IP pool, if we try to reconcile a service while deleting the first range, we might end up allocating an IP from a range that belongs to the same pool, leading to a further removal and reconciliation in the next iteration. Moreover, the internal LBIPAM manager status is not consistent during the loop: while the pool has already been deleted, the ranges from that pool are being deleted one at a time. Therefore, a reconciliation attempt in the middle of the ranges deletion might allocate an IP from a range with a non-existent parent pool, leading to spurious warnings like the following: `Bad state detected, store contains lbRange for pool 'pool-a' but missing the pool` To solve this, the commit breaks the current deleteRangeAllocations into two parts (sketched below). In the first one, we loop over the ranges to remove each of them alongside any allocation for an address contained in the range. Each modified service is just collected and returned. Then, after deleting the removed pool from the LBIPAM internal status, we finally reconcile each modified service, trying to reallocate an address immediately to avoid the double update. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
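A rough sketch of the two-phase approach, with placeholder types and method stubs standing in for the real lbipam internals:

```go
// Hypothetical sketch only: placeholder types and method stubs for lbipam.
package lbipam

type svcKey struct{ namespace, name string }
type ipRange struct{ cidr string }
type pool struct {
	name   string
	ranges []ipRange
}

type manager struct{} // internal state elided

func (m *manager) stripAllocationsInRange(r ipRange) []svcKey { return nil } // placeholder
func (m *manager) deleteRange(poolName string, r ipRange)     {}             // placeholder
func (m *manager) deletePool(poolName string)                 {}             // placeholder
func (m *manager) reconcileService(s svcKey) error            { return nil } // placeholder

// handlePoolDeleted applies the two-phase approach: first strip allocations
// range by range, only collecting the affected services; then, once the pool
// is gone from the internal state, reconcile each service exactly once.
func (m *manager) handlePoolDeleted(p *pool) error {
	// Phase 1: delete ranges and remember which services lost an address.
	modified := make(map[svcKey]struct{})
	for _, r := range p.ranges {
		for _, svc := range m.stripAllocationsInRange(r) {
			modified[svc] = struct{}{}
		}
		m.deleteRange(p.name, r)
	}

	// Drop the pool before reallocating, so a reconciliation can never pick
	// a range whose parent pool no longer exists.
	m.deletePool(p.name)

	// Phase 2: reconcile the collected services, reallocating an address
	// from a surviving pool in a single update per service.
	for svc := range modified {
		if err := m.reconcileService(svc); err != nil {
			return err
		}
	}
	return nil
}
```

Collecting the services first and reconciling them only after the pool has been removed keeps the internal state consistent throughout, while still patching each affected service object only once.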
[ upstream commit f4b9b76 ] When handling a pool deletion event, the LBIPAM manager loops over the removed pool ranges to strip off each address allocation with an IP contained in one of the ranges. While doing that, it also updates the poolToRanges internal map, removing the range it is iterating on from the slice keyed by the deleted pool name. Removing items from a slice while iterating on them with a for range loop is incorrect, since the length of the slice is evaluated just once at the loop start. In case of a pool with multiple ranges, this may lead to the following panic:

```
$ go test -trimpath -v ./operator/pkg/lbipam/... -run="TestPoolDelete"
=== RUN   TestPoolDelete
--- FAIL: TestPoolDelete (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x2db94b8]

goroutine 98 [running]:
testing.tRunner.func1.2({0x3203820, 0x61279f0})
	testing/testing.go:1734 +0x21c
testing.tRunner.func1()
	testing/testing.go:1737 +0x35e
panic({0x3203820?, 0x61279f0?})
	runtime/panic.go:792 +0x132
github.com/cilium/cilium/operator/pkg/lbipam.(*rangesStore).Delete(0xc0003b0bf8, 0x0)
	github.com/cilium/cilium/operator/pkg/lbipam/range_store.go:34 +0x1d8
github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolDeleted(0xc0003b0b60, {0x3d66d08, 0xc000653540}, 0xc0006e6b00)
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:1763 +0x1df
github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).poolOnDelete(0xc0003b0b60, {0x3d66d08, 0xc000653540}, 0xc0006e6b00)
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:345 +0xe7
github.com/cilium/cilium/operator/pkg/lbipam.(*LBIPAM).handlePoolEvent(0xc0003b0b60, {0x3d66d08?, 0xc000653540?}, {{0x38bfe05, 0x6}, {{0xc00051e6d8, 0x6}, {0x0, 0x0}}, 0xc0006e6b00, ...})
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam.go:262 +0x87
github.com/cilium/cilium/operator/pkg/lbipam.(*newFixture).DeletePool(0xc00008bef0, 0xc0006836c0, 0xc0006e6b00)
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam_fixture_test.go:192 +0x1a5
github.com/cilium/cilium/operator/pkg/lbipam.TestPoolDelete(0xc0006836c0)
	github.com/cilium/cilium/operator/pkg/lbipam/lbipam_test.go:1814 +0x385
testing.tRunner(0xc0006836c0, 0x3a35298)
	testing/testing.go:1792 +0xf4
created by testing.(*T).Run in goroutine 1
	testing/testing.go:1851 +0x413
FAIL	github.com/cilium/cilium/operator/pkg/lbipam	0.031s
FAIL
```

The commit extends the TestDisablePool unit test to have multiple ranges in the fixture pools and then fixes the issue by shallow copying the ranges slice before starting the iteration. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com> Signed-off-by: Jussi Maki <jussi@isovalent.com>
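A minimal reproduction of the pattern and of the fix, with simplified placeholder types rather than the real rangesStore; `slices.Clone` stands in here for whatever shallow copy the commit uses:

```go
// Hypothetical sketch only: simplified stand-ins for the real rangesStore.
package lbipam

import "slices"

type lbRange struct{ cidr string }

type rangesStore struct {
	poolToRanges map[string][]*lbRange
}

// deleteRange removes a single range from the slice kept for its pool,
// mutating poolToRanges in place.
func (s *rangesStore) deleteRange(poolName string, r *lbRange) {
	rs := s.poolToRanges[poolName]
	for i, cur := range rs {
		if cur == r {
			s.poolToRanges[poolName] = append(rs[:i], rs[i+1:]...)
			return
		}
	}
}

// deletePoolRanges iterates over a shallow copy of the slice. Ranging
// directly over the live slice while deleteRange shrinks it is the bug
// described above: the range expression is evaluated only once, so later
// iterations read stale entries of the shifted slice (in the real code this
// surfaced as a nil pointer dereference).
func (s *rangesStore) deletePoolRanges(poolName string) {
	for _, r := range slices.Clone(s.poolToRanges[poolName]) {
		s.deleteRange(poolName, r)
	}
}
```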
[ upstream commit d56ae81 ] This step can be parallelized so that it removes these directories faster. Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
Force-pushed from 6a1cb9f to 0a41f67
/test
aanm approved these changes Jun 17, 2025
[ upstream commit 1a2faea ] If the repository is not specified as part of the command, gh will try to derive it from the .git directory. Since this directory doesn't exist, the command fails. Fixes: 2b24f33 (".github/workflows: remove reviewers if ciliumbot approved PR") Signed-off-by: André Martins <andre@cilium.io> Signed-off-by: Jussi Maki <jussi@isovalent.com>
Force-pushed from 0a41f67 to 61ccb16
pippolo84 approved these changes Jun 18, 2025
Thanks!
/test
Labels
backport/1.17
This PR represents a backport for Cilium 1.17.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
PRs skipped due to conflicts:
Once this PR is merged, a GitHub action will update the labels of these PRs: