joamaki
Contributor

@joamaki joamaki commented Aug 12, 2025

  • Make the lb/prune command wait for the pruning to happen, to avoid potential races with test/bpfops-reset
  • Add a mutex to BPFOps so that test/bpfops-reset and test/bpfops-summary don't race with operations
  • Update the goleak ignores after the client-go library was bumped and function names changed
  • Fix a flake in redirectpolicy's service.txtar that caused ID re-use when backend deletion wasn't waited for
  • Revert the change to metrics.Cell and move NewLegacyMetrics back into metrics.AgentCell. This fixes a test failure when run with -race.

@joamaki joamaki requested a review from a team as a code owner August 12, 2025 12:07
@joamaki joamaki requested a review from aditighag August 12, 2025 12:07
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 12, 2025
@joamaki joamaki requested a review from brb August 12, 2025 12:08
@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Aug 12, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 12, 2025
@joamaki joamaki added the needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch label Aug 12, 2025
@joamaki
Contributor Author

joamaki commented Aug 12, 2025

Marked for v1.18 backport to proactively fix potential flakes there. The only production code change is the added BPFOps.mu mutex, which is a safe change.

@joamaki
Contributor Author

joamaki commented Aug 12, 2025

/test

Wait for the prune to actually happen in the lb/prune command
to make tests that e.g. do BPF state restoration more reliable
as then we won't have a prune racing in the background.

Update migrate-any-proto.txtar to call lb/prune before restoration
to avoid a race.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
While the StateDB reconciler never calls the Update/Delete/Prune
concurrently, we do want to be able to do BPFOps.ResetAndRestore
from a test script to clear out the state.

Since [sync.Mutex.Lock] is very cheap on an unlocked mutex, add
a mutex around the BPFOps state so that we can inspect and manipulate
it safely from tests and avoid very odd failures.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
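
The pattern described above can be sketched roughly as follows. This is a simplified illustration, not the real BPFOps: the `ops` type, its `backends` map, and the `Update`, `Delete`, `Reset`, and `Len` methods are hypothetical stand-ins. Only the idea of guarding the state with a `mu` mutex, so that a test-triggered reset or summary cannot race with normal operations, comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// ops is a simplified, hypothetical stand-in for BPFOps.
type ops struct {
	mu       sync.Mutex
	backends map[string]int // hypothetical state: backend address -> ID
	nextID   int
}

func newOps() *ops {
	return &ops{backends: map[string]int{}, nextID: 1}
}

// Update allocates an ID for addr, or returns the existing one.
func (o *ops) Update(addr string) int {
	o.mu.Lock()
	defer o.mu.Unlock()
	if id, ok := o.backends[addr]; ok {
		return id
	}
	id := o.nextID
	o.nextID++
	o.backends[addr] = id
	return id
}

// Delete forgets addr.
func (o *ops) Delete(addr string) {
	o.mu.Lock()
	defer o.mu.Unlock()
	delete(o.backends, addr)
}

// Reset clears all state. In this sketch it plays the role of a reset
// triggered from a test script while operations may be in flight.
func (o *ops) Reset() {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.backends = map[string]int{}
	o.nextID = 1
}

// Len reports the number of tracked backends (a stand-in for a summary).
func (o *ops) Len() int {
	o.mu.Lock()
	defer o.mu.Unlock()
	return len(o.backends)
}

func main() {
	o := newOps()
	var wg sync.WaitGroup

	// Simulate operations running concurrently with a test-triggered reset.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			o.Update(fmt.Sprintf("10.0.0.%d:80", i))
		}(i)
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		o.Reset()
	}()
	wg.Wait()

	// A final reset after all goroutines finished leaves the state empty.
	o.Reset()
	fmt.Println("backends after reset:", o.Len())
}
```

Without the mutex, running this under `go run -race` would be expected to report data races on the map and the ID counter; with it, the uncontended lock cost in the normal single-caller path is small.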
The function names in the goleak ignores had changed when client-go was
updated, which was causing false-positive goroutine leak failures.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
The backends table wasn't checked after the service and endpoint slice
removal. As a result, the endpoints were sometimes added back before the
deletions had been processed, which led to re-use of old IDs.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the pr/joamaki/lb-test-flake-fixes branch from e7b700e to e51ef66 Compare August 12, 2025 12:29
NewLegacyMetrics should never have moved into 'Cell', as the whole point was to keep
the legacy metrics and global variables out of 'Cell' so that tests can use it.

Fixes: 0b3672f ("pkg/metrics: prepare *metrics.Registry for use by operator.")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki requested a review from a team as a code owner August 12, 2025 12:43
@joamaki joamaki requested a review from derailed August 12, 2025 12:43
@joamaki
Contributor Author

joamaki commented Aug 12, 2025

/test

@joamaki joamaki requested a review from tommyp1ckles August 12, 2025 13:24
@joamaki
Contributor Author

joamaki commented Aug 12, 2025

/test

@joamaki
Contributor Author

joamaki commented Aug 12, 2025

@tommyp1ckles could you please review the metrics-related commit? It reverts the move of NewLegacyMetrics from AgentCell to Cell that was made in 0b3672f.

@joamaki joamaki enabled auto-merge August 13, 2025 07:01
@joamaki joamaki added this pull request to the merge queue Aug 13, 2025
Merged via the queue into cilium:main with commit 27fec19 Aug 13, 2025
74 checks passed
@joamaki joamaki deleted the pr/joamaki/lb-test-flake-fixes branch August 13, 2025 07:11
@joamaki joamaki added backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. and removed needs-backport/1.18 This PR / issue needs backporting to the v1.18 branch labels Aug 19, 2025
@github-actions github-actions bot added backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. and removed backport-pending/1.18 The backport for Cilium 1.18.x for this PR is in progress. labels Aug 21, 2025
Labels
backport-done/1.18 The backport for Cilium 1.18.x for this PR is done. release-note/misc This PR makes changes that have no direct user impact.