

dylandreimerink
Member

This PR adds a scale test for LB-IPAM to demonstrate that the feature scales well even when many services are created. The test is fairly simple: we create a pool and then, over the course of 3 minutes (by default), add 50k services, each of which gets an IP from the pool. We record the event processing time, CPU usage and memory usage.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 1, 2025
@dylandreimerink dylandreimerink added the release-note/ci This PR makes changes to the CI. label May 1, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 1, 2025
@dylandreimerink dylandreimerink force-pushed the pr/dylandreimerink/lb-ipam-scale-test branch from e0249e1 to fcb9e5b May 1, 2025 10:08
@dylandreimerink
Member Author

/test

@dylandreimerink dylandreimerink force-pushed the pr/dylandreimerink/lb-ipam-scale-test branch from fcb9e5b to 25135e2 May 2, 2025 08:35
@dylandreimerink
Member Author

/scale-100

@dylandreimerink dylandreimerink force-pushed the pr/dylandreimerink/lb-ipam-scale-test branch from 25135e2 to 925ed0f May 2, 2025 10:56
In preparation for scale testing, we added metrics that track the time
it takes to process events in the lbipam controller. This gives us
insight into how LB-IPAM performance scales with the number of
services in the cluster.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit introduces a basic scale test module for lb-ipam. The module
creates an LB-IPAM pool and then creates 50k services over 3 minutes.
It measures the 50th, 90th and 99th percentiles of the time it takes to
process a service, as well as the maximum CPU and memory usage of the
operator during this time.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink dylandreimerink force-pushed the pr/dylandreimerink/lb-ipam-scale-test branch from 925ed0f to ff937d1 May 9, 2025 14:38
@dylandreimerink
Member Author

/scale-5

@dylandreimerink dylandreimerink force-pushed the pr/dylandreimerink/lb-ipam-scale-test branch from ff937d1 to 4c3b1a5 May 9, 2025 14:55
@dylandreimerink dylandreimerink force-pushed the pr/dylandreimerink/lb-ipam-scale-test branch 2 times, most recently from d3df317 to b786d6d May 22, 2025 10:40
@dylandreimerink
Member Author

Alright. Got the new workflow to run, seems successful: https://github.com/cilium/cilium/actions/runs/15184456179/job/42701510501

Will disable test run on push now, then mark as ready for review.

@dylandreimerink dylandreimerink force-pushed the pr/dylandreimerink/lb-ipam-scale-test branch from b786d6d to 85ca550 May 22, 2025 11:48
@dylandreimerink dylandreimerink marked this pull request as ready for review May 22, 2025 11:48
@dylandreimerink dylandreimerink requested review from a team as code owners May 22, 2025 11:48
Member

@joestringer joestringer left a comment


It would be nice if we could share more of the core content here by defining a reusable workflow for both scale test workflows, parameterized by the node count and testconfig parameters.

The full diff is very small compared to the 100-node test. I glanced through the workflow and it seems sane overall. A couple of minor comments below.

@julianwiedmann julianwiedmann added area/metrics Impacts statistics / metrics gathering, eg via Prometheus. feature/lb-ipam sig/scalability Impacts how well Cilium handles a high rate of events or churn. labels May 26, 2025
The LB-IPAM scale test is relatively small and only puts load on the
operator, so we add it as a small cluster test with only 5 nodes, which
is a lot cheaper than adding it to any of the other scale test workflows.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink dylandreimerink force-pushed the pr/dylandreimerink/lb-ipam-scale-test branch from 85ca550 to 3e3d1d7 June 4, 2025 10:57
@dylandreimerink
Copy link
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 4, 2025
@dylandreimerink dylandreimerink added this pull request to the merge queue Jun 4, 2025
Merged via the queue into main with commit 79e7ce6 Jun 4, 2025
298 of 299 checks passed
@dylandreimerink dylandreimerink deleted the pr/dylandreimerink/lb-ipam-scale-test branch June 4, 2025 13:26
Labels
area/metrics Impacts statistics / metrics gathering, eg via Prometheus. feature/lb-ipam ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI. sig/scalability Impacts how well Cilium handles a high rate of events or churn.
5 participants