Skip to content

LB-IPAM: Fix performance fixes #39278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

dylandreimerink
Copy link
Member

This PR addresses two issues that were surfaced by scale testing as part of #39276.

Please see the individual commit messages for details.

LB-IPAM: reduce CPU usage during service creation and release memory when no longer needed

During scale testing where only services were being added, the CPU
profile showed that a large amount of time was spent revalidating
already allocated IPs in pools. This was unexpected.

It turns out that when we patch the pool to update the status with new
counts, we also observe our own change. Our pool upsert logic for
revalidation is called, assuming the spec must have changed. But most
of the time its just the status that changed.

This commit adds a deepequal check on the pool spec so we only do work
when the spec actually changes.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
The `HashAllocator` uses a go map to store metadata for allocated IPs.
The go runtime automatically grows the map when it becomes full.
However, it does not automatically shrink the map when it becomes empty.

So when a large number of IPs were allocated at some point and then
deallocated, the map would hold onto more memory than necessary.

This commit adds logic to the `HashAllocator` to re-create the map when
its underutilized. Go does not allow us to query its "capacity" so we
track its peak size instead. When the current number of allocated IPs
is less than 1/3 of the peak size, we re-create the map.
The new map gets a size indication of 1/2 the peak size. This leaves
some room for growth before the map needs to be grown again.

During testing we allocated 50k IPs, then deallocated all. This change
reduced the resident memory usage by 50MB compared to not re-creating
the map.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 1, 2025
@dylandreimerink dylandreimerink added the release-note/misc This PR makes changes that have no direct user impact. label May 1, 2025
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 1, 2025
@dylandreimerink dylandreimerink added kind/performance There is a performance impact of this. area/operator Impacts the cilium-operator component needs-backport/1.15 needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels May 1, 2025
@dylandreimerink
Copy link
Member Author

/test

@dylandreimerink dylandreimerink removed needs-backport/1.15 needs-backport/1.16 This PR / issue needs backporting to the v1.16 branch needs-backport/1.17 This PR / issue needs backporting to the v1.17 branch labels May 1, 2025
@dylandreimerink dylandreimerink marked this pull request as ready for review May 2, 2025 09:47
@dylandreimerink dylandreimerink requested review from a team as code owners May 2, 2025 09:47
@dylandreimerink
Copy link
Member Author

@cilium/sig-lb could I get a review on this?

@dylandreimerink dylandreimerink added this pull request to the merge queue May 13, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks May 13, 2025
@dylandreimerink dylandreimerink added this pull request to the merge queue May 14, 2025
Merged via the queue into cilium:main with commit 6dce1d4 May 14, 2025
76 checks passed
@dylandreimerink dylandreimerink deleted the feature/lb-ipam-perf-fixes branch May 14, 2025 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/operator Impacts the cilium-operator component kind/performance There is a performance impact of this. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants