Skip to content

Conversation

christarazi
Copy link
Member

@christarazi christarazi commented Sep 10, 2021

With this commit, the identity GC rate limit
(--identity-gc-rate-interval) becomes the effective rate at which
identities are garbage collected. Previously, the identity GC interval
(--identity-gc-interval) would cause the Operator to GC for that much
time, then the sleep for that much time, rinse and repeat, effectively
halving the rate.

To use concrete numbers for an example, let's say our interval is 5m and
our GC rate interval is 1000 per minute. It would mean that previously,
we would GC 5000 identities at a maximum for 10m (assuming that deletion
takes 0s). How was that calculated? Each minute, we GC 1000 identities.
After 5m, we have GC'd 5000 identities. But now we have to sleep for 5m
because that's our GC interval. Hence making our effective GC rate 500
per minute (instead of being 1000/m).

Now, we compute the time taken to perform the actual GC and subtract
that from the interval. So in our above example, we eliminate the dead
time of 5m and avoid slashing our effective GC rate in half. This change
allows the Operator to keep up with the demand more efficiently. The
Operator will warn if the GC duration took longer than the interval and
set the sleep duration to 0.

Suggested-by: Joe Stringer joe@cilium.io
Suggested-by: Dan Wendlandt dan@isovalent.com
Signed-off-by: Chris Tarazi chris@isovalent.com

@christarazi christarazi requested a review from a team as a code owner September 10, 2021 05:38
@christarazi christarazi requested a review from nebril September 10, 2021 05:38
@christarazi christarazi added area/operator Impacts the cilium-operator component area/kvstore Impacts the KVStore package interactions. needs-backport/1.10 release-note/misc This PR makes changes that have no direct user impact. labels Sep 10, 2021
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Sep 10, 2021
@christarazi christarazi force-pushed the pr/christarazi/identity-gc-improvements branch 3 times, most recently from 341f25b to 69e7ce8 Compare September 10, 2021 05:44
@christarazi
Copy link
Member Author

christarazi commented Sep 10, 2021

test-me-please

Job 'Cilium-PR-K8s-1.19-kernel-5.4' failed and has not been observed before, so may be related to your PR:

Click to show.

Test Name

K8sDatapathConfig Host firewall With VXLAN

Failure Output

FAIL: Failed to reach 192.168.36.11:80 from testclient-7t87g

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-1.19-kernel-5.4 so I can create a new GitHub issue to track it.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor nit to make the warning a little bit more clear.

Noting also what we found on friday, which was that etcdconfig.etcd.qps can be a limiting factor here.

This is useful in warning or error level messages to help nudge the user
in the right direction when troubleshooting.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/identity-gc-improvements branch from 69e7ce8 to 728b2b9 Compare September 13, 2021 16:59
@christarazi christarazi requested a review from a team as a code owner September 13, 2021 16:59
@christarazi
Copy link
Member Author

christarazi commented Sep 13, 2021

test-me-please

Edit: Images timed out

@christarazi
Copy link
Member Author

test-me-please

Copy link
Member

@aanm aanm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, only a small nit.

@christarazi christarazi force-pushed the pr/christarazi/identity-gc-improvements branch from 728b2b9 to 3cd71e1 Compare September 14, 2021 16:36
With this commit, the identity GC rate limit
(--identity-gc-rate-interval) becomes the effective rate at which
identities are garbage collected. Previously, the identity GC interval
(--identity-gc-interval) would cause the Operator to GC for that much
time, then the sleep for that much time, rinse and repeat, effectively
halving the rate.

To use concrete numbers for an example, let's say our interval is 5m and
our GC rate interval is 1000 per minute. It would mean that previously,
we would GC 5000 identities at a maximum for 10m (assuming that deletion
takes 0s). How was that calculated? Each minute, we GC 1000 identities.
After 5m, we have GC'd 5000 identities. But now we have to sleep for 5m
because that's our GC interval. Hence making our effective GC rate 500
per minute (instead of being 1000/m).

Now, we compute the time taken to perform the actual GC and subtract
that from the interval. So in our above example, we eliminate the dead
time of 5m and avoid slashing our effective GC rate in half. This change
allows the Operator to keep up with the demand more efficiently. The
Operator will warn if the GC duration took longer than the interval and
set the sleep duration to 0.

Suggested-by: Joe Stringer <joe@cilium.io>
Suggested-by: Dan Wendlandt <dan@isovalent.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/identity-gc-improvements branch from 3cd71e1 to d4fad43 Compare September 14, 2021 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/kvstore Impacts the KVStore package interactions. area/operator Impacts the cilium-operator component release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants