Skip to content

Conversation

marseel
Copy link
Contributor

@marseel marseel commented Nov 5, 2024

Previously, during each upgrade of Cilium, operator had to taint and
untaint nodes whenever Cilium was scheduled and then running.
Hovewer, at large clusters, this resulted in multiple parallel requests
that were rate-limited on client-go in case of low qps.
This resulted in a huge queue of updates pending with stale node
information.
While simple mitigation of increasing qps works, we should cap max
number of requests that operator issues at a given time to reduce
staleness of node information. Additionally, it prevents starving other
clients of client-go.

This commit introduces workers that synchronize taint information
instead of relying on unbounded number of controllers.
Additionally, we merge node and pod update queues into a single
workqueue.

operator: improve the responsiveness of tainting and setting conditions on k8s nodes

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 5, 2024
@marseel marseel force-pushed the pr/marseel/fix_taint_queue branch 4 times, most recently from b9770ff to d18c519 Compare November 6, 2024 10:26
@marseel marseel added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Nov 6, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 6, 2024
@marseel marseel force-pushed the pr/marseel/fix_taint_queue branch 2 times, most recently from 7215dad to bdbc7f8 Compare November 6, 2024 11:06
@marseel
Copy link
Contributor Author

marseel commented Nov 6, 2024

/test

@marseel marseel force-pushed the pr/marseel/fix_taint_queue branch from bdbc7f8 to cb016c5 Compare November 6, 2024 12:58
@marseel
Copy link
Contributor Author

marseel commented Nov 6, 2024

/test

@marseel marseel requested a review from squeed November 6, 2024 13:53
@marseel marseel marked this pull request as ready for review November 6, 2024 13:53
@marseel marseel requested review from a team as code owners November 6, 2024 13:53
@marseel marseel requested a review from giorio94 November 6, 2024 13:53
Copy link
Member

@giorio94 giorio94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks great to me. Just a couple of nits inline.

@giorio94 giorio94 changed the title operator: imporove handling taints in operator operator: impove handling taints in operator Nov 6, 2024
@marseel marseel force-pushed the pr/marseel/fix_taint_queue branch from cb016c5 to b7f7d0f Compare November 7, 2024 12:10
@marseel
Copy link
Contributor Author

marseel commented Nov 7, 2024

/test

@marseel marseel force-pushed the pr/marseel/fix_taint_queue branch from b7f7d0f to 80f17ae Compare November 12, 2024 09:27
@marseel
Copy link
Contributor Author

marseel commented Nov 12, 2024

/test

Previously, during each upgrade of Cilium, operator had to taint and
untaint nodes whenever Cilium was scheduled and then running.
Hovewer, at large clusters, this resulted in multiple parallel requests
that were rate-limited on client-go in case of low qps.
This resulted in a huge queue of updates pending with stale node
information.
While simple mitigation of increasing qps works, we should cap max
number of requests that operator issues at a given time to reduce
staleness of node information. Additionally, it prevents starving other
clients of client-go.

This commit introduces workers that synchronize taint information
instead of relying on unbounded number of controllers.
Additionally, we merge node and pod update queues into a single
workqueue.

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
@marseel marseel force-pushed the pr/marseel/fix_taint_queue branch from 80f17ae to dbc3bab Compare November 12, 2024 16:39
@marseel
Copy link
Contributor Author

marseel commented Nov 12, 2024

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 13, 2024
@squeed squeed enabled auto-merge November 13, 2024 10:40
@squeed squeed added this pull request to the merge queue Nov 13, 2024
Merged via the queue into cilium:main with commit 3a34121 Nov 13, 2024
64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants