operator: impove handling taints in operator #35785

marseel · 2024-11-05T15:22:29Z

Previously, during each upgrade of Cilium, operator had to taint and
untaint nodes whenever Cilium was scheduled and then running.
Hovewer, at large clusters, this resulted in multiple parallel requests
that were rate-limited on client-go in case of low qps.
This resulted in a huge queue of updates pending with stale node
information.
While simple mitigation of increasing qps works, we should cap max
number of requests that operator issues at a given time to reduce
staleness of node information. Additionally, it prevents starving other
clients of client-go.

This commit introduces workers that synchronize taint information
instead of relying on unbounded number of controllers.
Additionally, we merge node and pod update queues into a single
workqueue.

operator: improve the responsiveness of tainting and setting conditions on k8s nodes

marseel · 2024-11-06T11:18:57Z

/test

marseel · 2024-11-06T13:00:20Z

/test

giorio94

Thanks, looks great to me. Just a couple of nits inline.

operator/watchers/node.go

operator/watchers/node_taint.go

marseel · 2024-11-07T12:18:23Z

/test

operator/watchers/node_taint.go

marseel · 2024-11-12T09:28:52Z

/test

Previously, during each upgrade of Cilium, operator had to taint and untaint nodes whenever Cilium was scheduled and then running. Hovewer, at large clusters, this resulted in multiple parallel requests that were rate-limited on client-go in case of low qps. This resulted in a huge queue of updates pending with stale node information. While simple mitigation of increasing qps works, we should cap max number of requests that operator issues at a given time to reduce staleness of node information. Additionally, it prevents starving other clients of client-go. This commit introduces workers that synchronize taint information instead of relying on unbounded number of controllers. Additionally, we merge node and pod update queues into a single workqueue. Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>

marseel · 2024-11-12T16:40:09Z

/test

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 5, 2024

marseel force-pushed the pr/marseel/fix_taint_queue branch 4 times, most recently from b9770ff to d18c519 Compare November 6, 2024 10:26

marseel added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Nov 6, 2024

maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 6, 2024

marseel force-pushed the pr/marseel/fix_taint_queue branch 2 times, most recently from 7215dad to bdbc7f8 Compare November 6, 2024 11:06

marseel force-pushed the pr/marseel/fix_taint_queue branch from bdbc7f8 to cb016c5 Compare November 6, 2024 12:58

marseel requested a review from squeed November 6, 2024 13:53

marseel marked this pull request as ready for review November 6, 2024 13:53

marseel requested review from a team as code owners November 6, 2024 13:53

marseel requested a review from giorio94 November 6, 2024 13:53

giorio94 approved these changes Nov 6, 2024

View reviewed changes

operator/watchers/node.go Outdated Show resolved Hide resolved

operator/watchers/node_taint.go Outdated Show resolved Hide resolved

giorio94 changed the title ~~operator: imporove handling taints in operator~~ operator: impove handling taints in operator Nov 6, 2024

marseel force-pushed the pr/marseel/fix_taint_queue branch from cb016c5 to b7f7d0f Compare November 7, 2024 12:10

squeed reviewed Nov 8, 2024

View reviewed changes

operator/watchers/node_taint.go Outdated Show resolved Hide resolved

squeed reviewed Nov 8, 2024

View reviewed changes

operator/watchers/node_taint.go Show resolved Hide resolved

squeed reviewed Nov 8, 2024

View reviewed changes

operator/watchers/node_taint.go Show resolved Hide resolved

squeed reviewed Nov 8, 2024

View reviewed changes

operator/watchers/node_taint.go Show resolved Hide resolved

squeed approved these changes Nov 11, 2024

View reviewed changes

marseel force-pushed the pr/marseel/fix_taint_queue branch from b7f7d0f to 80f17ae Compare November 12, 2024 09:27

marseel force-pushed the pr/marseel/fix_taint_queue branch from 80f17ae to dbc3bab Compare November 12, 2024 16:39

maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Nov 13, 2024

squeed enabled auto-merge November 13, 2024 10:40

squeed added this pull request to the merge queue Nov 13, 2024

Merged via the queue into cilium:main with commit 3a34121 Nov 13, 2024
64 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

operator: impove handling taints in operator #35785

operator: impove handling taints in operator #35785

Uh oh!

marseel commented Nov 5, 2024 •

edited

Loading

Uh oh!

marseel commented Nov 6, 2024

Uh oh!

marseel commented Nov 6, 2024

Uh oh!

giorio94 left a comment

Uh oh!

Uh oh!

Uh oh!

marseel commented Nov 7, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marseel commented Nov 12, 2024

Uh oh!

marseel commented Nov 12, 2024

Uh oh!

Uh oh!

Uh oh!

operator: impove handling taints in operator #35785

operator: impove handling taints in operator #35785

Uh oh!

Conversation

marseel commented Nov 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marseel commented Nov 6, 2024

Uh oh!

marseel commented Nov 6, 2024

Uh oh!

giorio94 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

marseel commented Nov 7, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marseel commented Nov 12, 2024

Uh oh!

marseel commented Nov 12, 2024

Uh oh!

Uh oh!

Uh oh!

marseel commented Nov 5, 2024 •

edited

Loading