feat: enhancing concurrent reconciliations #790
e2e is successful; the failure is just a false positive, which happens from time to time with GH Actions.
I got 5/10 with
Why are those TCPs in [...]? It seems to me it's unrelated to the changes introduced with this PR. Do the Deployment and related Pods match the expected replicas?
c8d297f to 5c65a06
@avorima it seems we can skip checking the channel status; I just added a safety net with a timeout for the GenericEvent communication. It would be great if you could give it a shot.
Channels used to feed GenericEvent objects for cross-controller triggers are now buffered according to the --max-concurrent-tcp-reconciles flag: this is required to avoid channel full errors when dealing with large management clusters serving a sizeable number of Tenant Control Planes. Increasing this value will put more pressure on memory (mostly for GC) and CPU (provisioning multiple certificates at the same time).
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
This change introduces a grace period of 10 seconds before abruptly terminating the Tenant Control Plane deployment, allowing the soot manager to complete its exit procedure and avoiding false positive errors caused by the API Server being unresponsive due to user deletion. The aim of this change is to reduce the number of false positive errors upon mass deletion of Tenant Control Plane objects.
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
5c65a06 to 1f0d3bd
WatchesRawSource is non-blocking, so there is no need to check whether the channel is full. To prevent deadlocks, a WithTimeout check has been introduced.
Signed-off-by: Dario Tranchitella <dario@tranchitella.eu>
1f0d3bd to 5ec7ebc
Did you increase the CPU limits? That caused all kinds of weird issues for me.
Strictly related to #787, this started from a benchmark of Kamaji reconciling multiple Tenant Control Planes.
When dealing with multiple Tenant Control Plane creations, my suggestion is to increase the --max-concurrent-tcp-reconciles flag according to the benchmarked scenarios.
E.g., with the default value of 1 and creating a dozen clusters:
When increasing the said flag (e.g., to 10):