jenkins: switch to ad-hoc GKE cluster creation/deletion #19918
Conversation
The general idea is to remove the need for our permanent pool of GKE clusters + management cluster (that manages the pool via Config Connector). Instead, we switch to ad-hoc clusters as we do on CI 3.0. This should:

- Remove the upper limit on the number of concurrent Jenkins GKE jobs.
- Remove the need for permanent clusters (reduce CI costs).
- Have no effect on the setup time required before the tests actually start running on GKE clusters.
- Improve control over GKE features (e.g. the `DenyServiceExternalIPs` admission controller) that cannot be controlled via CNRM / Config Connector.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
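Since the scripts themselves are not quoted here, below is a rough sketch of what ad-hoc creation could look like. The cluster name, zone, and the `--no-enable-service-externalips` flag are assumptions for illustration (the node count of 2 matches the pool sizing mentioned later in the thread), not the PR's actual script:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of ad-hoc GKE cluster creation for a Jenkins job.
# Names, zone, and flags are illustrative assumptions, not the PR's scripts.
set -e

CLUSTER_NAME="cilium-ci-${BUILD_NUMBER:-local}"   # assumed per-build naming convention
ZONE="us-west1-a"                                  # assumed zone

# Creating the cluster directly (instead of via CNRM / Config Connector) lets
# the job control GKE features such as the DenyServiceExternalIPs admission
# controller at creation time.
gcloud container clusters create "${CLUSTER_NAME}" \
  --zone "${ZONE}" \
  --num-nodes 2 \
  --no-enable-service-externalips   # assumed flag; verify against current gcloud

gcloud container clusters get-credentials "${CLUSTER_NAME}" --zone "${ZONE}"
```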
New GKE clusters have the automatic labelling feature gate enabled by default, so the labels used in the `Identity CLI testing` `K8sCLI` test need to be updated with the additional `k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name` automatic label. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
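For context, the automatic label comes from the `kubernetes.io/metadata.name` label that Kubernetes sets on every namespace, which Cilium imports into identity labels under the `k8s:io.cilium.k8s.namespace.labels.` prefix. An illustrative way to confirm it is present on a fresh cluster:

```bash
# Show the automatic namespace label that new clusters set by default.
# Cilium surfaces it in identities as
#   k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=<namespace>
kubectl get namespace default \
  -o jsonpath='{.metadata.labels.kubernetes\.io/metadata\.name}'
# Expected output: default
```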
/test

Notes to reviewers:
Not part of the CI team, but looks good to me nonetheless.
🚢
I glanced through the bash: `set -e` is present in the new script, and it seems like if it fails, the outer code will call into `release-cluster.sh`. 👍
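A minimal sketch of the pattern described here, assuming hypothetical `create-cluster.sh` / `run-tests.sh` helpers alongside the `release-cluster.sh` mentioned above (the actual wiring in the PR may differ):

```bash
#!/usr/bin/env bash
# Sketch of the "outer code" pattern: even if a step fails under `set -e`,
# the ad-hoc cluster is still released. Script names are assumptions.
set -e

CLUSTER_NAME="${1:?usage: $0 <cluster-name>}"

# An EXIT trap fires on both success and failure (including exits caused by
# `set -e`), so release-cluster.sh always runs.
trap './release-cluster.sh "${CLUSTER_NAME}"' EXIT

./create-cluster.sh "${CLUSTER_NAME}"
./run-tests.sh "${CLUSTER_NAME}"   # a failure here still triggers the trap
```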
> Have no effect on the setup time required before the tests actually start running on GKE clusters.

How is that possible if we are going from a pool of pre-created clusters to creating the clusters as part of the CI job?
The clusters in the pool are already created, but they were actually scaled down to 0 nodes when not in use, then scaled back up to 2 nodes when in use. In practice, this operation takes about the same amount of time as creating a new cluster.
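For concreteness, a hedged comparison of the two approaches using plain gcloud commands (cluster names and zone are placeholders, not the values used by the CI scripts):

```bash
# Old approach (pool): permanent clusters are resized up from 0 nodes on demand.
gcloud container clusters resize pool-cluster-1 --zone us-west1-a --num-nodes 2 --quiet

# New approach (ad-hoc): a fresh cluster is created per job and deleted afterwards.
gcloud container clusters create ci-cluster-1234 --zone us-west1-a --num-nodes 2
# ...run the GKE test suite...
gcloud container clusters delete ci-cluster-1234 --zone us-west1-a --quiet
```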