
jenkins: switch to ad-hoc GKE cluster creation/deletion #19918


Merged: 2 commits merged into cilium:master on May 25, 2022

Conversation

nbusseneau
Member

The general idea is to remove the need for our permanent pool of GKE clusters + management cluster (that manages the pool via Config Connector).

Instead, we switch to ad-hoc clusters as we do on CI 3.0. This should:

  • Remove the upper limit on the number of concurrent Jenkins GKE jobs.
  • Remove the need for permanent clusters (reduce CI costs).
  • Have no effect on the setup time required before the tests actually start running on GKE clusters.
  • Improve control over GKE features (e.g. DenyServiceExternalIPs admission controller) that cannot be controlled via CNRM / Config Connector.
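As an illustration of the ad-hoc approach, and not the actual scripts changed in this PR, each job can bring a cluster up and tear it down again with a pair of `gcloud` calls. The cluster name, zone, and node settings below are hypothetical:

```sh
#!/usr/bin/env bash
# Hypothetical sketch of a per-job ad-hoc GKE cluster lifecycle.
# Names and flags are illustrative, not the actual CI scripts.
set -e

CLUSTER_NAME="cilium-ci-${BUILD_NUMBER:-local}"
ZONE="us-west1-a"

# Create a small, short-lived cluster for this job only.
gcloud container clusters create "${CLUSTER_NAME}" \
  --zone "${ZONE}" \
  --num-nodes 2 \
  --machine-type n1-standard-4

# Fetch kubeconfig credentials so the tests can talk to the cluster.
gcloud container clusters get-credentials "${CLUSTER_NAME}" --zone "${ZONE}"

# ... run the GKE test suite against the cluster ...

# Delete the cluster once the job is done, whether it passed or failed.
gcloud container clusters delete "${CLUSTER_NAME}" --zone "${ZONE}" --quiet
```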

@nbusseneau nbusseneau added the area/CI (Continuous Integration testing issue or flake) and release-note/ci (This PR makes changes to the CI.) labels on May 23, 2022
New GKE clusters have the automatic labelling feature gate enabled by
default, so the labels used in the `Identity CLI testing` `K8sCLI` test
need to be updated with the additional
`k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name`
automatic label.

Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
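For context (a sketch, not part of the PR): on recent Kubernetes versions (1.21+), which new GKE clusters run, the apiserver automatically sets the `kubernetes.io/metadata.name` label on every namespace, and Cilium imports namespace labels into identities under the `k8s:io.cilium.k8s.namespace.labels.` prefix shown above. The check below is a hypothetical way to confirm the automatic label on a fresh cluster:

```sh
# Confirm the automatic namespace label on a new cluster (illustrative command).
kubectl get namespace default \
  -o jsonpath='{.metadata.labels.kubernetes\.io/metadata\.name}'
# Expected output: default
#
# Cilium picks this namespace label up as
# k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=<namespace>,
# which is why the test's expected identity labels need updating.
```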
@nbusseneau nbusseneau marked this pull request as ready for review May 24, 2022 15:28
@nbusseneau nbusseneau requested review from a team as code owners May 24, 2022 15:28
@nbusseneau nbusseneau requested review from ldelossa and nebril May 24, 2022 15:28
@nbusseneau
Member Author

/test

@nbusseneau
Member Author

nbusseneau commented May 24, 2022

Notes to reviewers:

Member

@gandro gandro left a comment


Not part of the CI team, but looks good to me nonetheless.

Member

@sayboras sayboras left a comment


🚢

@nbusseneau nbusseneau added the ready-to-merge (This PR has passed all tests and received consensus from code owners to merge.) label on May 25, 2022
Member

@joestringer joestringer left a comment


I glanced through the bash: `set -e` is present in the new script, and it seems that if it fails, the outer code will still call into `release-cluster.sh`. 👍
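A minimal sketch of the failure-handling pattern described above, assuming hypothetical wrapper scripts and arguments (only `release-cluster.sh` is named in this PR; the other script names and variables are illustrative):

```sh
#!/usr/bin/env bash
# The inner steps use "set -e", so any failure aborts the script early;
# a trap on EXIT guarantees the cluster is still released afterwards.
set -e

cleanup() {
  # Runs on any exit (success, failure, or abort) and tears down the cluster.
  ./release-cluster.sh "${CLUSTER_NAME}" || true
}
trap cleanup EXIT

./create-cluster.sh "${CLUSTER_NAME}"   # hypothetical creation step
./run-tests.sh                          # hypothetical test step
```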

@joestringer joestringer merged commit b42e5a0 into cilium:master May 25, 2022
@pchaigno
Member

> Have no effect on the setup time required before the tests actually start running on GKE clusters.

How is that possible if we are going from a pool of pre-created clusters to creating the clusters as part of the CI job?

@nbusseneau
Member Author

> How is that possible if we are going from a pool of pre-created clusters to creating the clusters as part of the CI job?

The clusters in the pool were pre-created, but they were scaled down to 0 nodes when not in use and scaled back up to 2 nodes when a job claimed them. In practice, scaling back up takes about the same amount of time as creating a new cluster.
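A hypothetical illustration of the old pool behaviour described above, with placeholder cluster name and zone:

```sh
# When a job claimed a pool cluster, it was scaled back up to 2 nodes...
gcloud container clusters resize cilium-ci-pool-1 \
  --zone us-west1-a --num-nodes 2 --quiet

# ...and scaled back down to 0 nodes once released.
gcloud container clusters resize cilium-ci-pool-1 \
  --zone us-west1-a --num-nodes 0 --quiet

# Scaling from 0 back up to 2 nodes takes roughly as long as creating a
# fresh cluster, which is why ad-hoc creation does not slow down test setup.
```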

@tklauser tklauser added the backport-done/1.11 (The backport for Cilium 1.11.x for this PR is done.) label and removed the backport-pending/1.11 label on Jun 2, 2022
@nbusseneau nbusseneau deleted the pr/fix-gke branch July 11, 2024 16:17