-
Notifications
You must be signed in to change notification settings - Fork 1.2k
ci.ocp: A couple of peer-pods setup improvements #11266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
set permissions of the peer-pods-azure.sh script to executable Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
We forgot to add the license header when introducing this test. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
ci/openshift-ci/peer-pods-azure.sh
Outdated
################################## | ||
# Log warning when peering created | ||
################################## | ||
if [[ "${AZURE_REGION}" != "${PP_REGION}" ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar enough with the rest of the script, but at least from manual installation, I would expect that peering is somewhat distinct from the fact that we have different regions. In other words, different regions requires peering, but we might have later a need / desire to enable peering in the same region.
Would it be possible to add a variable that explicitly records when we activated peering? Probably more robust in the long run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note the peering is usually not required and definitely not supported by kata. But currently it works(TM) and allows us to get coverage despite the region prow decides to run our jobs in. (upstream azure images are only available in eastus region)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Minor suggestions regarding wording of a commit message and wording of a warning message, but nothing blocking, and no need to re-review if you change only that.
Another suggestion regarding the detection of peering, but that can come later and should not block this merge (unless you prefer to iterate once more in this PR).
In CI we hit problem where just after `az login` the first `az network vnet list` command fails due to permission. We see "insufficient permissions" or "pending permissions", suggesting we should retry later. Manual tests and successful runs indicate we do have the permissions, but not immediately after login. Azure docs suggest using extra `az account set` but still the propagation might take some time. Add a loop retrying the first command a few times before declaring failure. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
We used hardcoded "ci/openshift-ci/cluster" location which expects this script to be only executed from the root. Let's use SCRIPT_DIR instead to allow execution from elsewhere eg. by user bisecting a failed CI run. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
this script relies on temporary subscriptions and won't cleanup any resources. Let's improve the logging to better describe what resources were created and how to clean them, if the user needs to do so. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks perfect now. Thanks @ldoktor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ldoktor , very good improvements, thanks!
The commit to fix issues with permission propagation in azure is highly appreciated in our downstream CI I guess.
This PR is based on top of #11231, please ignore the first 2 commits as they are required to pass the CI.
Recently we got auth issue https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-kata-containers-kata-containers-main-peer-pods-e2e-tests/1921460887898558464 in our pipeline claiming it might be related to permission propagation. While fixing that I accumulated a few little improvements, please see the individual changes in commit messages.