Skip to content

Conversation

ldoktor
Copy link
Contributor

@ldoktor ldoktor commented May 13, 2025

This PR is based on top of #11231, please ignore the first 2 commits as they are required to pass the CI.

Recently we got auth issue https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-kata-containers-kata-containers-main-peer-pods-e2e-tests/1921460887898558464 in our pipeline claiming it might be related to permission propagation. While fixing that I accumulated a few little improvements, please see the individual changes in commit messages.

@katacontainersbot katacontainersbot added the size/small Small and simple task label May 13, 2025
ldoktor added 2 commits May 20, 2025 17:03
set permissions of the peer-pods-azure.sh script to executable

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
We forgot to add the license header when introducing this test.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
##################################
# Log warning when peering created
##################################
if [[ "${AZURE_REGION}" != "${PP_REGION}" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar enough with the rest of the script, but at least from manual installation, I would expect that peering is somewhat distinct from the fact that we have different regions. In other words, different regions requires peering, but we might have later a need / desire to enable peering in the same region.

Would it be possible to add a variable that explicitly records when we activated peering? Probably more robust in the long run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note the peering is usually not required and definitely not supported by kata. But currently it works(TM) and allows us to get coverage despite the region prow decides to run our jobs in. (upstream azure images are only available in eastus region)

Copy link
Member

@c3d c3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Minor suggestions regarding wording of a commit message and wording of a warning message, but nothing blocking, and no need to re-review if you change only that.

Another suggestion regarding the detection of peering, but that can come later and should not block this merge (unless you prefer to iterate once more in this PR).

ldoktor added 2 commits May 21, 2025 10:28
In CI we hit problem where just after `az login` the first `az
network vnet list` command fails due to permission. We see
"insufficient permissions" or "pending permissions", suggesting we should
retry later. Manual tests and successful runs indicate we do have the
permissions, but not immediately after login.

Azure docs suggest using extra `az account set` but still the
propagation might take some time. Add a loop retrying
the first command a few times before declaring failure.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
We used hardcoded "ci/openshift-ci/cluster" location which expects this
script to be only executed from the root. Let's use SCRIPT_DIR instead
to allow execution from elsewhere eg. by user bisecting a failed CI run.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
this script relies on temporary subscriptions and won't cleanup any
resources. Let's improve the logging to better describe what resources
were created and how to clean them, if the user needs to do so.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
@ldoktor
Copy link
Contributor Author

ldoktor commented May 21, 2025

Changes:

  • improved commit messages
  • added a variable to signal we use peering (rather than using conditions)
  • further improvement of the extra resources logging (the last commit)

Copy link
Member

@c3d c3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks perfect now. Thanks @ldoktor

Copy link
Contributor

@wainersm wainersm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ldoktor , very good improvements, thanks!

The commit to fix issues with permission propagation in azure is highly appreciated in our downstream CI I guess.

@wainersm wainersm merged commit d77e33b into kata-containers:main May 26, 2025
519 of 540 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test size/small Small and simple task
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants