
test: Wait for pod termination in K8sServicesTest #19750


Merged
merged 1 commit into master from pr/brb/ci-fix-echo-termination-flake
May 17, 2022

Conversation

brb
Member

@brb brb commented May 9, 2022

Take No. 2 🤞

Wait for pod termination before removing Cilium.

Removing Cilium prematurely can cause the deletion of test pods
to fail, as in the following CI flake:

    "Pods are still terminating:
        [echo-694c58bbf4-896gh echo-694c58bbf4-fr4ck]"

This is due to the missing CNI plugin. From the kubelet logs:

    failed to "KillPodSandbox" for "..."
    with KillPodSandboxError: "rpc error: code = Unknown desc =
    networkPlugin cni failed to teardown pod
    \"echo-694c58bbf4-fr4ck_default\" network: failed to find plugin
    \"cilium-cni\" in path [/opt/cni/bin]"

The proposed change is not ideal, as the ExpectAllPodsInNsTerminated()
function is racy. If none of the pods has entered the terminating
state yet, the function returns too early (without waiting for the
termination).

The proper solution would be to use the deployment manager used by
K8sDatapathConfig. However, the manager would require significant
changes. Considering that we are planning to completely change the
integration suite, the proper solution is not worth the time.

Fix #18895

@brb brb added the area/CI (Continuous Integration testing issue or flake), release-note/ci (This PR makes changes to the CI.), and needs-backport/1.11 labels May 9, 2022
Member Author

brb commented May 9, 2022

/test

Member Author

brb commented May 9, 2022

/test

Member Author

brb commented May 9, 2022

CI is hitting #19751.

@brb brb force-pushed the pr/brb/ci-fix-echo-termination-flake branch from ed68273 to 4556b43 Compare May 10, 2022 07:40
Member Author

brb commented May 10, 2022

/test

Wait for pod termination before removing Cilium.

Signed-off-by: Martynas Pumputis <m@lambda.lt>
@brb brb force-pushed the pr/brb/ci-fix-echo-termination-flake branch from 4556b43 to 655ac40 Compare May 11, 2022 08:16
Member Author

brb commented May 11, 2022

/test

Member Author

brb commented May 11, 2022

/test-1.22-4.19

@brb brb marked this pull request as ready for review May 11, 2022 13:55
@brb brb requested a review from a team May 11, 2022 13:55
@brb brb requested a review from a team as a code owner May 11, 2022 13:55
@brb brb requested review from joamaki and tklauser May 11, 2022 13:55
@brb brb added the ready-to-merge (This PR has passed all tests and received consensus from code owners to merge.) label May 13, 2022
@christarazi christarazi merged commit e2adb8a into master May 17, 2022
@christarazi christarazi deleted the pr/brb/ci-fix-echo-termination-flake branch May 17, 2022 18:50
@jibi jibi added the backport-done/1.11 (The backport for Cilium 1.11.x for this PR is done.) label and removed the backport-pending/1.11 label May 23, 2022
Labels
area/CI: Continuous Integration testing issue or flake
backport-done/1.11: The backport for Cilium 1.11.x for this PR is done.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-note/ci: This PR makes changes to the CI.
Development

Successfully merging this pull request may close these issues.

CI: K8sVerifier Runs the kernel verifier against Cilium's BPF datapath: terminating containers are not deleted after timeout
5 participants