[RayCluster][Fix] evicted head-pod can be recreated or restarted #2217
Conversation
Hey @JasonChen86899, I didn't know that you wanted to work on this issue. I had already assigned it to @MortalHappiness before you opened this PR. Maybe we can find other issues to collaborate on if you are interested in contributing to KubeRay? Sorry for the inconvenience.
@kevin85421 I am OK with it if @JasonChen86899 wants to create a PR.
@MortalHappiness Thanks!
@kevin85421 Sorry, I just made a draft and didn't notice that the issue was assigned; I have closed it. cc @MortalHappiness
@JasonChen86899 No worries. I have already synced with @MortalHappiness. He is comfortable with your PR, so you don't need to close it. We can review this PR together.
  // If the Pod's status is `Failed` or `Succeeded`, the Pod will not restart and we can safely delete it.
  if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
-     if isRestartPolicyAlways {
+     if isRestartPolicyAlways && !isPodEvicted {
It seems like this if statement down to line 850 can be removed, unless we know of other reasons a pod phase would be Failed where the pod shouldn't be recreated. The comments below suggest we only added this check because we couldn't reproduce the Failed phase when the restart policy is Always.
(if we keep this if statement, update the comments below on why evicted is treated differently)
It seems like this if statement down to line 850 can be removed, unless we know of other reasons a pod phase would be Failed where the pod shouldn't be recreated. The comments below suggest we only added this check because we couldn't reproduce the Failed phase when the restart policy is Always.

There are errors in these comments, and there was some discussion about this in the issue. I will update them for review.
Code comments updated, please review. @andrewsykim
Force-pushed from c29b6d7 to 0d39b40.
if isRestartPolicyAlways && !isPodEvicted {
    // A Pod with `RestartPolicy: Always` will guarantee that the status transition
    // from `Running` to `Failed / Succeeded` and back to `Running`.
    // However, if the Non-Controller Managed Pod(e.g., head pod) is evicted
What does non-controller managed Pod mean?
What does non-controller managed Pod mean?

"Non-controller managed Pod" refers to Pods that are created directly by users and are not managed by any Kubernetes workload controller (such as a Deployment, ReplicaSet, or StatefulSet).
For example, if a Pod is managed by a ReplicaSet and gets evicted, the ReplicaSet will re-create and re-schedule it.
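For context, here is a minimal illustrative sketch (not KubeRay code) of how one can check whether a Pod has a controller owner, using the standard owner-reference helper. Note that a KubeRay head Pod typically does have an owner reference to its RayCluster custom resource, so "non-controller managed" here refers to the absence of a built-in workload controller that would recreate the Pod after eviction.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isControllerManaged reports whether some controller owns the Pod via an
// owner reference. Workload controllers such as ReplicaSets re-create their
// Pods after eviction; a Pod without such an owner stays gone until someone
// (here, the KubeRay operator) creates a replacement.
func isControllerManaged(pod *corev1.Pod) bool {
	return metav1.GetControllerOf(pod) != nil
}

func main() {
	pod := &corev1.Pod{}                  // no owner references set
	fmt.Println(isControllerManaged(pod)) // prints: false
}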
(Resolved, now-outdated review thread on ray-operator/controllers/ray/raycluster_controller_unit_test.go.)
Force-pushed from 0d39b40 to d66b3ce.
Is this PR ready for review? It is still marked as a draft.
Marked ready for review.
isRestartPolicyAlways := pod.Spec.RestartPolicy == corev1.RestartPolicyAlways
isPodEvicted := pod.Status.Reason == utils.PodEvictedReason
Are we sure pod eviction is the only case where the pod phase is Failed? Should we future-proof this logic and always delete the pod when the phase is Failed?
func shouldDeletePod(pod corev1.Pod, nodeType rayv1.RayNodeType) (bool, string) {
    // If the Pod's status is `Failed` or `Succeeded`, the Pod will not restart and we can safely delete it.
    if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
        reason := fmt.Sprintf(
            "The %s Pod %s status is %s which is a terminal state. "+
                "KubeRay will delete the Pod and create new Pods in the next reconciliation if necessary.",
            nodeType, pod.Name, pod.Status.Phase)
        return true, reason
    }
    ...
    ...
Eviction is not the only reason for phase Failed. Currently, managed Pods will briefly fail and quickly transition to running (restart policy Always), so we don't need to delete them; but if they are evicted they will not recover and will stay Failed.
So this case is handled separately.
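To make the distinction concrete, here is a minimal, self-contained sketch of that decision; the names mirror the diff in this thread, while the evicted-reason constant value and the reason message are assumptions rather than the actual controller code.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// podEvictedReason is assumed to match the kubelet-reported reason string
// referenced as utils.PodEvictedReason in the diff above.
const podEvictedReason = "Evicted"

// shouldDeleteTerminalPod sketches the decision discussed in this thread:
// a Pod in a terminal phase is deleted unless its RestartPolicy is Always
// and it was not evicted (the safeguard case, where the kubelet restarts
// the containers in place and the Pod is expected to recover on its own,
// or never to report Failed at all).
func shouldDeleteTerminalPod(pod corev1.Pod) (bool, string) {
	isRestartPolicyAlways := pod.Spec.RestartPolicy == corev1.RestartPolicyAlways
	isPodEvicted := pod.Status.Reason == podEvictedReason

	if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
		if isRestartPolicyAlways && !isPodEvicted {
			return false, ""
		}
		return true, fmt.Sprintf("Pod %s is in phase %s (reason %q); delete it so it can be recreated",
			pod.Name, pod.Status.Phase, pod.Status.Reason)
	}
	return false, ""
}

func main() {
	evicted := corev1.Pod{}
	evicted.Name = "head-pod"
	evicted.Spec.RestartPolicy = corev1.RestartPolicyAlways
	evicted.Status.Phase = corev1.PodFailed
	evicted.Status.Reason = podEvictedReason

	shouldDelete, reason := shouldDeleteTerminalPod(evicted)
	fmt.Println(shouldDelete, reason) // true: an evicted Pod stays Failed
}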
Currently managed pods will briefly fail and quickly transition to running (restart policy Always)
The existing comments here imply that this is not the case:
kuberay/ray-operator/controllers/ray/raycluster_controller.go
Lines 838 to 841 in 8e5f88c
// Based on my observation, a Pod with `RestartPolicy: Always` will never be in the terminated states (i.e., `Failed` or `Succeeded`).
// However, I couldn't find any well-defined behavior in the Kubernetes documentation, so I can't guarantee that the status transition
// from `Running` to `Failed / Succeeded` and back to `Running` won't occur when we kill the main process (i.e., `ray start` in KubeRay)
// in the head Pod. Therefore, I've added this check as a safeguard.
Have you tested that when the restart policy is Always and the reason is not Evicted, the phase temporarily changes to Failed?
Thank you for your question. I have re-examined this and found that my previous understanding may have been incorrect. This is the logic I found about Pod status updates in the Kubernetes source code; it shows that if the restart policy is Always, the status remains Running:
https://github.com/kubernetes/kubernetes/blob/46aa8959a0659e22c924bb52b38385d441715b2b/pkg/kubelet/kubelet_pods.go#L1556
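For reference, here is a rough and heavily simplified paraphrase of the branch in the linked kubelet code that matters here. This is an illustrative sketch, not the upstream implementation, which also handles init containers, unknown container states, and other cases.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// sketchPhase paraphrases the case relevant to this discussion: once all
// containers have terminated, the restart policy decides whether the Pod
// stays Running (containers will be restarted) or lands in a terminal phase.
// Eviction is a separate path: the kubelet marks an evicted Pod Failed with
// reason "Evicted" regardless of the restart policy.
func sketchPhase(restartPolicy corev1.RestartPolicy, allTerminated, anyFailed bool) corev1.PodPhase {
	if !allTerminated {
		return corev1.PodRunning
	}
	switch restartPolicy {
	case corev1.RestartPolicyAlways:
		// Containers are restarted in place, so the Pod keeps reporting Running.
		return corev1.PodRunning
	case corev1.RestartPolicyOnFailure:
		if anyFailed {
			return corev1.PodRunning // failed containers are restarted
		}
		return corev1.PodSucceeded
	default: // RestartPolicyNever
		if anyFailed {
			return corev1.PodFailed
		}
		return corev1.PodSucceeded
	}
}

func main() {
	fmt.Println(sketchPhase(corev1.RestartPolicyAlways, true, true)) // Running
	fmt.Println(sketchPhase(corev1.RestartPolicyNever, true, true))  // Failed
}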
For your scenario, such a situation should not occur, so it cannot be reproduced.
However, a transition from Failed back to Running is indeed possible through human intervention in the Pod configuration, recovery of node resources, etc. So is it necessary to handle this concurrent scenario: when a node fails, should we also query the latest node status and perform the deletion?
Force-pushed from d66b3ce to ffac9ae.
Signed-off-by: Goober <chenhao86899@gmail.com>
Force-pushed from ffac9ae to 0a463a2.
Based on this logic, I updated the logic of the
I will revisit this PR this week. I hope to include this PR in v1.2.0. I will cut the branch next week.
cc @MortalHappiness, would you mind reviewing this PR?
Overall LGTM. I tried to run your PR with the steps in #2125 (comment) and it can successfully recreate the head pod after the taint node.kubernetes.io/memory-pressure:NoSchedule- is removed.
Here are some suggestions.
@@ -931,8 +931,10 @@ func TestReconcile_PodEvicted_DiffLess0_OK(t *testing.T) {
    assert.Equal(t, len(testPods), len(podList.Items), "Init pod list len is wrong")

    // Simulate head pod get evicted.
    podList.Items[0].Spec.RestartPolicy = corev1.RestartPolicyAlways
Add tests for RestartPolicyOnFailure.
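A minimal sketch of what such a test could look like, assuming the shouldDeletePod signature quoted earlier in this thread, the controller's test package name, and the usual KubeRay import paths; the test name and the asserted expectation are illustrative, not the actual test added in this PR.

package ray // assumed to match the controller test package

import (
	"testing"

	"github.com/stretchr/testify/assert"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// TestShouldDeletePod_OnFailureEvicted checks that an evicted head Pod with
// RestartPolicy: OnFailure is deleted: OnFailure does not keep the Pod out of
// the Failed phase, so the controller should delete and recreate it.
func TestShouldDeletePod_OnFailureEvicted(t *testing.T) {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "headNode"},
		Spec:       corev1.PodSpec{RestartPolicy: corev1.RestartPolicyOnFailure},
		Status:     corev1.PodStatus{Phase: corev1.PodFailed, Reason: "Evicted"},
	}

	shouldDelete, reason := shouldDeletePod(pod, rayv1.HeadNode)
	assert.True(t, shouldDelete, reason)
}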
// The restart policy is `Always` and the Pod is in a terminate state.
// The expected behavior is that the controller will not delete the Pod because
// the restart policy is `Always`.
The comments here should be updated. The controller will delete the pod regardless of the restart policy.
LGTM. @MortalHappiness would you mind opening a PR to address your comments?
Note for myself: If the restartPolicy is Always and eviction happened, the Pod status will be PodFailed.
// the Pod and rely on the Pod's restart policy to restart the Pod.
isRestartPolicyAlways := pod.Spec.RestartPolicy == corev1.RestartPolicyAlways
// Based on the logic of the change of the status of the K8S pod, the following judgment is made.
// https://github.com/kubernetes/kubernetes/blob/3361895612dac57044d5dacc029d2ace1865479c/pkg/kubelet/kubelet_pods.go#L1556

// If the Pod's status is `Failed` or `Succeeded`, the Pod will not restart and we can safely delete it.
if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
I have a bit of concern about this. What is the behavior of K8s when a Pod with restartPolicy: Always goes into a terminal state? Is it possible for it to restart?
If it restarts and Ray schedules a new Ray task or actor to the new Ray node, KubeRay may then start the next reconciliation and kill the Pod, because due to a delay in the informer it still thinks the Pod is in a terminal state.
Why are these changes needed?
This PR attempts to fix issue #2125: if the head pod has been evicted, we delete it and let it be restarted or recreated.
Related issue number
#2125
Checks