
[RayCluster][Fix] evicted head-pod can be recreated or restarted #2217


Merged
merged 1 commit on Aug 9, 2024

Conversation

JasonChen86899
Contributor

Why are these changes needed?

This PR attempts to fix issue #2125.
If the head Pod has been evicted, we delete it so that it can be restarted or recreated.
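As a rough illustration of the idea (not the actual KubeRay code), detecting an evicted Pod comes down to checking its status phase and reason. The types below are minimal stand-ins for the real corev1 API, and the string "Evicted" mirrors the status reason the kubelet sets on a Pod it evicts (KubeRay refers to it as utils.PodEvictedReason):

```go
package main

import "fmt"

// Minimal stand-ins for the corev1.Pod / corev1.PodStatus types used by the
// controller; the real code operates on the Kubernetes API objects.
type PodStatus struct {
	Phase  string // e.g. "Running", "Failed", "Succeeded"
	Reason string // e.g. "Evicted" when the kubelet evicted the Pod
}

type Pod struct {
	Name   string
	Status PodStatus
}

// isEvicted reports whether the Pod was evicted by the kubelet. An evicted Pod
// has phase Failed and reason "Evicted", and it will not restart on its own
// even with `RestartPolicy: Always`, so the controller must delete it and let
// a new head Pod be created.
func isEvicted(pod Pod) bool {
	return pod.Status.Phase == "Failed" && pod.Status.Reason == "Evicted"
}

func main() {
	head := Pod{Name: "raycluster-head", Status: PodStatus{Phase: "Failed", Reason: "Evicted"}}
	fmt.Println(isEvicted(head)) // prints "true": this head Pod should be deleted and recreated
}
```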

Related issue number

#2125

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@kevin85421
Member

kevin85421 commented Jul 2, 2024

Hey @JasonChen86899, I didn't know that you wanted to work on the issue. I had already assigned the issue to @MortalHappiness before you opened this PR. Maybe we can find other issues to collaborate on if you are interested in contributing to KubeRay? Sorry for the inconvenience.

@MortalHappiness
Member

@kevin85421 I am OK with it if @JasonChen86899 wants to create a PR.

@kevin85421
Member

@MortalHappiness Thanks!

@JasonChen86899
Contributor Author

Hey @JasonChen86899, I didn't know that you wanted to work on the issue. I had already assigned the issue to @MortalHappiness before you opened this PR. Maybe we can find other issues to collaborate on if you are interested in contributing to KubeRay? Sorry for the inconvenience.

@kevin85421 Sorry, I just made a draft and didn't notice that the issue was assigned; I have closed it. cc @MortalHappiness

@kevin85421
Member

@JasonChen86899 No worries. I have already synced with @MortalHappiness. He is comfortable with your PR. We can review this PR together. You don't need to close it.

@JasonChen86899 JasonChen86899 reopened this Jul 3, 2024

// If the Pod's status is `Failed` or `Succeeded`, the Pod will not restart and we can safely delete it.
if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
if isRestartPolicyAlways {
if isRestartPolicyAlways && !isPodEvicted {
Member

It seems like this if statement down to line 850 can be removed, unless we know of other reasons a Pod's phase would be Failed where the Pod shouldn't be recreated. The comments below suggest we only added this check because we couldn't reproduce the Failed phase when the restart policy is Always.

Member

(if we keep this if statement, update the comments below on why evicted is treated differently)

Contributor Author

It seems like this if statement down to line 850 can be removed, unless we know of other reasons a Pod's phase would be Failed where the Pod shouldn't be recreated. The comments below suggest we only added this check because we couldn't reproduce the Failed phase when the restart policy is Always.

There are errors in these comments, and there was some discussion about this in the issue. I will update them for review.

Contributor Author

code comments updated, please review. @andrewsykim

@JasonChen86899 JasonChen86899 force-pushed the fix-evicted-head-pod branch 2 times, most recently from c29b6d7 to 0d39b40 Compare July 3, 2024 14:02
if isRestartPolicyAlways && !isPodEvicted {
// A Pod with `RestartPolicy: Always` is guaranteed to transition from
// `Running` to `Failed`/`Succeeded` and back to `Running`.
// However, if a non-controller-managed Pod (e.g., the head Pod) is evicted
Member

what does non-controller managed Pod mean?

Contributor Author

what does non-controller managed Pod mean?

"non-controller managed Pod" refers to a Pod created directly by a user without being managed by any Kubernetes controller (such as a Deployment, ReplicaSet, or StatefulSet).
For example, if a Pod is managed by a ReplicaSet and gets evicted, a replacement Pod will be created and re-scheduled.

@kevin85421
Member

Is this PR ready for review? It is still marked as a draft.

@JasonChen86899 JasonChen86899 marked this pull request as ready for review July 10, 2024 09:48
@JasonChen86899
Contributor Author

Is this PR ready for review? It is still marked as a draft.

marked ready for review

isRestartPolicyAlways := pod.Spec.RestartPolicy == corev1.RestartPolicyAlways
isPodEvicted := pod.Status.Reason == utils.PodEvictedReason
Member

@andrewsykim Jul 10, 2024

Are we sure Pod eviction is the only case where the Pod phase is Failed? Should we future-proof this logic and always delete the Pod when the phase is Failed?

func shouldDeletePod(pod corev1.Pod, nodeType rayv1.RayNodeType) (bool, string) {
	// If the Pod's status is `Failed` or `Succeeded`, the Pod will not restart and we can safely delete it.
	if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
		reason := fmt.Sprintf(
			"The %s Pod %s status is %s which is a terminal state. "+
				"KubeRay will delete the Pod and create new Pods in the next reconciliation if necessary.",
			nodeType, pod.Name, pod.Status.Phase)
		return true, reason
	}

	...
	...

Contributor Author

Eviction is not the only reason for phase Failed. Currently, controller-managed Pods briefly fail and quickly transition back to Running (with restart policy Always), so we do not need to delete them; but evicted Pods will not recover and stay Failed.
So we handle this case separately.

Member

Currently, controller-managed Pods briefly fail and quickly transition back to Running (with restart policy Always)

The existing comments here imply that this is not the case:

// Based on my observation, a Pod with `RestartPolicy: Always` will never be in the terminated states (i.e., `Failed` or `Succeeded`).
// However, I couldn't find any well-defined behavior in the Kubernetes documentation, so I can't guarantee that the status transition
// from `Running` to `Failed / Succeeded` and back to `Running` won't occur when we kill the main process (i.e., `ray start` in KubeRay)
// in the head Pod. Therefore, I've added this check as a safeguard.

Have you tested that, when the restart policy is Always and the reason is not Evicted, the phase temporarily changes to Failed?

Contributor Author

Thank you for your question. I have re-examined this and found that my previous understanding may have been incorrect. This is the logic I found about Pod status updates in the Kubernetes source code, and it shows that if the restart policy is Always, the Pod status remains Running:
https://github.com/kubernetes/kubernetes/blob/46aa8959a0659e22c924bb52b38385d441715b2b/pkg/kubelet/kubelet_pods.go#L1556

For your scenario, there should not be such a situation, so it cannot be reproduced.
However, a transition from Failed back to Running is indeed possible through human intervention such as changing the Pod configuration or recovering node resources. Therefore, is it necessary to handle this concurrent scenario: when a node fails, should we also query the latest node status and then perform the deletion?

Signed-off-by: Goober <chenhao86899@gmail.com>
@JasonChen86899
Contributor Author

JasonChen86899 commented Jul 15, 2024

Based on this logic, I updated the logic of the shouldDeletePod function:
https://github.com/kubernetes/kubernetes/blob/46aa8959a0659e22c924bb52b38385d441715b2b/pkg/kubelet/kubelet_pods.go#L1556

  1. Terminal state (Failed or Succeeded): delete it.
  2. Running state: if the restart policy is Never, delete it.
  3. Otherwise, do not delete.

@kevin85421 @andrewsykim
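A minimal sketch of those three rules, using local stand-in types rather than the real corev1 API (the merged KubeRay function differs in details such as node-type handling and the reason messages):

```go
package main

import "fmt"

// Stand-ins for the corev1 phase and restart-policy constants.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

type RestartPolicy string

const (
	RestartPolicyAlways RestartPolicy = "Always"
	RestartPolicyNever  RestartPolicy = "Never"
)

// Pod is a minimal stand-in for corev1.Pod with only the fields this
// decision needs.
type Pod struct {
	Name          string
	Phase         PodPhase
	RestartPolicy RestartPolicy
}

// shouldDeletePod applies the three rules listed above:
//  1. Terminal phase (Failed or Succeeded): delete, so a fresh Pod can be
//     created in the next reconciliation.
//  2. Running phase with RestartPolicy Never: delete.
//  3. Otherwise: keep the Pod.
func shouldDeletePod(pod Pod) (bool, string) {
	if pod.Phase == PodFailed || pod.Phase == PodSucceeded {
		return true, fmt.Sprintf("Pod %s is in terminal phase %s; delete and recreate.", pod.Name, pod.Phase)
	}
	if pod.Phase == PodRunning && pod.RestartPolicy == RestartPolicyNever {
		return true, fmt.Sprintf("Pod %s is Running with RestartPolicy Never; delete.", pod.Name)
	}
	return false, ""
}

func main() {
	// An evicted head Pod ends up in phase Failed, so rule 1 deletes it
	// even though its restart policy is Always.
	del, reason := shouldDeletePod(Pod{Name: "head", Phase: PodFailed, RestartPolicy: RestartPolicyAlways})
	fmt.Println(del, reason)
}
```

Note that with this version the restart policy no longer shields a terminal-phase Pod from deletion, which is exactly the behavior change the review comments below discuss.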

@kevin85421
Member

I will revisit this PR this week. I hope to include this PR in v1.2.0. I will cut the branch next week.

@kevin85421
Member

cc @MortalHappiness would you mind reviewing this PR?

Member

@MortalHappiness left a comment

Overall LGTM. I tried to run your PR with the steps in #2125 (comment) and it can successfully recreate the head pod after the taint node.kubernetes.io/memory-pressure:NoSchedule- is removed.

Here are some suggestions.

@@ -931,8 +931,10 @@ func TestReconcile_PodEvicted_DiffLess0_OK(t *testing.T) {
assert.Equal(t, len(testPods), len(podList.Items), "Init pod list len is wrong")

// Simulate head pod get evicted.
podList.Items[0].Spec.RestartPolicy = corev1.RestartPolicyAlways
Member

Add tests for RestartPolicyOnFailure.

Comment on lines 2041 to 2043
// The restart policy is `Always` and the Pod is in a terminate state.
// The expected behavior is that the controller will not delete the Pod because
// the restart policy is `Always`.
Member

The comments here should be updated. The controller will delete the Pod regardless of the restart policy.

Member

@kevin85421 left a comment

LGTM. @MortalHappiness would you mind opening a PR to address your comments?

Note for myself:

If the restartPolicy is Always and an eviction happens, the Pod status will be PodFailed.

// the Pod and rely on the Pod's restart policy to restart the Pod.
isRestartPolicyAlways := pod.Spec.RestartPolicy == corev1.RestartPolicyAlways
// Based on the logic of Kubernetes Pod status transitions, the following judgment is made.
// https://github.com/kubernetes/kubernetes/blob/3361895612dac57044d5dacc029d2ace1865479c/pkg/kubelet/kubelet_pods.go#L1556

// If the Pod's status is `Failed` or `Succeeded`, the Pod will not restart and we can safely delete it.
if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
Member

I have a bit of concern about this. What is the behavior of K8s when a Pod with restartPolicy: Always goes into a terminal state? Is it possible for it to restart?

If it restarts and Ray schedules a new Ray task or actor onto the new Ray node, KubeRay may then start the next reconciliation and kill the Pod, because, due to a delay in the informer, KubeRay still thinks the Pod is in a terminal state.

@kevin85421 kevin85421 merged commit 6c1c16e into ray-project:master Aug 9, 2024
25 checks passed