[RayCluster][Fix] evicted head-pod can be recreated or restarted #2217
Conversation
Hey @JasonChen86899, I didn't know that you wanted to work on this issue. I had already assigned it to @MortalHappiness before you opened this PR. Maybe we can find other issues to collaborate on if you are interested in contributing to KubeRay? Sorry for the inconvenience.
@kevin85421 I am OK with it if @JasonChen86899 wants to create a PR.
@MortalHappiness Thanks!
@kevin85421 Sorry, I just made a draft and didn't notice that the issue was assigned; I have closed it. cc @MortalHappiness
@JasonChen86899 No worries. I have already synced with @MortalHappiness. He is comfortable with your PR, so you don't need to close it. We can review this PR together.
  // If the Pod's status is `Failed` or `Succeeded`, the Pod will not restart and we can safely delete it.
  if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
-     if isRestartPolicyAlways {
+     if isRestartPolicyAlways && !isPodEvicted {
It seems like this if statement down to line 850 can be removed, unless we know of other reasons a pod phase would be Failed where the pod shouldn't be recreated. The comments below suggest we only added this check because we couldn't reproduce the Failed phase when the restart policy is Always.
(if we keep this if statement, update the comments below on why evicted is treated differently)
It seems like this if statement down to line 850 can be removed, unless we know of other reasons a pod phase would be Failed where the pod shouldn't be recreated. The comments below suggest we only added this check because we couldn't reproduce the Failed phase when the restart policy is Always.

There are errors in these comments, and there was some discussion about this in the issue. I will update them for review.
Code comments updated, please review. @andrewsykim
Force-pushed from c29b6d7 to 0d39b40.
if isRestartPolicyAlways && !isPodEvicted {
    // A Pod with `RestartPolicy: Always` will guarantee that the status transition
    // from `Running` to `Failed / Succeeded` and back to `Running`.
    // However, if the Non-Controller Managed Pod(e.g., head pod) is evicted
What does non-controller managed Pod mean?
What does non-controller managed Pod mean?

"Non-controller managed Pod" refers to Pods that are created directly by users and are not managed by any Kubernetes workload controller (such as a Deployment, ReplicaSet, or StatefulSet).
For example, if a Pod is managed by a ReplicaSet and gets evicted, the ReplicaSet will re-create and re-schedule it.
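For context, here is a minimal illustrative sketch (not KubeRay code) of how one can check whether a Pod has a controller owner, using the standard owner-reference helper. Note that a KubeRay head Pod typically does have an owner reference to its RayCluster custom resource, so "non-controller managed" here refers to the absence of a built-in workload controller that would recreate the Pod after eviction.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isControllerManaged reports whether some controller owns the Pod via an
// owner reference. Workload controllers such as ReplicaSets re-create their
// Pods after eviction; a Pod without such an owner stays gone until someone
// (here, the KubeRay operator) creates a replacement.
func isControllerManaged(pod *corev1.Pod) bool {
	return metav1.GetControllerOf(pod) != nil
}

func main() {
	pod := &corev1.Pod{}                  // no owner references set
	fmt.Println(isControllerManaged(pod)) // prints: false
}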
(Resolved, now-outdated review thread on ray-operator/controllers/ray/raycluster_controller_unit_test.go.)
Force-pushed from 0d39b40 to d66b3ce.
Is this PR ready for review? It is still marked as a draft.
Marked ready for review.
isRestartPolicyAlways := pod.Spec.RestartPolicy == corev1.RestartPolicyAlways
isPodEvicted := pod.Status.Reason == utils.PodEvictedReason
Are we sure pod eviction is the only case where the pod phase is Failed? Should we future-proof this logic and always delete the pod when the phase is Failed?
func shouldDeletePod(pod corev1.Pod, nodeType rayv1.RayNodeType) (bool, string) {
    // If the Pod's status is `Failed` or `Succeeded`, the Pod will not restart and we can safely delete it.
    if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
        reason := fmt.Sprintf(
            "The %s Pod %s status is %s which is a terminal state. "+
                "KubeRay will delete the Pod and create new Pods in the next reconciliation if necessary.",
            nodeType, pod.Name, pod.Status.Phase)
        return true, reason
    }
    ...
    ...
Eviction is not the only reason for phase Failed. Currently, managed Pods will briefly fail and quickly transition to running (restart policy Always), so we don't need to delete them; but if they are evicted they will not recover and will stay Failed.
So this case is handled separately.
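To make the distinction concrete, here is a minimal, self-contained sketch of that decision; the names mirror the diff in this thread, while the evicted-reason constant value and the reason message are assumptions rather than the actual controller code.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// podEvictedReason is assumed to match the kubelet-reported reason string
// referenced as utils.PodEvictedReason in the diff above.
const podEvictedReason = "Evicted"

// shouldDeleteTerminalPod sketches the decision discussed in this thread:
// a Pod in a terminal phase is deleted unless its RestartPolicy is Always
// and it was not evicted (the safeguard case, where the kubelet restarts
// the containers in place and the Pod is expected to recover on its own,
// or never to report Failed at all).
func shouldDeleteTerminalPod(pod corev1.Pod) (bool, string) {
	isRestartPolicyAlways := pod.Spec.RestartPolicy == corev1.RestartPolicyAlways
	isPodEvicted := pod.Status.Reason == podEvictedReason

	if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
		if isRestartPolicyAlways && !isPodEvicted {
			return false, ""
		}
		return true, fmt.Sprintf("Pod %s is in phase %s (reason %q); delete it so it can be recreated",
			pod.Name, pod.Status.Phase, pod.Status.Reason)
	}
	return false, ""
}

func main() {
	evicted := corev1.Pod{}
	evicted.Name = "head-pod"
	evicted.Spec.RestartPolicy = corev1.RestartPolicyAlways
	evicted.Status.Phase = corev1.PodFailed
	evicted.Status.Reason = podEvictedReason

	shouldDelete, reason := shouldDeleteTerminalPod(evicted)
	fmt.Println(shouldDelete, reason) // true: an evicted Pod stays Failed
}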
Currently managed pods will briefly fail and quickly transition to running (restart policy Always)
The existing comments here imply that this is not the case:
kuberay/ray-operator/controllers/ray/raycluster_controller.go
Lines 838 to 841 in 8e5f88c
// Based on my observation, a Pod with `RestartPolicy: Always` will never be in the terminated states (i.e., `Failed` or `Succeeded`).
// However, I couldn't find any well-defined behavior in the Kubernetes documentation, so I can't guarantee that the status transition
// from `Running` to `Failed / Succeeded` and back to `Running` won't occur when we kill the main process (i.e., `ray start` in KubeRay)
// in the head Pod. Therefore, I've added this check as a safeguard.
Have you tested that when the restart policy is Always and the reason is not Evicted, the phase temporarily changes to Failed?
Thank you for your question. I have re-examined this and found that my previous understanding may have been incorrect. This is the logic I found about Pod status updates in the Kubernetes source code; it shows that if the restart policy is Always, the status remains Running:
https://github.com/kubernetes/kubernetes/blob/46aa8959a0659e22c924bb52b38385d441715b2b/pkg/kubelet/kubelet_pods.go#L1556
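For reference, here is a rough and heavily simplified paraphrase of the branch in the linked kubelet code that matters here. This is an illustrative sketch, not the upstream implementation, which also handles init containers, unknown container states, and other cases.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// sketchPhase paraphrases the case relevant to this discussion: once all
// containers have terminated, the restart policy decides whether the Pod
// stays Running (containers will be restarted) or lands in a terminal phase.
// Eviction is a separate path: the kubelet marks an evicted Pod Failed with
// reason "Evicted" regardless of the restart policy.
func sketchPhase(restartPolicy corev1.RestartPolicy, allTerminated, anyFailed bool) corev1.PodPhase {
	if !allTerminated {
		return corev1.PodRunning
	}
	switch restartPolicy {
	case corev1.RestartPolicyAlways:
		// Containers are restarted in place, so the Pod keeps reporting Running.
		return corev1.PodRunning
	case corev1.RestartPolicyOnFailure:
		if anyFailed {
			return corev1.PodRunning // failed containers are restarted
		}
		return corev1.PodSucceeded
	default: // RestartPolicyNever
		if anyFailed {
			return corev1.PodFailed
		}
		return corev1.PodSucceeded
	}
}

func main() {
	fmt.Println(sketchPhase(corev1.RestartPolicyAlways, true, true)) // Running
	fmt.Println(sketchPhase(corev1.RestartPolicyNever, true, true))  // Failed
}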
For your scenario, such a situation should not occur, so it cannot be reproduced.
However, a transition from Failed back to Running is indeed possible through human intervention in the Pod configuration, recovery of node resources, etc. So is it necessary to handle this concurrent scenario: when a node fails, should we also query the latest node status and perform the deletion?
Force-pushed from d66b3ce to ffac9ae.
Signed-off-by: Goober <chenhao86899@gmail.com>
Force-pushed from ffac9ae to 0a463a2.
Based on this logic, I updated the logic of the
I will revisit this PR this week. I hope to include this PR in v1.2.0. I will cut the branch next week.
cc @MortalHappiness, would you mind reviewing this PR?
Overall LGTM. I tried to run your PR with the steps in #2125 (comment) and it can successfully recreate the head pod after the taint node.kubernetes.io/memory-pressure:NoSchedule- is removed.
Here are some suggestions.
@@ -931,8 +931,10 @@ func TestReconcile_PodEvicted_DiffLess0_OK(t *testing.T) {
    assert.Equal(t, len(testPods), len(podList.Items), "Init pod list len is wrong")

    // Simulate head pod get evicted.
    podList.Items[0].Spec.RestartPolicy = corev1.RestartPolicyAlways
Add tests for RestartPolicyOnFailure.
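A minimal sketch of what such a test could look like, assuming the shouldDeletePod signature quoted earlier in this thread, the controller's test package name, and the usual KubeRay import paths; the test name and the asserted expectation are illustrative, not the actual test added in this PR.

package ray // assumed to match the controller test package

import (
	"testing"

	"github.com/stretchr/testify/assert"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// TestShouldDeletePod_OnFailureEvicted checks that an evicted head Pod with
// RestartPolicy: OnFailure is deleted: OnFailure does not keep the Pod out of
// the Failed phase, so the controller should delete and recreate it.
func TestShouldDeletePod_OnFailureEvicted(t *testing.T) {
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "headNode"},
		Spec:       corev1.PodSpec{RestartPolicy: corev1.RestartPolicyOnFailure},
		Status:     corev1.PodStatus{Phase: corev1.PodFailed, Reason: "Evicted"},
	}

	shouldDelete, reason := shouldDeletePod(pod, rayv1.HeadNode)
	assert.True(t, shouldDelete, reason)
}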
// The restart policy is `Always` and the Pod is in a terminate state.
// The expected behavior is that the controller will not delete the Pod because
// the restart policy is `Always`.
The comments here should be updated. The controller will delete the pod regardless of the restart policy.
LGTM. @MortalHappiness would you mind opening a PR to address your comments?
Note for myself: If the restartPolicy is Always and eviction happened, the Pod status will be PodFailed.
// the Pod and rely on the Pod's restart policy to restart the Pod.
isRestartPolicyAlways := pod.Spec.RestartPolicy == corev1.RestartPolicyAlways
// Based on the logic of the change of the status of the K8S pod, the following judgment is made.
// https://github.com/kubernetes/kubernetes/blob/3361895612dac57044d5dacc029d2ace1865479c/pkg/kubelet/kubelet_pods.go#L1556

// If the Pod's status is `Failed` or `Succeeded`, the Pod will not restart and we can safely delete it.
if pod.Status.Phase == corev1.PodFailed || pod.Status.Phase == corev1.PodSucceeded {
I have a bit of concern about this. What is the behavior of K8s when a Pod with restartPolicy: Always goes into a terminal state? Is it possible for it to restart?
If it restarts and Ray schedules a new Ray task or actor to the new Ray node, KubeRay may then start the next reconciliation and kill the Pod, because due to a delay in the informer it still thinks the Pod is in a terminal state.
Why are these changes needed?
This PR attempts to fix issue #2125: if the head pod has been evicted, we delete it and let it be restarted or recreated.
Related issue number
#2125
Checks