-
Notifications
You must be signed in to change notification settings - Fork 603
[Feat][RayJob] UserMode SubmissionMode #2364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feat][RayJob] UserMode SubmissionMode #2364
Conversation
// Make sure the submission mode is K8sJobMode. | ||
Expect(rayJob.Spec.SubmissionMode).To(Equal(rayv1.K8sJobMode)) | ||
var _ = Context("RayJob with different submission modes", func() { | ||
Context("RayJob in K8sJobMode", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reviewers: Please note that all the code in Context("RayJob in K8sJobMode")
is simply indented. It may appear as though there are many changes, but in reality, nothing has actually changed.
It("Make RayCluster.Status.State to be rayv1.Ready (attempt 2)", func() { | ||
// The RayCluster is not 'Ready' yet because Pods are not running and ready. | ||
Expect(rayCluster.Status.State).NotTo(Equal(rayv1.Ready)) //nolint:staticcheck // https://github.com/ray-project/kuberay/pull/2288 | ||
Context("RayJob in NoneMode", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reviewers: The real change is that I added Context("RayJob in NoneMode")
, starting from here.
@kevin85421 PTAL |
e36ec2d
to
c0e1e56
Compare
cc @andrewsykim |
cc @chiayi for the CLI part |
59cd4f6
to
a111d85
Compare
a111d85
to
3288ae7
Compare
35197ba
to
d0a1dab
Compare
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
d0a1dab
to
d0be059
Compare
Can you update the PR title? |
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
12351a2
to
b4ff999
Compare
if rayJobInstance.Spec.SubmissionMode == rayv1.K8sJobMode { | ||
if err := r.createK8sJobIfNeed(ctx, rayJobInstance, rayClusterInstance); err != nil { | ||
return ctrl.Result{RequeueAfter: RayJobDefaultRequeueDuration}, err | ||
} | ||
} | ||
|
||
logger.Info("Both RayCluster and the submitter K8s Job are created. Transition the status from `Initializing` to `Running`.", | ||
logger.Info("SubmissionMode is K8sJobMode and both RayCluster and the submitter K8s Job are created. Transition the status from `Initializing` to `Running`.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for HTTPMode going through here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# The alternative value is "HTTPMode", indicating that KubeRay will submit the Ray job by sending an HTTP request to the RayCluster. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I tested it by setting breakpoint in IDE. Log message updated.
"RayJob", rayJobInstance.Name, "RayCluster", rayJobInstance.Status.RayClusterName) | ||
rayJobInstance.Status.JobDeploymentStatus = rayv1.JobDeploymentStatusRunning | ||
case rayv1.JobDeploymentStatusWaiting: | ||
if rayJobInstance.Spec.SubmissionMode != rayv1.UserMode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about moving the check
if err := validateRayJobSpec(rayJobInstance); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
1bb275e
to
04729ba
Compare
Hey, is this feature considered available? I can see it was included into KubeRay v1.2.2 release, but the docs haven't been updated to mention it. |
@danielgafni This mode is only used in our kubectl plugin. Normal use cases should not directly use this. |
What if I would like to have similar functionality with a different Specifically, I'm thinking about local development against I'm working on dagster-ray integration and there are a few places where it would be possible to use the As I understand, right now it should be possible to replicate what the plugin does:
Are there any reasons this can't be done? |
Sure. You can replicate what the kubectl plugin does manually. @kevin85421 Do you think this use case is reasonable? Should we document this mode as a public API? |
I view the kubectl plugin as just a reference client implementation for the API. It was intended for users to create their own client / SDK with it as needed. |
Having said that, I think the API can use some improvements before we make it publicly available with documentation, mainly:
|
I think adding a new field to spec is weird because spec is used to define the "desired state of the resource" but we don't know submission ID in advance. But anyway I agreed that this API needs more discussion and improvement before publicly document it. |
The field in spec is exactly for defining the "desired state", which is the pre-defined submission ID that a user has to use for job submission. It would simplify Daniel's step from #2364 (comment) to the following:
I also think that something like |
We already released 1.2.2 with the API so I'd be worried about changing it at this point and breaking people, see #2418 |
Hi @danielgafni, we will make it a public API in the next release. However, we may introduce some breaking changes. If it's not urgent, you can wait for the next release.
@MortalHappiness the use case makes sense to me, but I think we should wait until the API is more stable before documenting it.
I think it is fine. It is an alpha API, and currently only kubectl plugin should use it. |
Thanks, I'm fine with trying this feature in experimental setting. |
@andrewsykim Will this approach work for running interactive ray jobs in client mode? I would like the I don't know if a predefined job_id can be used with The other way would definitely work tho - it's trivial to first run |
@danielgafni you can submit a job and specfiy |
Not really what I'm looking for, I was hoping it would be possible to just call
Right now it isn't but don't you think the possibility of changing that would be quite intriguing? It seems like this workflow should be very possible (with the annotation approach). |
If you can specify and override the submission ID for an interactive session with ray.init(), then I think it's possible. I don't recall if that's possible. |
I don't think so. Perhaps introducing this option to |
Hey guys, is this functionality considered public & stable at this point? I want to implement the following in
|
Why are these changes needed?
This PR implements the
None
SubmissionMode in this design doc. But after discussion we decided to rename it toUserMode
.Related issue number
N/A
Checks