Skip to content

[Feat][RayJob] UserMode SubmissionMode #2364

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

MortalHappiness
Copy link
Member

@MortalHappiness MortalHappiness commented Sep 8, 2024

Why are these changes needed?

This PR implements the None SubmissionMode in this design doc. But after discussion we decided to rename it to UserMode.

Related issue number

N/A

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

// Make sure the submission mode is K8sJobMode.
Expect(rayJob.Spec.SubmissionMode).To(Equal(rayv1.K8sJobMode))
var _ = Context("RayJob with different submission modes", func() {
Context("RayJob in K8sJobMode", func() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: Please note that all the code in Context("RayJob in K8sJobMode") is simply indented. It may appear as though there are many changes, but in reality, nothing has actually changed.

It("Make RayCluster.Status.State to be rayv1.Ready (attempt 2)", func() {
// The RayCluster is not 'Ready' yet because Pods are not running and ready.
Expect(rayCluster.Status.State).NotTo(Equal(rayv1.Ready)) //nolint:staticcheck // https://github.com/ray-project/kuberay/pull/2288
Context("RayJob in NoneMode", func() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: The real change is that I added Context("RayJob in NoneMode"), starting from here.

@MortalHappiness
Copy link
Member Author

@kevin85421 PTAL

@MortalHappiness MortalHappiness force-pushed the feat/rayjob-none-submission-mode branch 3 times, most recently from e36ec2d to c0e1e56 Compare September 8, 2024 11:50
@kevin85421 kevin85421 self-assigned this Sep 8, 2024
@kevin85421
Copy link
Member

cc @andrewsykim

@andrewsykim
Copy link
Member

cc @chiayi for the CLI part

@MortalHappiness MortalHappiness marked this pull request as draft September 9, 2024 15:04
@MortalHappiness MortalHappiness force-pushed the feat/rayjob-none-submission-mode branch from 59cd4f6 to a111d85 Compare September 9, 2024 15:06
@MortalHappiness MortalHappiness marked this pull request as ready for review September 9, 2024 15:09
@MortalHappiness MortalHappiness force-pushed the feat/rayjob-none-submission-mode branch from a111d85 to 3288ae7 Compare September 10, 2024 12:30
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
@MortalHappiness MortalHappiness force-pushed the feat/rayjob-none-submission-mode branch from d0a1dab to d0be059 Compare September 12, 2024 08:45
@andrewsykim
Copy link
Member

Can you update the PR title?

@MortalHappiness MortalHappiness changed the title [Feat][RayJob] NoneMode SubmissionMode [Feat][RayJob] UserMode SubmissionMode Sep 12, 2024
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
@MortalHappiness MortalHappiness force-pushed the feat/rayjob-none-submission-mode branch from 12351a2 to b4ff999 Compare September 12, 2024 14:45
if rayJobInstance.Spec.SubmissionMode == rayv1.K8sJobMode {
if err := r.createK8sJobIfNeed(ctx, rayJobInstance, rayClusterInstance); err != nil {
return ctrl.Result{RequeueAfter: RayJobDefaultRequeueDuration}, err
}
}

logger.Info("Both RayCluster and the submitter K8s Job are created. Transition the status from `Initializing` to `Running`.",
logger.Info("SubmissionMode is K8sJobMode and both RayCluster and the submitter K8s Job are created. Transition the status from `Initializing` to `Running`.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for HTTPMode going through here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# The alternative value is "HTTPMode", indicating that KubeRay will submit the Ray job by sending an HTTP request to the RayCluster.

Copy link
Member Author

@MortalHappiness MortalHappiness Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I tested it by setting breakpoint in IDE. Log message updated.

"RayJob", rayJobInstance.Name, "RayCluster", rayJobInstance.Status.RayClusterName)
rayJobInstance.Status.JobDeploymentStatus = rayv1.JobDeploymentStatusRunning
case rayv1.JobDeploymentStatusWaiting:
if rayJobInstance.Spec.SubmissionMode != rayv1.UserMode {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about moving the check

if err := validateRayJobSpec(rayJobInstance); err != nil {
above?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
@MortalHappiness MortalHappiness force-pushed the feat/rayjob-none-submission-mode branch from 1bb275e to 04729ba Compare September 17, 2024 11:12
@kevin85421 kevin85421 merged commit cf4a877 into ray-project:master Sep 17, 2024
27 checks passed
@danielgafni
Copy link

Hey, is this feature considered available?

I can see it was included into KubeRay v1.2.2 release, but the docs haven't been updated to mention it.

@MortalHappiness
Copy link
Member Author

@danielgafni This mode is only used in our kubectl plugin. Normal use cases should not directly use this.

@danielgafni
Copy link

danielgafni commented Oct 9, 2024

What if I would like to have similar functionality with a different KubeRay interface (without using the kubectl plugin)?

Specifically, I'm thinking about local development against KubeRay with Dagster.

I'm working on dagster-ray integration and there are a few places where it would be possible to use the UserMode submission mode in order to quickly launch pipelines without building images.

As I understand, right now it should be possible to replicate what the plugin does:

  • Create a RayJob CR with submissionMode=UserMode
  • Use JobSubmissionClient with working_dir set in runtime_env to submit a job with some local files
  • Add the ray.io/ray-job-submission-id annotation to the RayJob CR

Are there any reasons this can't be done?

@MortalHappiness
Copy link
Member Author

Sure. You can replicate what the kubectl plugin does manually.

@kevin85421 Do you think this use case is reasonable? Should we document this mode as a public API?

@andrewsykim
Copy link
Member

I view the kubectl plugin as just a reference client implementation for the API. It was intended for users to create their own client / SDK with it as needed.

@andrewsykim
Copy link
Member

Having said that, I think the API can use some improvements before we make it publicly available with documentation, mainly:

  1. I think InteractiveMode is a better name than UserMode.
  2. Instead of an annotation for the submission ID, add a new field spec.submissionId

@MortalHappiness
Copy link
Member Author

I think adding a new field to spec is weird because spec is used to define the "desired state of the resource" but we don't know submission ID in advance. But anyway I agreed that this API needs more discussion and improvement before publicly document it.

@andrewsykim
Copy link
Member

andrewsykim commented Oct 9, 2024

I think adding a new field to spec is weird because spec is used to define the "desired state of the resource" but we don't know submission ID in advance.

The field in spec is exactly for defining the "desired state", which is the pre-defined submission ID that a user has to use for job submission. It would simplify Daniel's step from #2364 (comment) to the following:

  1. Create a RayJob CR with submissionMode=UserMode and spec.submissionId="some-id"
  2. Use JobSubmissionClient with working_dir set in runtime_env to submit a job with some local files, using submission ID from 1)

I also think that something like spec.submissionId can be used for other submission modes to "hard code" the job to use a well-defined submission ID that is easier to search in logs

@andrewsykim
Copy link
Member

We already released 1.2.2 with the API so I'd be worried about changing it at this point and breaking people, see #2418

@kevin85421
Copy link
Member

Hi @danielgafni, we will make it a public API in the next release. However, we may introduce some breaking changes. If it's not urgent, you can wait for the next release.

Do you think this use case is reasonable? Should we document this mode as a public API?

@MortalHappiness the use case makes sense to me, but I think we should wait until the API is more stable before documenting it.

I'd be worried about changing it at this point and breaking people

I think it is fine. It is an alpha API, and currently only kubectl plugin should use it.

@danielgafni
Copy link

Thanks, I'm fine with trying this feature in experimental setting.

@danielgafni
Copy link

danielgafni commented Oct 14, 2024

The field in spec is exactly for defining the "desired state", which is the pre-defined submission ID

@andrewsykim Will this approach work for running interactive ray jobs in client mode? I would like the RayJob CR to manage the cluster lifecycle but also to be able to connect in client mode.

I don't know if a predefined job_id can be used with ray.init.

The other way would definitely work tho - it's trivial to first run ray.init and then add an annotation with the generated job id.

@andrewsykim
Copy link
Member

@danielgafni you can submit a job and specfiy --submission-id with ray job submit where the python script calls ray.init(). But RayJob itself is not really meant to be used with interactive client like you would with RayCluster

@danielgafni
Copy link

you can submit a job and specfiy --submission-id with ray job submit where the python script calls ray.init()

Not really what I'm looking for, I was hoping it would be possible to just call ray.init from Python code without breaking out of the main code flow.

RayJob itself is not really meant to be used with interactive client

Right now it isn't but don't you think the possibility of changing that would be quite intriguing?

It seems like this workflow should be very possible (with the annotation approach).

@andrewsykim
Copy link
Member

If you can specify and override the submission ID for an interactive session with ray.init(), then I think it's possible. I don't recall if that's possible.

@danielgafni
Copy link

I don't think so. Perhaps introducing this option to ray.init would be another path towards the interactive RayJob workflow.

@danielgafni
Copy link

Hey guys, is this functionality considered public & stable at this point? I want to implement the following in dagster-ray:

  1. Create a RayJob CR with submissionMode=InteractiveMode
  2. Use JobSubmissionClient to submit a job with some local files. Alternatively, use ray.init.
  3. Add the ray.io/ray-job-submission-id annotation to the RayJob CR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants