Open
Description
Search before asking
- I had searched in the issues and found no similar feature requirement.
Description
Implement a lightweight job submitter that provides the same interface as ray job submit but has no Ray dependencies; instead, it calls the Ray dashboard's RESTful API directly. This has two benefits:
- The K8s Job submitter no longer needs to pull the Ray image, which is typically over 1 GB even in its thinnest version without ML libraries. This should reduce RayJob startup time.
- We can implement our own retry logic for network issues between the K8s Job submitter and the Ray head, avoiding [Bug] RayJob falsely marked as "Running" when driver fails #2154.
We attempted to upstream some changes to Ray but encountered pushback, so KubeRay should consider implementing the solution independently.
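To make the idea concrete, here is a minimal sketch of what such a submitter could look like, using only the Python standard library. It is an illustration, not a proposed implementation: the helper names (`http_json`, `with_retries`, `submit_job`), the retry parameters, and the backoff policy are all hypothetical, and it assumes the dashboard's job submission endpoint is `POST /api/jobs/` accepting `entrypoint` and `submission_id` fields.

```python
import json
import time
import urllib.error
import urllib.request


def http_json(method, url, payload=None, timeout=10.0):
    """Send one JSON request using only the standard library (no Ray import)."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def with_retries(call, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on network-level failures with exponential backoff.

    The attempt count and delays are illustrative defaults, not tuned values.
    """
    for attempt in range(attempts):
        try:
            return call()
        except urllib.error.HTTPError:
            # The server answered with an error status; retrying blindly
            # would not help, so surface it to the caller.
            raise
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def submit_job(address, entrypoint, submission_id, send=http_json, **retry_kwargs):
    """Submit a job through the dashboard REST API (assumed: POST /api/jobs/).

    `send` is injectable so the retry behavior can be exercised without a
    running Ray head.
    """
    body = {"entrypoint": entrypoint, "submission_id": submission_id}
    return with_retries(
        lambda: send("POST", f"{address}/api/jobs/", body), **retry_kwargs
    )
```

A call might look like `submit_job("http://raycluster-head-svc:8265", "python my_script.py", "rayjob-sample-1")`, where the address and ID are placeholders. Choosing the `submission_id` on the client side is deliberate: it lets a retried request refer to the same job rather than creating a second one, although how the dashboard responds to a duplicate ID should be verified against the Ray version in use.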
Use case
No response
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!