support scheduler plugins #3612
Conversation
Force-pushed from 55d376d to a054ab9.
@kevin85421 Hi, please take a look when you have time. Thanks!
@KunWuLuan, can you resolve the conflict?
Please make sure your PR can run successfully. I tried to run your PR, but it failed to create the PodGroup.
You also need to update the following places:
- kuberay/helm-chart/kuberay-operator/values.yaml (lines 49 to 76 in 75ea7ae):

```yaml
# Enable customized Kubernetes scheduler integration. If enabled, Ray workloads will be scheduled
# by the customized scheduler.
#  * "enabled" is the legacy option and will be deprecated soon.
#  * "name" is the standard option, expecting a scheduler name, supported values are
#    "default", "volcano", and "yunikorn".
#
# Note: "enabled" and "name" should not be set at the same time. If both are set, an error will be thrown.
#
# Examples:
#  1. Use volcano (deprecated)
#     batchScheduler:
#       enabled: true
#
#  2. Use volcano
#     batchScheduler:
#       name: volcano
#
#  3. Use yunikorn
#     batchScheduler:
#       name: yunikorn
#
batchScheduler:
  # Deprecated. This option will be removed in the future.
  # Note, for backwards compatibility. When it sets to true, it enables volcano scheduler integration.
  enabled: false
  # Set the customized scheduler name, supported values are "volcano" or "yunikorn", do not set
  # "batchScheduler.enabled=true" at the same time as it will override this option.
  name: ""
```

- The scheduler-name check:

```go
if config.BatchScheduler == volcano.GetPluginName() || config.BatchScheduler == yunikorn.GetPluginName() {
```

- Add a sample config named ray-cluster.kube-scheduler.yaml to the ray-operator/config/samples/ folder (see the sketch after this list).
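As a reference point, here is a minimal sketch of what such a sample might contain. The image tag, resource values, and group layout are placeholders for illustration, not the PR's actual file:

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: test-podgroup-0
  labels:
    ray.io/gang-scheduling-enabled: "true"
spec:
  rayVersion: "2.9.0"                      # placeholder version
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0    # placeholder image
            resources:
              requests:
                cpu: 256m
                memory: 256Mi
  workerGroupSpecs:
    - groupName: worker
      replicas: 2
      minReplicas: 2
      maxReplicas: 2
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0  # placeholder image
              resources:
                requests:
                  cpu: 256m
                  memory: 256Mi
```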
Follow-up:
ray-operator/controllers/ray/batchscheduler/scheduler-plugins/scheduler_plugins.go (resolved review threads)
Hi, I have updated the Helm chart and made sure it works in my local environment. Please take a look when you have time. Thanks!
Force-pushed from 34d3f44 to a8048b0.
```yaml
metadata:
  name: test-podgroup-0
  labels:
    ray.io/gang-scheduling-enabled: "true"
```
I don't think we should be using ray.io/* labels for any of the scheduler integrations. The label prefix should be specific to the integration (see other examples with Volcano, Yunikorn and Kueue)
Hi, ray.io/gang-scheduling-enabled is used in the YuniKorn sample, and ray.io/scheduler-name: volcano is used when we need to use Volcano.
And these labels are defined here:
kuberay/ray-operator/controllers/ray/utils/constant.go (lines 35 to 37 in 75a63a5):

```go
RaySchedulerName                = "ray.io/scheduler-name"
RayPriorityClassName            = "ray.io/priority-class-name"
RayClusterGangSchedulingEnabled = "ray.io/gang-scheduling-enabled"
```
We didn't create a new label for the new scheduler.
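For comparison, the two existing usages mentioned above look like this in a RayCluster's metadata (shown together here purely for illustration; in practice each integration reads its own label):

```yaml
metadata:
  labels:
    ray.io/scheduler-name: volcano            # selects the Volcano integration
    ray.io/gang-scheduling-enabled: "true"    # used by the YuniKorn sample to request gang scheduling
```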
ray-operator/controllers/ray/batchscheduler/scheduler-plugins/scheduler_plugins.go (resolved review threads)
```go
// AddMetadataToPod adds essential labels and annotations to the Ray pods
// the scheduler needs these labels and annotations in order to do the scheduling properly
func (k *KubeScheduler) AddMetadataToPod(_ context.Context, app *rayv1.RayCluster, groupName string, pod *corev1.Pod) {
```
s/app/cluster
Could you please explain the comment? Thank you!
```go
	// when gang scheduling is enabled, extra annotations need to be added to all pods
	if k.isGangSchedulingEnabled(app) {
		// the group name for the head and each of the worker group should be different
		pod.Labels[KubeSchedulerPodGroupLabelKey] = app.Name
```
Should PodGroups be scheduled at the worker group level or RayCluster level? I feel like worker group level could make more sense in some cases
If the head is not available, the workers will be blocked, which is unacceptable for my customers. So I think the head should be scheduled together with the workers in the same PodGroup.
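For readers following the thread, here is a compact, self-contained sketch of what this metadata step amounts to at the RayCluster level. The helper and constant names are illustrative rather than the PR's exact code, and the constant value assumes the coscheduling plugin's standard pod-group label:

```go
package schedulerplugins

import (
	"context"

	corev1 "k8s.io/api/core/v1"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// Assumed value: the label the coscheduling plugin uses to group pods.
const kubeSchedulerPodGroupLabelKey = "scheduling.x-k8s.io/pod-group"

// isGangSchedulingEnabled mirrors the ray.io/gang-scheduling-enabled label check.
func isGangSchedulingEnabled(cluster *rayv1.RayCluster) bool {
	_, ok := cluster.Labels["ray.io/gang-scheduling-enabled"]
	return ok
}

// addMetadataToPod labels every Ray pod (head and workers) with the cluster
// name so the coscheduling plugin treats them as a single PodGroup.
func addMetadataToPod(_ context.Context, cluster *rayv1.RayCluster, pod *corev1.Pod) {
	if !isGangSchedulingEnabled(cluster) {
		return
	}
	if pod.Labels == nil {
		pod.Labels = map[string]string{}
	}
	pod.Labels[kubeSchedulerPodGroupLabelKey] = cluster.Name
}
```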
Please also resolve conflicts. We now use single
You also need to add permission for creating PodGroups here, in the rules: section.
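A sketch of the kind of rule that might be added. scheduling.x-k8s.io is the API group scheduler-plugins uses for PodGroup, but the exact verbs and where the rule lives in the chart are assumptions:

```yaml
rules:
  - apiGroups: ["scheduling.x-k8s.io"]
    resources: ["podgroups"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```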
Force-pushed from a8048b0 to 21f4f88.
```go
func TestCalculateDesiredResources(t *testing.T) {
	a := assert.New(t)

	cluster := createTestRayCluster(1)

	totalResource := utils.CalculateDesiredResources(&cluster)

	// 256m * 3 (requests, not limits)
	a.Equal("768m", totalResource.Cpu().String())

	// 256Mi * 3 (requests, not limits)
	a.Equal("768Mi", totalResource.Memory().String())

	// 2 GPUs total
	a.Equal("2", totalResource.Name("nvidia.com/gpu", resource.BinarySI).String())
}
```
What are you testing here? It seems that you didn't test anything specific to the kube scheduler plugin.
Currently we only calculate the min resources and create the PodGroup. The actual creation of the PodGroup is handled by the client, so in the unit test we only need to test the logic that calculates the min resources.
Could you refer to this test and this function, extract the create PodGroup logic into a separate function, and then test whether the returned PodGroup spec is correct in the unit test?
kuberay/ray-operator/controllers/ray/batchscheduler/volcano/volcano_scheduler_test.go (lines 81 to 103 in 2f2c1a2):

```go
func TestCreatePodGroup(t *testing.T) {
	a := assert.New(t)

	cluster := createTestRayCluster(1)

	minMember := utils.CalculateDesiredReplicas(context.Background(), &cluster) + 1
	totalResource := utils.CalculateDesiredResources(&cluster)
	pg := createPodGroup(&cluster, getAppPodGroupName(&cluster), minMember, totalResource)
	a.Equal(cluster.Namespace, pg.Namespace)

	// 1 head + 2 workers (desired, not min replicas)
	a.Equal(int32(3), pg.Spec.MinMember)

	// 256m * 3 (requests, not limits)
	a.Equal("768m", pg.Spec.MinResources.Cpu().String())

	// 256Mi * 3 (requests, not limits)
	a.Equal("768Mi", pg.Spec.MinResources.Memory().String())

	// 2 GPUs total
	a.Equal("2", pg.Spec.MinResources.Name("nvidia.com/gpu", resource.BinarySI).String())
}
```

kuberay/ray-operator/controllers/ray/batchscheduler/volcano/volcano_scheduler.go (lines 100 to 132 in a068e7b):

```go
func createPodGroup(
	app *rayv1.RayCluster,
	podGroupName string,
	size int32,
	totalResource corev1.ResourceList,
) v1beta1.PodGroup {
	podGroup := v1beta1.PodGroup{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: app.Namespace,
			Name:      podGroupName,
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(app, rayv1.SchemeGroupVersion.WithKind("RayCluster")),
			},
		},
		Spec: v1beta1.PodGroupSpec{
			MinMember:    size,
			MinResources: &totalResource,
		},
		Status: v1beta1.PodGroupStatus{
			Phase: v1beta1.PodGroupPending,
		},
	}

	if queue, ok := app.ObjectMeta.Labels[QueueNameLabelKey]; ok {
		podGroup.Spec.Queue = queue
	}

	if priorityClassName, ok := app.ObjectMeta.Labels[utils.RayPriorityClassName]; ok {
		podGroup.Spec.PriorityClassName = priorityClassName
	}

	return podGroup
}
```
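For the scheduler-plugins integration, the suggested extraction might look roughly like the sketch below. This is not the PR's actual code: the helper name and owner-reference handling are illustrative, and the v1alpha1 field shapes (for example whether MinResources is a pointer) should be checked against the vendored scheduler-plugins version.

```go
package schedulerplugins

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	schedv1alpha1 "sigs.k8s.io/scheduler-plugins/apis/scheduling/v1alpha1"

	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// createPodGroup builds the PodGroup object separately from the client call,
// so a unit test can assert on MinMember and MinResources without a cluster.
func createPodGroup(cluster *rayv1.RayCluster, name string, minMember int32, totalResource corev1.ResourceList) *schedv1alpha1.PodGroup {
	return &schedv1alpha1.PodGroup{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: cluster.Namespace,
			Name:      name,
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(cluster, rayv1.SchemeGroupVersion.WithKind("RayCluster")),
			},
		},
		Spec: schedv1alpha1.PodGroupSpec{
			MinMember: minMember,
			// Assumes MinResources is a plain corev1.ResourceList in this v1alpha1 version.
			MinResources: totalResource,
		},
	}
}
```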
Sure
Updated
Please also install
I discussed this with @MortalHappiness offline. Since this PR still requires some work before it can be merged, I’ve removed the
Force-pushed from 7c572f7 to 1e40ceb.
Signed-off-by: kunwuluan <kunwuluan@gmail.com>
update ValidateBatchSchedulerConfig() update helm chart Rename the function. Signed-off-by: kunwuluan <kunwuluan@gmail.com>
… start the operator. Signed-off-by: kunwuluan <kunwuluan@gmail.com>
Force-pushed from 7809c0c to 505402b.
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Force-pushed from 505402b to 21baca2.
@MortalHappiness I chatted with @KunWuLuan offline. We will try to include this in 1.4.0. This is not a branch-cut blocker. Please review this PR.
Overall LGTM. But I don't know what this test does, as I mentioned above in #3612 (comment).
ray-operator/controllers/ray/batchscheduler/scheduler-plugins/scheduler_plugins.go (resolved review threads)
```go
}

func (kf *KubeSchedulerFactory) ConfigureReconciler(b *builder.Builder) *builder.Builder {
	return b
```
Should we configure the reconciler so that the informer cache has the PodGroup information?
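One way to act on this suggestion, sketched under the assumption that the scheduler-plugins v1alpha1 types are registered in the manager's scheme (this is not the PR's code, and the function name is illustrative):

```go
package schedulerplugins

import (
	"sigs.k8s.io/controller-runtime/pkg/builder"
	schedv1alpha1 "sigs.k8s.io/scheduler-plugins/apis/scheduling/v1alpha1"
)

// configureReconcilerWithPodGroup registers PodGroup with the controller
// builder so PodGroups owned by a RayCluster are watched and served from the
// informer cache instead of being fetched from the API server on every reconcile.
func configureReconcilerWithPodGroup(b *builder.Builder) *builder.Builder {
	return b.Owns(&schedv1alpha1.PodGroup{})
}
```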
ray-operator/controllers/ray/batchscheduler/scheduler-plugins/scheduler_plugins.go (resolved review threads)
```go
	}
}

scheme := runtime.NewScheme()
```
Can we reuse the controller's informer cache? If this is non-trivial, we don't need to address it in this PR.
TODO: Volcano seems to call the K8s API server for each reconciliation.
I will create a new issue to track this.
Force-pushed from 5b51e56 to b8d029c.
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Force-pushed from b8d029c to f5fc6de.
Hi @KunWuLuan, could you resolve the conflicts? Thanks!
LGTM.
#3612 (comment) can be addressed in follow-up PRs.
…duler-plugins Signed-off-by: kaihsun <kaihsun@anyscale.com>
Signed-off-by: kaihsun <kaihsun@anyscale.com>
@MortalHappiness I added some TODOs and did some refactoring. Would you mind taking another review pass?
* support scheduler plugins (Signed-off-by: kunwuluan <kunwuluan@gmail.com>)
* add unit test; update ValidateBatchSchedulerConfig(); update helm chart; rename the function (Signed-off-by: kunwuluan <kunwuluan@gmail.com>)
* Update the role in the helm chart, and ensure the CRD is installed before starting the operator (Signed-off-by: kunwuluan <kunwuluan@gmail.com>)
* Fix CI lint (Signed-off-by: KunWuLuan <kunwuluan@gmail.com>)
* Modify ray version in raycluster sample (Signed-off-by: KunWuLuan <kunwuluan@gmail.com>)
* Update the scheduler name (Signed-off-by: KunWuLuan <kunwuluan@gmail.com>)
* Update the test (Signed-off-by: KunWuLuan <kunwuluan@gmail.com>)
* update and add TODOs (Signed-off-by: kaihsun <kaihsun@anyscale.com>)

Signed-off-by: kunwuluan <kunwuluan@gmail.com>
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Signed-off-by: kaihsun <kaihsun@anyscale.com>
Co-authored-by: yueming.wk <yueming.wk@alibaba-inc.com>
Co-authored-by: kaihsun <kaihsun@anyscale.com>
Support using the coscheduling plugin from scheduler-plugins as KubeRay's gang scheduler:
#3611
Follow these instructions to install the scheduler-plugins kube-scheduler and enable the coscheduling plugin:
https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/doc/install.md
After installing the coscheduling plugin, users can add the label ray.io/gang-scheduling-enabled: "true" to a RayCluster to enable gang scheduling.
Example:
You should see the following PodGroup created by KubeRay and the corresponding labels added to the pods:
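An illustration of the expected result; the names and quantities below are placeholders derived from the sample cluster, not literal output:

```yaml
# PodGroup created by KubeRay (illustrative)
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: test-podgroup-0          # typically derived from the RayCluster name
  namespace: default
spec:
  minMember: 3                   # 1 head + 2 workers
  minResources:
    cpu: 768m
    memory: 768Mi
```

Each head and worker pod should then carry a pod-group label along the lines of scheduling.x-k8s.io/pod-group: test-podgroup-0.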
Checks