
support scheduler plugins #3612


Merged

Conversation

KunWuLuan
Contributor

@KunWuLuan KunWuLuan commented May 16, 2025

Support using the coscheduling plugin of scheduler-plugins as KubeRay's gang scheduler:
#3611

Follow this guide to install kube-scheduler and enable the coscheduling plugin:
https://github.com/kubernetes-sigs/scheduler-plugins/blob/master/doc/install.md

After installing the coscheduling plugin, users can add the label ray.io/gang-scheduling-enabled: "true" to
enable gang scheduling for a RayCluster.
Example:

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: test-podgroup-0
  labels:
    ray.io/gang-scheduling-enabled: "true"
spec:
  rayVersion: "2.41.0"
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.41.0
          resources:
            limits:
              cpu: "1"
              memory: "2Gi"
            requests:
              cpu: "1"
              memory: "2Gi"
  workerGroupSpecs:
  - groupName: worker
    rayStartParams: {}
    replicas: 2
    minReplicas: 2
    maxReplicas: 2
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.41.0
          resources:
            limits:
              cpu: "1"
              memory: "1Gi"
            requests:
              cpu: "1"
              memory: "1Gi"

You should see the following PodGroup created by KubeRay and the pod-group label added to the Pods:

> k get podgroups.scheduling.x-k8s.io
NAME              PHASE        MINMEMBER   RUNNING   SUCCEEDED   FAILED   AGE
test-podgroup-0   Scheduling   3                                          28s
> k get po -L scheduling.x-k8s.io/pod-group
NAME                                  READY   STATUS    RESTARTS   AGE     POD-GROUP
test-podgroup-0-head                  1/1     Running   0          2m27s   test-podgroup-0
test-podgroup-0-worker-worker-dkwvl   1/1     Running   0          2m27s   test-podgroup-0
test-podgroup-0-worker-worker-znrbj   1/1     Running   0          2m27s   test-podgroup-0

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@KunWuLuan KunWuLuan changed the title support scheduler plugins (WIP)support scheduler plugins May 16, 2025
@KunWuLuan KunWuLuan force-pushed the feat/support-scheduler-plugins branch 6 times, most recently from 55d376d to a054ab9 Compare May 21, 2025 04:03
@KunWuLuan KunWuLuan changed the title (WIP)support scheduler plugins support scheduler plugins May 27, 2025
@KunWuLuan
Contributor Author

@kevin85421 Hi, please have a look when you have time. Thanks!

@kevin85421
Member

@KunWuLuan can you resolve the conflict?

Member

@MortalHappiness MortalHappiness left a comment

Please make sure your PR can run successfully. I tried to run your PR but it failed to create the PodGroup.

You also need to update the following places:

  • # Enable customized Kubernetes scheduler integration. If enabled, Ray workloads will be scheduled
    # by the customized scheduler.
    #   * "enabled" is the legacy option and will be deprecated soon.
    #   * "name" is the standard option, expecting a scheduler name, supported values are
    #     "default", "volcano", and "yunikorn".
    #
    # Note: "enabled" and "name" should not be set at the same time. If both are set, an error will be thrown.
    #
    # Examples:
    # 1. Use volcano (deprecated)
    # batchScheduler:
    #   enabled: true
    #
    # 2. Use volcano
    # batchScheduler:
    #   name: volcano
    #
    # 3. Use yunikorn
    # batchScheduler:
    #   name: yunikorn
    #
    batchScheduler:
      # Deprecated. This option will be removed in the future.
      # Note, for backwards compatibility. When it sets to true, it enables volcano scheduler integration.
      enabled: false
      # Set the customized scheduler name, supported values are "volcano" or "yunikorn", do not set
      # "batchScheduler.enabled=true" at the same time as it will override this option.
      name: ""
  • if config.BatchScheduler == volcano.GetPluginName() || config.BatchScheduler == yunikorn.GetPluginName() {
    (a sketch of how this check could be extended follows this list)
  • Add a sample config named ray-cluster.kube-scheduler.yaml to ray-operator/config/samples/ folder
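
A minimal, runnable sketch of how that check could be extended; the helper names and the "scheduler-plugins" value below are illustrative assumptions mirroring the existing volcano/yunikorn helpers, not the PR's exact code:

package main

import "fmt"

// Hypothetical stand-ins for the plugins' GetPluginName() helpers; the real
// packages expose similar functions, but these names and the "scheduler-plugins"
// value are assumptions for illustration.
func volcanoPluginName() string          { return "volcano" }
func yunikornPluginName() string         { return "yunikorn" }
func schedulerPluginsPluginName() string { return "scheduler-plugins" }

// validateBatchSchedulerName mirrors the check quoted above, extended so the
// new scheduler-plugins integration is accepted as well.
func validateBatchSchedulerName(name string) error {
    switch name {
    case "", volcanoPluginName(), yunikornPluginName(), schedulerPluginsPluginName():
        return nil
    default:
        return fmt.Errorf("unsupported batch scheduler %q", name)
    }
}

func main() {
    fmt.Println(validateBatchSchedulerName("scheduler-plugins")) // <nil>
    fmt.Println(validateBatchSchedulerName("unknown"))           // error
}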

Follow-up:

@KunWuLuan
Contributor Author

Hi, I have updated the helm chart and made sure it works in my local environment. Please have a look when you have time. Thanks!
@MortalHappiness

@KunWuLuan KunWuLuan force-pushed the feat/support-scheduler-plugins branch from 34d3f44 to a8048b0 Compare May 29, 2025 15:15
metadata:
  name: test-podgroup-0
  labels:
    ray.io/gang-scheduling-enabled: "true"
Member

I don't think we should be using ray.io/* labels for any of the scheduler integrations. The label prefix should be specific to the integration (see other examples with Volcano, Yunikorn and Kueue)

Contributor Author

Hi, ray.io/gang-scheduling-enabled is already used in the YuniKorn sample, and ray.io/scheduler-name: volcano is used when we want to use Volcano.
These labels are defined here:

RaySchedulerName = "ray.io/scheduler-name"
RayPriorityClassName = "ray.io/priority-class-name"
RayClusterGangSchedulingEnabled = "ray.io/gang-scheduling-enabled"

We didn't create a new label for the new scheduler.


// AddMetadataToPod adds essential labels and annotations to the Ray pods
// the scheduler needs these labels and annotations in order to do the scheduling properly
func (k *KubeScheduler) AddMetadataToPod(_ context.Context, app *rayv1.RayCluster, groupName string, pod *corev1.Pod) {
Member

s/app/cluster

Contributor Author

Could you please explain the comment? Thank you!

    // when gang scheduling is enabled, extra annotations need to be added to all pods
    if k.isGangSchedulingEnabled(app) {
        // the group name for the head and each of the worker group should be different
        pod.Labels[KubeSchedulerPodGroupLabelKey] = app.Name
Member

Should PodGroups be scheduled at the worker group level or RayCluster level? I feel like worker group level could make more sense in some cases

Contributor Author

If the head is not available, the workers will be blocked; that is unacceptable for my customers. So I think the head should be checked together with the workers.
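
To illustrate the design above, here is a rough sketch (not the PR's exact code) of a cluster-level MinMember calculation in which the head pod is counted together with every worker group's desired replicas; the package and helper names are illustrative, while the field names follow the RayCluster v1 API:

package schedulerplugins

import (
    rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// desiredMinMember returns 1 (the head pod) plus the desired replicas of every
// worker group, so the PodGroup is only admitted when the head and all workers
// can be scheduled together.
func desiredMinMember(cluster *rayv1.RayCluster) int32 {
    minMember := int32(1) // head pod
    for _, wg := range cluster.Spec.WorkerGroupSpecs {
        if wg.Replicas != nil {
            minMember += *wg.Replicas
        }
    }
    return minMember
}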

@MortalHappiness
Member

MortalHappiness commented May 30, 2025

Please also resolve conflicts. We now use single go.mod in the repo root. Thanks.

Member

@MortalHappiness MortalHappiness left a comment

You also need to add permission for creating PodGroup here

@KunWuLuan KunWuLuan force-pushed the feat/support-scheduler-plugins branch from a8048b0 to 21f4f88 Compare May 30, 2025 08:59
Comment on lines 79 to 95
func TestCalculateDesiredResources(t *testing.T) {
    a := assert.New(t)

    cluster := createTestRayCluster(1)

    totalResource := utils.CalculateDesiredResources(&cluster)

    // 256m * 3 (requests, not limits)
    a.Equal("768m", totalResource.Cpu().String())

    // 256Mi * 3 (requests, not limits)
    a.Equal("768Mi", totalResource.Memory().String())

    // 2 GPUs total
    a.Equal("2", totalResource.Name("nvidia.com/gpu", resource.BinarySI).String())
}
Member

What are you testing here? It seems that you didn't test anything specific to the kube scheduler plugin.

Contributor Author

Currently we only calculate the min-resources and create the PodGroup. The PodGroup creation itself is done by the client, so in the unit test we only need to test the logic that calculates the min-resources.
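
For context, a hedged sketch of what the min-resource calculation amounts to, assuming it sums the container resource requests of one head pod plus the desired replicas of each worker group; the function names here are illustrative, not the utils package's actual API:

package schedulerplugins

import (
    rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
    corev1 "k8s.io/api/core/v1"
)

// addRequests accumulates the resource requests of every container in a pod
// template into total, once per copy.
func addRequests(total corev1.ResourceList, tmpl corev1.PodTemplateSpec, copies int32) {
    for i := int32(0); i < copies; i++ {
        for _, c := range tmpl.Spec.Containers {
            for name, q := range c.Resources.Requests {
                cur := total[name]
                cur.Add(q)
                total[name] = cur
            }
        }
    }
}

// sumDesiredRequests is a sketch of the min-resource idea discussed here:
// one head pod plus the desired replicas of each worker group, counting
// requests (not limits).
func sumDesiredRequests(cluster *rayv1.RayCluster) corev1.ResourceList {
    total := corev1.ResourceList{}
    addRequests(total, cluster.Spec.HeadGroupSpec.Template, 1)
    for _, wg := range cluster.Spec.WorkerGroupSpecs {
        if wg.Replicas != nil {
            addRequests(total, wg.Template, *wg.Replicas)
        }
    }
    return total
}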

Member

@MortalHappiness MortalHappiness Jun 5, 2025

Could you refer to this test and this function, extract the PodGroup-creation logic into a separate function, and then test whether the returned PodGroup spec is correct in the unit test?

func TestCreatePodGroup(t *testing.T) {
    a := assert.New(t)
    cluster := createTestRayCluster(1)
    minMember := utils.CalculateDesiredReplicas(context.Background(), &cluster) + 1
    totalResource := utils.CalculateDesiredResources(&cluster)
    pg := createPodGroup(&cluster, getAppPodGroupName(&cluster), minMember, totalResource)
    a.Equal(cluster.Namespace, pg.Namespace)
    // 1 head + 2 workers (desired, not min replicas)
    a.Equal(int32(3), pg.Spec.MinMember)
    // 256m * 3 (requests, not limits)
    a.Equal("768m", pg.Spec.MinResources.Cpu().String())
    // 256Mi * 3 (requests, not limits)
    a.Equal("768Mi", pg.Spec.MinResources.Memory().String())
    // 2 GPUs total
    a.Equal("2", pg.Spec.MinResources.Name("nvidia.com/gpu", resource.BinarySI).String())
}

func createPodGroup(
    app *rayv1.RayCluster,
    podGroupName string,
    size int32,
    totalResource corev1.ResourceList,
) v1beta1.PodGroup {
    podGroup := v1beta1.PodGroup{
        ObjectMeta: metav1.ObjectMeta{
            Namespace: app.Namespace,
            Name:      podGroupName,
            OwnerReferences: []metav1.OwnerReference{
                *metav1.NewControllerRef(app, rayv1.SchemeGroupVersion.WithKind("RayCluster")),
            },
        },
        Spec: v1beta1.PodGroupSpec{
            MinMember:    size,
            MinResources: &totalResource,
        },
        Status: v1beta1.PodGroupStatus{
            Phase: v1beta1.PodGroupPending,
        },
    }
    if queue, ok := app.ObjectMeta.Labels[QueueNameLabelKey]; ok {
        podGroup.Spec.Queue = queue
    }
    if priorityClassName, ok := app.ObjectMeta.Labels[utils.RayPriorityClassName]; ok {
        podGroup.Spec.PriorityClassName = priorityClassName
    }
    return podGroup
}

Contributor Author

Sure

Contributor Author

Updated

@MortalHappiness
Member

Please also install pre-commit locally and then fix the CI lint error. Thanks.

@kevin85421
Member

I discussed this with @MortalHappiness offline. Since this PR still requires some work before it can be merged, I’ve removed the release-blocker label. We can consider cherry-picking this PR after the branch cut.

@KunWuLuan KunWuLuan force-pushed the feat/support-scheduler-plugins branch 3 times, most recently from 7c572f7 to 1e40ceb Compare May 31, 2025 06:31
Signed-off-by: kunwuluan <kunwuluan@gmail.com>
KunWuLuan added 2 commits June 3, 2025 12:02
update ValidateBatchSchedulerConfig()

update helm chart

Rename the function.

Signed-off-by: kunwuluan <kunwuluan@gmail.com>
… start the operator.

Signed-off-by: kunwuluan <kunwuluan@gmail.com>
@KunWuLuan KunWuLuan force-pushed the feat/support-scheduler-plugins branch 2 times, most recently from 7809c0c to 505402b Compare June 3, 2025 05:37
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
@KunWuLuan KunWuLuan force-pushed the feat/support-scheduler-plugins branch from 505402b to 21baca2 Compare June 3, 2025 05:48
@kevin85421
Member

@MortalHappiness I chatted with @KunWuLuan offline. We will try to include this in 1.4.0. This is not a branch-cut blocker. Please review this PR.

@MortalHappiness
Member

Overall LGTM. But I don't know what this test does, as I mentioned above: #3612 (comment)

}

func (kf *KubeSchedulerFactory) ConfigureReconciler(b *builder.Builder) *builder.Builder {
    return b
Member

should we configure the reconciler so that the informer cache has the PodGroup information?
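
A hedged sketch of what that could look like, assuming the PodGroup is created with the RayCluster as its controller owner and that the scheduler-plugins v1alpha1 types are registered in the manager's scheme; the import path and the repeated type declaration are assumptions for illustration:

package schedulerplugins

import (
    "sigs.k8s.io/controller-runtime/pkg/builder"
    schedv1alpha1 "sigs.k8s.io/scheduler-plugins/apis/scheduling/v1alpha1"
)

// Declared elsewhere in the PR; repeated here only so the sketch is self-contained.
type KubeSchedulerFactory struct{}

func (kf *KubeSchedulerFactory) ConfigureReconciler(b *builder.Builder) *builder.Builder {
    // Owning PodGroups makes controller-runtime watch them through the shared
    // informer cache and re-reconcile the owning RayCluster when a PodGroup changes.
    return b.Owns(&schedv1alpha1.PodGroup{})
}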

}
}

scheme := runtime.NewScheme()
Member

can we reuse the controller's informer cache? If this is non-trivial, we don't need to address this in this PR.

TODO: Volcano seems to call K8s API server for each reconciliation.

Contributor Author

I will create a new issue to track this.
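
For that follow-up, a rough sketch (assumptions, not this PR's code) of how the controller's cache could be reused: if the scheduler were handed the manager's client instead of building its own scheme and client, PodGroup reads would be served from the shared informer cache. The constructor name and struct shape are illustrative:

package schedulerplugins

import (
    utilruntime "k8s.io/apimachinery/pkg/util/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    schedv1alpha1 "sigs.k8s.io/scheduler-plugins/apis/scheduling/v1alpha1"
)

// KubeScheduler holds the manager's cache-backed client instead of a client
// built from a fresh runtime.NewScheme().
type KubeScheduler struct {
    cli client.Client
}

// NewKubeScheduler registers the PodGroup types on the manager's scheme and
// reuses the manager's client, whose reads are served from the informer cache.
func NewKubeScheduler(mgr ctrl.Manager) *KubeScheduler {
    utilruntime.Must(schedv1alpha1.AddToScheme(mgr.GetScheme()))
    return &KubeScheduler{cli: mgr.GetClient()}
}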

@KunWuLuan KunWuLuan force-pushed the feat/support-scheduler-plugins branch from 5b51e56 to b8d029c Compare June 5, 2025 03:14
yueming.wk added 3 commits June 6, 2025 09:29
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
@KunWuLuan KunWuLuan force-pushed the feat/support-scheduler-plugins branch from b8d029c to f5fc6de Compare June 6, 2025 01:31
@MortalHappiness
Member

Hi @KunWuLuan, could you resolve the conflicts? Thanks!

Member

@MortalHappiness MortalHappiness left a comment

LGTM.

#3612 (comment) can be addressed in follow-up PRs.

…duler-plugins

Signed-off-by: kaihsun <kaihsun@anyscale.com>
Signed-off-by: kaihsun <kaihsun@anyscale.com>
@kevin85421
Member

@MortalHappiness I added some TODOs and did some refactoring. Would you mind taking another review pass?

Manual test:
[screenshot of the manual test result, 2025-06-12]

@kevin85421 kevin85421 merged commit e80e84b into ray-project:master Jun 12, 2025
25 checks passed
kevin85421 added a commit to kevin85421/kuberay that referenced this pull request Jun 12, 2025
* support scheduler plugins

Signed-off-by: kunwuluan <kunwuluan@gmail.com>

* add unit test

update ValidateBatchSchedulerConfig()

update helm chart

Rename the function.

Signed-off-by: kunwuluan <kunwuluan@gmail.com>

* Update the role in helm chart. And ensure the crd is installed before start the operator.

Signed-off-by: kunwuluan <kunwuluan@gmail.com>

* Fix CI lint

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>

* Modify ray version in raycluster sample.

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>

* Update the scheduler name

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>

* Update the test

Signed-off-by: KunWuLuan <kunwuluan@gmail.com>

* update and add TODOs

Signed-off-by: kaihsun <kaihsun@anyscale.com>

---------

Signed-off-by: kunwuluan <kunwuluan@gmail.com>
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
Signed-off-by: kaihsun <kaihsun@anyscale.com>
Co-authored-by: yueming.wk <yueming.wk@alibaba-inc.com>
Co-authored-by: kaihsun <kaihsun@anyscale.com>
@kevin85421 kevin85421 mentioned this pull request Jun 12, 2025
MortalHappiness pushed a commit that referenced this pull request Jun 13, 2025
chipspeak pushed a commit to chipspeak/kuberay that referenced this pull request Jul 2, 2025
kryanbeane pushed a commit to kryanbeane/kuberay that referenced this pull request Jul 2, 2025
DW-Han pushed a commit to DW-Han/kuberay that referenced this pull request Jul 30, 2025