Conversation

ryanaoleary
Contributor

Add default Ray node label info to Ray Pod environment

Why are these changes needed?

This PR adds market-type information for different cloud providers to the Ray Pod environment based on the provided nodeSelector value. It also adds environment variables that pass region and zone information using the downward API (kubernetes/kubernetes#127092). These environment variables will be used in Ray core to set default Ray node labels.

I'll add a comment below with my manual test results with propagating topology.k8s.io/region and topology.k8s.io/zone on a GKE v1.33 alpha cluster.
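As a sketch of the approach described above, the injected variables might look like the following in the resulting Pod spec. This is a hypothetical illustration, not the PR's actual output: the RAY_NODE_REGION / RAY_NODE_ZONE names and label keys are assumptions (only RAY_NODE_MARKET_TYPE is confirmed in this thread), and the downward API fieldRef only resolves if the topology labels are propagated onto the Pod itself (the behavior discussed in kubernetes/kubernetes#127092).

```yaml
# Hypothetical sketch of env vars KubeRay could inject into the Ray container.
env:
  - name: RAY_NODE_MARKET_TYPE
    value: "spot"            # derived from the nodeSelector at Pod build time
  - name: RAY_NODE_REGION
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['topology.kubernetes.io/region']
  - name: RAY_NODE_ZONE
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['topology.kubernetes.io/zone']
```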

Related issue number

ray-project/ray#51564

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@ryanaoleary ryanaoleary self-assigned this May 27, 2025
@ryanaoleary
Contributor Author

cc: @MengjinYan

@@ -493,6 +493,9 @@ func BuildPod(ctx context.Context, podTemplateSpec corev1.PodTemplateSpec, rayNo
initLivenessAndReadinessProbe(&pod.Spec.Containers[utils.RayContainerIndex], rayNodeType, creatorCRDType)
}

// add downward API environment variables for Ray default node labels
addDefaultRayNodeLabels(&pod)
Member

Should we guard this logic with a Ray version check?

Contributor Author

This code doesn't rely on any API change in Ray; it just sets some env vars, and the actual node labels are set in Ray core using those vars. That said, I can add a version guard here (for whatever release ray-project/ray#53360 lands in) if we don't want to set unused vars for users on older versions of Ray.

Contributor Author

From an offline discussion with @MengjinYan, we're leaning towards not including a version guard, since users aren't required to specify the Ray version they're using in the CR spec.


@MengjinYan MengjinYan left a comment


LGTM. @kevin85421, can you take a look from KubeRay's perspective?

pod.Spec.Containers[utils.RayContainerIndex].Env,
// used to set the ray.io/market-type node label
corev1.EnvVar{
Name: "RAY_NODE_MARKET_TYPE",
Member

Is the plan for Ray Core to check these env vars? Can you link the PR that includes this change?

Contributor Author

Yeah that's correct - this is the related PR: ray-project/ray#53360

@ryanaoleary ryanaoleary requested a review from andrewsykim June 3, 2025 22:03
@ryanaoleary
Contributor Author

LGTM. @kevin85421, can you take a look from KubeRay's perspective?

@kevin85421 Bumping this to see if we can include it in 1.4

@kevin85421
Member

@ryanaoleary I chatted with @MengjinYan, and my understanding is that this doesn’t need to be included in v1.4.0. Could you sync with @MengjinYan and let me know if I’m mistaken? Thanks!

@ryanaoleary
Contributor Author

my understanding is that this doesn’t need to be included in v1.4.0. Could you sync with @MengjinYan and let me know if I’m mistaken? Thanks!

Synced offline with @MengjinYan and yeah there's no urgency to include this in v1.4.0, we can wait to include it in the next release. My thought was just that it'd be useful to have this functionality in the soonest stable release for testing, but I can just use the nightly image.

@kevin85421
Member

My thought was just that it'd be useful to have this functionality in the soonest stable release for testing, but I can just use the nightly image.

This makes sense. I’ll review it for now. We’ll make a best-effort attempt to include this PR, but there’s no guarantee.


// getRayMarketTypeFromNodeSelector is a helper function to determine the ray.io/market-type label
// based on user-provided Kubernetes nodeSelector values.
func getRayMarketTypeFromNodeSelector(pod *corev1.Pod) string {
Member

I guess some schedulers or webhooks may update the node selector after the Pod is created. We should take it into consideration.


Do you mean that the value of the default labels might change after the ray node started?

Member

might change after the ray node started?

No, I mean it may be changed after KubeRay constructs the Pod spec but before the Pod is created or scheduled.

Contributor Author

Are there any webhooks that modify the value of cloud.google.com/gke-spot on a Pod? I'm having difficulty finding a reference. I think we should default to adding Ray node labels based on what the user specifies in their Pod spec.
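The selector-based detection discussed in this thread could be sketched roughly as follows. This is a hypothetical standalone version of the helper under review, not the PR's actual code: only the cloud.google.com/gke-spot key is confirmed above, while the Karpenter and AKS keys are assumptions about which well-known spot labels such a helper might check.

```go
package main

import "fmt"

// marketTypeFromNodeSelector maps well-known cloud-provider nodeSelector
// keys to a ray.io/market-type value. Unrecognized or empty selectors
// fall back to "on-demand".
func marketTypeFromNodeSelector(nodeSelector map[string]string) string {
	if nodeSelector["cloud.google.com/gke-spot"] == "true" { // GKE Spot VMs
		return "spot"
	}
	if nodeSelector["karpenter.sh/capacity-type"] == "spot" { // EKS via Karpenter (assumed key)
		return "spot"
	}
	if nodeSelector["kubernetes.azure.com/scalesetpriority"] == "spot" { // AKS spot pools (assumed key)
		return "spot"
	}
	return "on-demand"
}

func main() {
	fmt.Println(marketTypeFromNodeSelector(map[string]string{"cloud.google.com/gke-spot": "true"})) // spot
	fmt.Println(marketTypeFromNodeSelector(nil))                                                    // on-demand
}
```

A helper like this only sees the spec KubeRay builds, which is exactly the concern raised above: a mutating webhook could still change the selector before scheduling.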

ryanaoleary and others added 7 commits June 6, 2025 05:55
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Add default Ray node label info to Ray Pod environment

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Co-authored-by: Kai-Hsun Chen <kaihsun@apache.org>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the pass-worker-group-ray-label branch from daebac4 to 604c162 on June 6, 2025 05:56
@ryanaoleary ryanaoleary requested a review from andrewsykim June 6, 2025 06:46
@kevin85421
Member

cc @andrewsykim, do you have the bandwidth to review this PR? I will give it a final pass after you approve the PR.

@MengjinYan

@andrewsykim @kevin85421 Following up on this PR.

cc: @ryanaoleary

@andrewsykim
Member

Sorry for the delayed response on this; I've been mulling it over a bit since we're early in the v1.5 cycle anyway.

My biggest question is whether we want this behavior guarded by a feature gate initially. I'm a bit hesitant about auto-populating Ray labels based on a combination of Pod labels and node selectors. It doesn't seem like the most intuitive approach for users, and I'm not sure we want to lock ourselves in if we think there's a better approach later. Mainly I'm wondering if we should make label configuration more explicit and let users opt in to copying a certain set of pod labels.

Just throwing a bunch of ideas out there for us to brainstorm on the right API:

Users can set arbitrary labels, which provides more structure than overriding --labels in rayStartParams with a comma-delimited list.

workerGroupSpecs:
  - replicas: 1
    ...
    groupName: workergroup
    rayStartParams: {}
    rayletLabels:
      foo: bar
      ray.io/market-type: "spot"

Users can opt into copying all pod labels

workerGroupSpecs:
  - replicas: 1
    ...
    groupName: workergroup
    rayStartParams: {}
    populateRayletLabelsFromPod: true

Users can opt into copying a list of labels, which can include regex:

workerGroupSpecs:
  - replicas: 1
    ...
    groupName: workergroup
    rayStartParams: {}
    copyPodLabelsToRaylet:
    - topology.kubernetes.io/**
    - foo.bar.io

@ryanaoleary
Contributor Author


For the first option, rayletLabels does look cleaner to me, but that functionality is already fully supported through the comma-delimited --labels list in rayStartParams. So I think this could be good to implement for user clarity, but it's not currently blocking anything.

For the second and third options, I think we could combine the API fields by just adding copyPodLabelsToRaylet and accepting * as an argument to copy all labels. I'd also be fine with implementing both fields separately, since they'd use the same functionality. What do you think about the above @MengjinYan?

For this PR though, I think a benefit of automatically populating these default variables using downward API is that it lessens the manual intervention required by the user for the default Raylet labels to get set. I'm also not clear on how a user would explicitly pass a field set using downward API. I think a feature gate is a good idea though to avoid adding unused fields for some users, I'll add a features.SetDefaultLabelVars flag around all the code blocks.
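The opt-in copy idea above could be sketched as a simple pattern filter. To be clear about what's assumed here: copyPodLabelsToRaylet and its matching semantics ("**", regex) are proposals from this thread, not an existing KubeRay API, and copyMatchingLabels is a hypothetical helper. path.Match (one "*" per path segment) stands in for whatever matching syntax is ultimately chosen.

```go
package main

import (
	"fmt"
	"path"
)

// copyMatchingLabels keeps only the pod labels whose keys match one of the
// user-supplied glob patterns, returning the filtered set to be passed on
// to the raylet.
func copyMatchingLabels(podLabels map[string]string, patterns []string) map[string]string {
	out := map[string]string{}
	for key, val := range podLabels {
		for _, p := range patterns {
			// path.Match treats '/' as a separator, so "topology.kubernetes.io/*"
			// matches any single suffix segment of that label prefix.
			if ok, _ := path.Match(p, key); ok {
				out[key] = val
				break
			}
		}
	}
	return out
}

func main() {
	labels := map[string]string{
		"topology.kubernetes.io/zone": "us-central1-a",
		"app":                         "ray",
	}
	filtered := copyMatchingLabels(labels, []string{"topology.kubernetes.io/*"})
	fmt.Println(filtered["topology.kubernetes.io/zone"]) // us-central1-a
	fmt.Println(len(filtered))                           // 1
}
```

A filter like this keeps the copying explicit and user-driven, which is the trade-off against the automatic downward-API population this PR implements.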

@kevin85421 kevin85421 left a comment

I am wondering whether we should do this or not. @MengjinYan, @rueian, and I had some discussions about Ray Autoscaler.

  • Ray Autoscaler can’t know which zone/region a new Ray Pod will be in, because the zone/region is only determined once the Ray Pod is running.
  • I do see the value of this PR, but not a lot, because
    • (1) Most use cases that require ZONE / MARKET_TYPE / REGION also require autoscaling.
    • (2) UX fragmentation: users can use ZONE / MARKET_TYPE / REGION to schedule Ray tasks, but can't use the information to autoscale.
  • To address (1) and (2), users still need to manually specify the information, which is why I'm concerned about whether this PR actually adds value.

I will request changes to prevent this PR from being merged accidentally, since Mengjin had already approved it before we realized this issue.

Maybe we should first work on a doc to educate users on how to configure K8s node pools, RayCluster PodSpec, and rayStartParams to enable label-based scheduling, before adding complexity to KubeRay without very strong use cases.

Feel free to let me know if you have strong evidence for the use cases of this PR.

@ryanaoleary
Contributor Author

Ray Autoscaler can’t know which zone/region a new Ray Pod will be in, because the zone/region is only determined once the Ray Pod is running.

This is true for the env vars set using downward API, but I think for the case where the user specifies a nodeSelector, we can guarantee that the node will run in the specified region/zone/market-type if it scales up. If I add the labels that we set here to the available node types in the autoscaling config, then I think autoscaling fully works here.

I think the current behavior of label selectors with this PR is as follows:

  1. The Ray Autoscaler will scale nodes based on only the resources (not labels) requested by tasks/actors.
  2. Nodes will be scaled with those resources and the default Raylet labels will get set based on downward API
  3. The cluster resource scheduler will attempt to schedule using the label_selector in the decorator of the Task/Actor. If the label selector does not match the default labels that get set for that node, it will just remain pending. However, if the node scaled to satisfy the resource request is of the desired market type, region, or zone requested by the label selector, then it will schedule and run.

I think the above flow would be the current behavior with this PR. While it's not ideal since users can't directly autoscale based on these labels, they can still use autoscaling and constrain where their code runs based on the label selectors. Like you mentioned, users can also edit their K8s Pod spec or manually set the labels themselves for more control. I think the main benefit of going forward with the above behavior is the following case:

  1. A user defines a RayCluster with multiple worker groups with similar resources, but with different region, zone, or market-type labels and node selectors.
  2. The user scales up several replicas of the worker groups based on some resource requests.
  3. The user wants to constrain certain parts of their application code to only run on replicas of a specific worker group of X region, zone, or market-type.

What do you think @MengjinYan? I think the next step to get this merge-ready will be to work on a user guide for K8s, as @kevin85421 mentioned, and to clarify the e2e autoscaling use case; I'll try to put that out by next week. I'll also fix the merge conflict and add the feature gate.
