ryanaoleary
Contributor

@ryanaoleary ryanaoleary commented Apr 2, 2025

Why are these changes needed?

This PR updates the cluster resource scheduling logic so that, when checking whether a node IsSchedulable, it also checks that the node satisfies the given label match expressions. It also adds the label_selector option to Task/Actor creation, adds logic in the raylet to parse strings into the LabelSelector data structure, and passes the label_selector to the core worker for use when building the TaskSpec.

These changes support the label selector API, which ensures that tasks/actors execute only on nodes with the required node labels.
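
To illustrate the kind of node-side check described above, here is a minimal pure-Python sketch. It is not the raylet's C++ implementation. The names `parse_constraint` and `node_matches` are hypothetical, and the value forms it accepts (`value`, `!value`, `in(a,b)`, `!in(a,b)`) are an assumption made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical model of one parsed constraint: (negated, allowed_values).
Constraint = Tuple[bool, Set[str]]


def parse_constraint(expr: str) -> Constraint:
    """Parse an assumed selector value: "A100", "!A100", "in(a,b)" or "!in(a,b)"."""
    negated = expr.startswith("!")
    body = expr[1:] if negated else expr
    if body.startswith("in(") and body.endswith(")"):
        # Split the comma-separated values inside in(...).
        values = {v.strip() for v in body[3:-1].split(",") if v.strip()}
    else:
        # A bare value is treated as a single-element set.
        values = {body}
    return negated, values


def node_matches(node_labels: Dict[str, str], label_selector: Dict[str, str]) -> bool:
    """Return True only if the node's labels satisfy every constraint."""
    for key, expr in label_selector.items():
        negated, values = parse_constraint(expr)
        present = node_labels.get(key) in values
        # A positive constraint needs the value present; a negated one needs it absent.
        if present == negated:
            return False
    return True


# Example: a node is eligible only if all constraints hold.
node = {"region": "us-west", "accelerator": "A100"}
eligible = node_matches(node, {"region": "in(us-west,us-east)", "accelerator": "!T4"})
```

In this sketch a node that lacks a label fails a positive constraint on that label and passes a negated one. That is a design choice of the sketch, not something this PR description states.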

Related issue number

#51564

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Fix types

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Fix proto

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Update proto naming to match autoscaler

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Fix errors

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Remove gcs proto

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Fix header file

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Change expression to constraint

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Format

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary force-pushed the schedule-using-labels branch from 87451d8 to 2e93936 Compare April 2, 2025 10:37
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@jcotant1 jcotant1 added the core (Issues that should be addressed in Ray Core) label Apr 2, 2025
@hainesmichaelc hainesmichaelc added the community-contribution (Contributed by the community) label Apr 4, 2025
ryanaoleary and others added 16 commits April 8, 2025 06:31
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>

Add label_selector to common task spec

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: ryanaoleary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary marked this pull request as ready for review April 15, 2025 13:18
@ryanaoleary ryanaoleary requested review from pcmoritz, raulchen and a team as code owners April 15, 2025 13:18
ryanaoleary and others added 8 commits May 1, 2025 21:38
… label_selector API

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: Dhyey Shah <dhyey2019@gmail.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary requested a review from dayshah May 2, 2025 02:12
ryanaoleary and others added 4 commits May 3, 2025 00:22
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@edoakes edoakes merged commit f6e9ca2 into ray-project:master May 6, 2025
5 checks passed
edoakes pushed a commit that referenced this pull request Aug 4, 2025
…pare_label_selector` (#52964)

This PR is a follow-up to this comment:
#51901 (comment).
This PR changes the cluster resource scheduler to propagate a Ray status
to `ComputeResources` in `TaskSpecification` when the LabelSelector data
type is initialized. As a result, a task built with a malformed label
selector surfaces the error as a more useful Python exception instead of
crashing Ray components in C++.
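
The idea of rejecting a malformed selector early with a Python exception can be sketched as below. This is a hypothetical validator, not Ray's code: the function name `validate_label_selector`, the rules it enforces, and the use of `ValueError` are all assumptions for illustration.

```python
from typing import Dict


def validate_label_selector(label_selector: Dict[str, str]) -> None:
    """Raise ValueError for selectors that this sketch treats as malformed."""
    if not isinstance(label_selector, dict):
        raise ValueError("label_selector must be a dict mapping str to str")
    for key, value in label_selector.items():
        if not isinstance(key, str) or not key:
            raise ValueError(f"invalid label key: {key!r}")
        if not isinstance(value, str) or not value:
            raise ValueError(f"invalid value for label {key!r}: {value!r}")
        # Strip an optional leading "!" (assumed negation marker).
        body = value[1:] if value.startswith("!") else value
        if not body:
            raise ValueError(f"empty constraint for label {key!r}")
        if body.startswith("in("):
            # An in(...) constraint must be closed and list at least one value.
            if not body.endswith(")"):
                raise ValueError(f"unterminated in(...) for label {key!r}: {value!r}")
            if not any(v.strip() for v in body[3:-1].split(",")):
                raise ValueError(f"empty in(...) for label {key!r}: {value!r}")
```

Calling such a check before the task is submitted means the caller sees an ordinary exception with the offending key and value, rather than a failure deeper in the system.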

#51564

---------

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
elliot-barn pushed a commit that referenced this pull request Aug 4, 2025
…pare_label_selector` (#52964)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
elliot-barn pushed a commit that referenced this pull request Aug 4, 2025
…pare_label_selector` (#52964)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
kamil-kaczmarek pushed a commit that referenced this pull request Aug 4, 2025
…pare_label_selector` (#52964)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Kamil Kaczmarek <kamil@anyscale.com>
mjacar pushed a commit to mjacar/ray that referenced this pull request Aug 5, 2025
…pare_label_selector` (ray-project#52964)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: Michael Acar <michael.j.acar@gmail.com>
elliot-barn pushed a commit that referenced this pull request Aug 5, 2025
…pare_label_selector` (#52964)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
sampan-s-nayak pushed a commit that referenced this pull request Aug 12, 2025
…pare_label_selector` (#52964)

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
Signed-off-by: sampan <sampan@anyscale.com>
Labels
  • community-backlog
  • community-contribution: Contributed by the community
  • core: Issues that should be addressed in Ray Core
  • go: add ONLY when ready to merge, run all tests

7 participants