Skip to content

Conversation

ryanaoleary
Copy link
Contributor

Why are these changes needed?

The label selector API can now fully replace NodeAffinitySchedulingStrategy when soft=False by specifying a label selector requirement for ray.io/node-id, a default node label which is always added and used by the aforementioned scheduling strategy. This PR replaces uses of NodeAffinitySchedulingStrategy in Ray libraries and tests as part of the implementation plan for Ray Label Selector API proposal. We will continue to replace redundant functionality of NodeAffinitySchedulingStrategy and NodeLabelSchedulingStrategy as fallback label selectors are implemented.

Related issue number

#51564

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ryanaoleary ryanaoleary requested review from a team as code owners July 26, 2025 10:10
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @ryanaoleary, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request is a significant step in migrating Ray's internal and external APIs from the older NodeAffinitySchedulingStrategy to the more flexible and standardized label_selector API. The primary goal is to ensure that hard node affinity (where soft=False) is expressed uniformly using node labels, improving consistency and paving the way for future scheduling enhancements.

Highlights

  • API Migration: I've replaced all instances of NodeAffinitySchedulingStrategy (specifically when soft=False) with the new label_selector API across various Ray libraries and tests. This aligns with the ongoing effort to standardize scheduling mechanisms.
  • Code Simplification: The changes involve removing direct imports and instantiations of NodeAffinitySchedulingStrategy, leading to cleaner and more concise code where explicit node affinity is required.
  • Standardized Node Affinity: Node-specific scheduling now consistently leverages the ray.io/node-id label within the label_selector dictionary, as per the new Label Selector API proposal.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is a great step towards unifying scheduling strategies under the new Label Selector API. The changes are mostly a straightforward replacement of NodeAffinitySchedulingStrategy with label_selector.

I've reviewed the changes and found a few critical issues that need to be addressed:

  1. In python/ray/dashboard/modules/job/job_manager.py, the _get_label_selector function returns a nested dictionary, which will cause incorrect behavior when used.
  2. In python/ray/data/_internal/stats.py, there's a potential NameError as label_selector is not defined in all code paths.
  3. In release/long_running_tests/workloads/many_drivers.py, there are syntax errors in an f-string due to extra parentheses.

These issues could lead to runtime failures. Please see my detailed comments for suggestions on how to fix them. Once these are addressed, the PR should be in good shape.

@ryanaoleary
Copy link
Contributor Author

cc: @MengjinYan I replaced the usages in a bunch of files, this PR is quite large (although the changes are almost identical everywhere and pretty straightforward) so I can break it up if necessary. I'll put out a follow-up PR for NodeLabelSchedulingStrategy where appropriate.

ryanaoleary and others added 4 commits July 26, 2025 03:14
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
@ray-gardener ray-gardener bot added community-contribution Contributed by the community core Issues that should be addressed in Ray Core labels Jul 26, 2025
@edoakes edoakes added the go add ONLY when ready to merge, run all tests label Jul 30, 2025
@edoakes
Copy link
Collaborator

edoakes commented Jul 30, 2025

@ryanaoleary looks like there might be some CI failures. I added go label to kick off full run

Copy link

This pull request has been automatically marked as stale because it has not had
any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions github-actions bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Aug 14, 2025
@edoakes
Copy link
Collaborator

edoakes commented Aug 14, 2025

not stale

@github-actions github-actions bot added unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it. and removed stale The issue is stale. It will be closed within 7 days unless there are further conversation labels Aug 14, 2025
@ryanaoleary
Copy link
Contributor Author

This PR is blocked because we need to handle the special case where if a node id unavailable, we should directly return an infeasible error rather than block waiting to schedule. I'll work on that fix for label scheduling and then this PR should pass CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution Contributed by the community core Issues that should be addressed in Ray Core go add ONLY when ready to merge, run all tests unstale A PR that has been marked unstale. It will not get marked stale again if this label is on it.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants