[Core] Enhancing node affinity scheduling feature through node labels #34894

@larrylian

Description

This feature improves node affinity scheduling by letting users attach static labels to nodes. The scheduler then uses those labels to determine affinity.
Ray Enhancement Proposals: ray-project/enhancements#22
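
To make the idea concrete, here is a minimal sketch of label-based affinity in plain Python. It is not Ray's implementation or API: `Node`, `label_matches`, and `eligible_nodes` are hypothetical names, the label keys and values are invented, and the "value must be one of an allowed set" rule is an assumed matching rule chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node; not a Ray class."""
    node_id: str
    # Static labels: assumed to be set once when the node starts.
    labels: Dict[str, str] = field(default_factory=dict)


def label_matches(labels: Dict[str, str], required: Dict[str, Set[str]]) -> bool:
    """Return True if every required key is present with an allowed value."""
    return all(labels.get(key) in allowed for key, allowed in required.items())


def eligible_nodes(nodes: List[Node], required: Dict[str, Set[str]]) -> List[Node]:
    """Keep only the nodes whose labels satisfy the affinity requirement."""
    return [node for node in nodes if label_matches(node.labels, required)]


nodes = [
    Node("node-1", {"gpu_type": "A100", "region": "us-west"}),
    Node("node-2", {"gpu_type": "T4", "region": "us-west"}),
    Node("node-3"),  # a node started without labels
]

# A task that needs an A100 node in us-west.
chosen = eligible_nodes(nodes, {"gpu_type": {"A100"}, "region": {"us-west"}})
print([node.node_id for node in chosen])  # prints ['node-1']
```

An empty requirement matches every node in this sketch, so unlabeled workloads would be unaffected by the presence of labels.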

To track progress, I have split this feature into the work items below. The items may change as development proceeds, and suggestions for improvement are welcome.

1. API for Node Affinity Scheduling with Labels

API for setting node labels
API for using node labels
API for getting node labels
  • (P1) Finalize the API for getting node labels in Python.
  • (P3) Finalize the API for getting node labels in the Ray Dashboard.
  • (P4) Finalize the API for getting node labels in the Ray command line (ray status).

2. Internal Implementation

3. Tests

  • Implement basic test cases for Python.
  • Add test cases for edge scenarios.
  • Add test cases for various failover/abnormal scenarios.
  • Add test cases for cross-language calls.

4. Adapting Java and C++ workers

  • (P3) Implement the node affinity with labels interface in Java and pass it through to the CoreWorker unchanged.
  • (P3) Add test cases for the Java worker implementation.
  • (P4) Implement the node affinity with labels interface in C++ and pass it through to the CoreWorker unchanged.
  • (P4) Add test cases for the C++ worker implementation.

5. Adapting Auto Scaling

  • (P4) Add node label information and label-based node affinity scheduling information to the API used between the Autoscaler and GCS.
  • (P4) Adapt the Autoscaler's simulated scheduling module so that it implements node affinity scheduling with labels.
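
As a rough illustration of what label-aware simulated scheduling could look like, the sketch below decides which node types to launch for pending label requirements. It is an assumption-laden toy, not the Autoscaler's code: `satisfies`, `plan_launches`, the node type names, and the label keys are all hypothetical, resources are ignored entirely, and a newly planned node is not reused for later demands.

```python
from typing import Dict, List, Set, Tuple

Labels = Dict[str, str]
Requirement = Dict[str, Set[str]]  # label key -> allowed values (assumed rule)


def satisfies(labels: Labels, required: Requirement) -> bool:
    """Return True if every required key is present with an allowed value."""
    return all(labels.get(key) in allowed for key, allowed in required.items())


def plan_launches(
    running: List[Labels],
    node_types: Dict[str, Labels],
    demands: List[Requirement],
) -> Tuple[List[str], List[Requirement]]:
    """Return (node types to launch, demands that no node type can satisfy)."""
    launches: List[str] = []
    infeasible: List[Requirement] = []
    for required in demands:
        if any(satisfies(labels, required) for labels in running):
            continue  # an existing node can already host this demand
        for type_name, labels in node_types.items():
            if satisfies(labels, required):
                launches.append(type_name)
                break
        else:
            infeasible.append(required)  # no configured node type matches
    return launches, infeasible


running = [{"region": "us"}]
node_types = {
    "gpu_worker": {"region": "us", "gpu_type": "A100"},
    "cpu_worker": {"region": "eu"},
}
demands = [{"region": {"us"}}, {"gpu_type": {"A100"}}, {"region": {"asia"}}]

launches, infeasible = plan_launches(running, node_types, demands)
print(launches)    # prints ['gpu_worker']
print(infeasible)  # prints [{'region': {'asia'}}]
```

The point of the sketch is only that the simulation needs both inputs named in the items above: the labels each node type would carry, and the label requirements of pending requests.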

6. Visualization/Observability

  • (P3) Display node label information in the Ray dashboard.

7. Documentation

  • (P5) Write documentation for using node affinity scheduling with labels.

Metadata

Labels

  • P2: Important issue, but not time-critical
  • core: Issues that should be addressed in Ray Core
  • core-scheduler
  • enhancement: Request for new feature and/or capability
