Open
Labels
P2 (Important issue, but not time-critical), core (Issues that should be addressed in Ray Core), core-scheduler, enhancement (Request for new feature and/or capability)
Description
This feature improves node affinity scheduling by allowing the addition of static labels to nodes, which are then used to determine affinity.
Ray Enhancement Proposals: ray-project/enhancements#22
To help track the development of this feature, I have split it into the items below. The items may change as development proceeds, and suggestions for improvement are welcome.
1. API for Node Affinity Scheduling with Labels
API for setting node labels
- (P1) Finalize the command-line API (ray start) for setting node labels. [Core][Node Labels 1/n] Add --labels param in ray start command to setting node labels #35433
- (P1) Finalize the format for setting node labels in the ray.init() interface of the Python worker. [Core][Node Labels 2/n] Add labels param in python ray.init() to setting node labels #36007
- (P3) Finalize the format for setting node labels in the ray.init() interface of the Java and C++ workers.
- (P4) Finalize the ray up API for setting node labels.
- (P4) Finalize the kuberay API for setting node labels.
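As a rough illustration of the label-setting items above, the sketch below validates a label string before it is attached to a node. It assumes the --labels value is a JSON object whose keys and values are strings; that format, and the helper name parse_node_labels, are assumptions made for this example and are not taken from the linked PRs.

```python
import json
from typing import Dict


def parse_node_labels(labels_json: str) -> Dict[str, str]:
    """Parse a JSON object string into node labels (hypothetical helper).

    Assumption: labels are static string key/value pairs.
    """
    try:
        labels = json.loads(labels_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"labels must be valid JSON: {exc}") from exc

    if not isinstance(labels, dict):
        raise ValueError("labels must be a JSON object of key/value pairs")

    for key, value in labels.items():
        # JSON object keys are always strings, so only emptiness is checked.
        if not key:
            raise ValueError("label keys must be non-empty")
        if not isinstance(value, str):
            raise ValueError(f"label value for {key!r} must be a string")
    return labels


if __name__ == "__main__":
    print(parse_node_labels('{"gpu_type": "A100", "region": "us"}'))
```

Rejecting non-string values early keeps later label comparisons simple string comparisons, which is one possible design choice rather than a requirement stated in this issue.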
API for using node labels
- (P2) Finalize the new API for node affinity scheduling with node labels in the Python worker. [Core][Label Scheduling 1/n]Add NodeLabelSchedulingStrategy API in python #36418
- (P3) Finalize the new API for node affinity scheduling with node labels in the Java and C++ workers.
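To make the idea of label-based affinity concrete, here is a minimal sketch of how label match expressions could be modeled. The class names (In, NotIn, Exists, DoesNotExist) and the node_matches helper are illustrative assumptions for this example and are not presented as the finalized Python API. The sketch also assumes that a missing key satisfies NotIn, which mirrors Kubernetes-style selectors.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class In:
    values: Tuple[str, ...]

    def matches(self, labels: Dict[str, str], key: str) -> bool:
        # The key must be present and its value must be one of `values`.
        return labels.get(key) in self.values


@dataclass(frozen=True)
class NotIn:
    values: Tuple[str, ...]

    def matches(self, labels: Dict[str, str], key: str) -> bool:
        # Assumption: a node without the key also satisfies NotIn.
        return labels.get(key) not in self.values


@dataclass(frozen=True)
class Exists:
    def matches(self, labels: Dict[str, str], key: str) -> bool:
        return key in labels


@dataclass(frozen=True)
class DoesNotExist:
    def matches(self, labels: Dict[str, str], key: str) -> bool:
        return key not in labels


def node_matches(labels: Dict[str, str], constraints: Dict[str, object]) -> bool:
    """Return True if the node's labels satisfy every constraint."""
    return all(expr.matches(labels, key) for key, expr in constraints.items())


if __name__ == "__main__":
    node = {"gpu_type": "A100", "region": "us"}
    wanted = {"gpu_type": In(("A100", "H100")), "spot": DoesNotExist()}
    print(node_matches(node, wanted))
```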
API for getting node labels
- (P1) Finalize the API for getting node labels in Python.
- (P3) Finalize the API for getting node labels in Ray Dashboard.
- (P4) Finalize the API for getting node labels in the Ray command line (ray status).
2. Internal Implementation
- (P1) Parse the configuration parameters for node labels and save them in the NodeInfo data structure. [Core][Node Labels 1/n] Add --labels param in ray start command to setting node labels #35433
- (P1) Finalize default node labels.
- (P1) Synchronize the node label information to the resources of all nodes. [Core][Node Labels 3/n]Add node labels to node resources and publish to all node #36009
- (Deleted) (P2) Build an index table based on the label information of all nodes to improve scheduling performance.
- (P2) Implement the node affinity with labels interface in Python and pass it through to the CoreWorker. [Core][Label scheduling 3/n] Implement node label scheduling strategy #37339
- (P2) Implement the node affinity with labels scheduling policy. [Core][Label scheduling 3/n] Implement node label scheduling strategy #37339
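The scheduling policy item above could, in spirit, look like the following sketch. It is not the implementation from #37339. It assumes a simplified model in which constraints are plain key/value equality checks, "hard" constraints must hold, and "soft" constraints are only a preference. The names pick_node and satisfies are hypothetical, and the sorted tie-break is chosen purely to keep the example deterministic.

```python
from typing import Dict, Optional

Labels = Dict[str, str]


def satisfies(labels: Labels, required: Labels) -> bool:
    """True if every required key is present with the required value."""
    return all(labels.get(key) == value for key, value in required.items())


def pick_node(nodes: Dict[str, Labels], hard: Labels, soft: Labels) -> Optional[str]:
    """Pick a node ID whose labels meet `hard`, preferring ones that meet `soft`.

    Returns None when no node satisfies the hard constraints.
    """
    # Step 1: keep only nodes that satisfy the hard constraints.
    feasible = sorted(nid for nid, labels in nodes.items() if satisfies(labels, hard))
    if not feasible:
        return None

    # Step 2: among feasible nodes, prefer those that also satisfy the soft ones.
    preferred = [nid for nid in feasible if satisfies(nodes[nid], soft)]

    # Step 3: fall back to any feasible node if no node meets the soft constraints.
    return (preferred or feasible)[0]


if __name__ == "__main__":
    cluster = {
        "n1": {"region": "us"},
        "n2": {"region": "us", "gpu_type": "A100"},
        "n3": {"region": "eu"},
    }
    print(pick_node(cluster, hard={"region": "us"}, soft={"gpu_type": "A100"}))
```

This version scans every node linearly and does not build a label index, in line with the index-table item above being marked as deleted.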
3. Tests
- Implement basic test cases for Python.
- Add test cases for edge scenarios.
- Add test cases for various failover/abnormal scenarios.
- Add test cases for cross-language calls.
4. Adapting Java and C++ workers
- (P3) Implement the node affinity with labels interface in Java and pass it through to the CoreWorker.
- (P3) Add test cases for the Java worker implementation.
- (P4) Implement the node affinity with labels interface in C++ and pass it through to the CoreWorker.
- (P4) Add test cases for the C++ worker implementation.
5. Adapting Auto Scaling
- (P4) Add node label information and label-based node affinity scheduling information to the API used for Autoscaler and GCS interactions.
- (P4) Adapt the Autoscaler's simulated scheduling module to support node affinity scheduling with labels.
6. Visualization/Observability
- (P3) Display node label information in the Ray dashboard.
7. Documentation
- (P5) Write documentation for using node affinity scheduling with labels.
scv119 and jjyao