Skip to content

Conversation

larrylian
Copy link
Contributor

@larrylian larrylian commented Jun 14, 2023

Why are these changes needed?

After our offline discussion, we have come up with the following plan. This API plan will review and finalization by more people in the future.

Complete form of API:

MyActor.options(
        scheduling_strategy=NodeLabelSchedulingStrategy(
            [
                {
                    "region": IN("us"), 
                    "gpu_type": IN("A100")
                },
                {
                    "region": IN("asia", "europe"), 
                    "gpu_type": IN("T100")
                 },
            ]
        )
).remote()

User also can use more simple api:

 MyActor.options(
        scheduling_strategy=NodeLabelSchedulingStrategy(
                {
                    "region": IN("us"), 
                    "gpu_type": IN("A100")
                }
)).remote()

Related issue number

[Core][Labels Scheduling]Finalize the new node affinity scheduling with node labels API in the Python worker #36419

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@larrylian larrylian self-assigned this Jun 14, 2023
@larrylian larrylian force-pushed the labels_scheduling_1 branch from e96d370 to 8e53f5c Compare June 25, 2023 09:36
@larrylian larrylian changed the title [WIP][Core][Labels Scheduling]Add node affinity with labels API in python [Core][Label Scheduling 1/n]Add NodeLabelSchedulingStrategy API in python Jun 25, 2023
@larrylian larrylian assigned jjyao and unassigned larrylian Jun 25, 2023
@larrylian larrylian force-pushed the labels_scheduling_1 branch from 8e53f5c to b2b6d13 Compare June 25, 2023 11:35
@larrylian larrylian force-pushed the labels_scheduling_1 branch from b2b6d13 to 8c79eda Compare July 9, 2023 15:02
LabelMatchExpressionsT = Dict[str, LabelMatchExpression]


@PublicAPI(stability="beta")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start with alpha.



@PublicAPI(stability="beta")
class NodeLabelSchedulingStrategy:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this file into scheduling_strategies.py? All the other strategies are defined there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@larrylian larrylian force-pushed the labels_scheduling_1 branch from 8c79eda to 134e094 Compare July 10, 2023 11:55
@larrylian larrylian requested a review from jjyao July 10, 2023 11:55
@larrylian larrylian force-pushed the labels_scheduling_1 branch 2 times, most recently from a3625e3 to db02d91 Compare July 10, 2023 11:58
self.soft = transfer_map_to_expresions(soft)


def transfer_map_to_expresions(map_expressions: LabelMatchExpressionsT):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_convert_map_to_label_math_expressions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I change to _convert_map_to_expressions. more concise.

Comment on lines 145 to 147
"The value of label match expression in node label "
"scheduling strategy must be created by "
"`IN` or `NOT_IN` or`EXIST` or `DOES_NOT_EXIST`."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"The value of label match expression in node label "
"scheduling strategy must be created by "
"`IN` or `NOT_IN` or`EXIST` or `DOES_NOT_EXIST`."
"The value of label match expression in node label "
"scheduling strategy must be "
"`In` or `NotIn` or`Exists` or `DoesNotExist` operator."

@larrylian larrylian force-pushed the labels_scheduling_1 branch from db02d91 to 0aa8344 Compare July 11, 2023 03:04
pass


class LabelMatchExpression:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class LabelMatchExpression:
class _LabelMatchExpression:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 123 to 142
"""The schedling strategy of node Affinity with labels
Simplified usage:
scheduling_strategy=NodeLabelSchedulingStrategy({
"region": IN("us"),
"gpu_type": EXISTS()
})
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""The schedling strategy of node Affinity with labels
Simplified usage:
scheduling_strategy=NodeLabelSchedulingStrategy({
"region": IN("us"),
"gpu_type": EXISTS()
})
"""
"""Label based node affinity scheduling strategy
scheduling_strategy=NodeLabelSchedulingStrategy({
"region": IN("us"),
"gpu_type": EXISTS()
})
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fxied

@@ -72,9 +72,90 @@ def __init__(
self._fail_on_unavailable = _fail_on_unavailable


def _validate_values(values):
if values is None or len(values) == 0:
raise ValueError("The value of lable cannot be empty.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lable -> label


for value in values:
if not isinstance(value, str):
raise ValueError("The value of lable must be string.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be a little more specific here like Expected value {value} in position {index} to be of type str

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@pcmoritz
Copy link
Contributor

Before this can be merged, the error messages should be improved:

  1. First, see my comments
  2. But then, they should also tell the user which key is associated to the value that has the error, and whether it is the "soft" branch

Think about the user experience: They will get the error, and they will need to find out exactly which value caused the error. Currently that is extremely hard at best.

@larrylian larrylian force-pushed the labels_scheduling_1 branch from 93e0a61 to 9b850dd Compare July 14, 2023 01:51
@larrylian larrylian requested review from pcmoritz and jjyao July 14, 2023 01:57
@larrylian
Copy link
Contributor Author

Before this can be merged, the error messages should be improved:

  1. First, see my comments
  2. But then, they should also tell the user which key is associated to the value that has the error, and whether it is the "soft" branch

I have fixed. Can you help me review again?

@@ -72,9 +72,95 @@ def __init__(
self._fail_on_unavailable = _fail_on_unavailable


def _validate_values(values):
if values is None or len(values) == 0:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if values is None or len(values) == 0:
if not values:

@@ -72,9 +72,95 @@ def __init__(
self._fail_on_unavailable = _fail_on_unavailable


def _validate_values(values):
if values is None or len(values) == 0:
raise ValueError("The value of label cannot be empty.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variadic parameter of the In operator must be a non-empty tuple: e.g. In("value1", "value2").

@larrylian larrylian force-pushed the labels_scheduling_1 branch 2 times, most recently from 3edceb9 to 8bb6b7c Compare July 14, 2023 08:21
…thon

Signed-off-by: LarryLian <554538252@qq.com>
@larrylian larrylian force-pushed the labels_scheduling_1 branch from 8bb6b7c to 6071810 Compare July 14, 2023 15:46
Copy link
Contributor

@pcmoritz pcmoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjyao The ideal error message would also include which label the error message is referring (e.g. "gpu_type" in the example) :)

@jjyao
Copy link
Collaborator

jjyao commented Jul 18, 2023

The ideal error message would also include which label the error message is referring (e.g. "gpu_type" in the example) :)

Discussed offline: currently the validation is done inside the operator constructor and we don't have the label key information. User should be able to rely on the stack trace to locate which operator is invalid without knowing the label key.

@jjyao jjyao merged commit 13a5391 into ray-project:master Jul 18, 2023
Bhav00 pushed a commit to Bhav00/ray that referenced this pull request Jul 28, 2023
NripeshN pushed a commit to NripeshN/ray that referenced this pull request Aug 15, 2023
…thon (ray-project#36418)

Signed-off-by: LarryLian <554538252@qq.com>
Signed-off-by: NripeshN <nn2012@hw.ac.uk>
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…thon (ray-project#36418)

Signed-off-by: LarryLian <554538252@qq.com>
Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
…thon (ray-project#36418)

Signed-off-by: LarryLian <554538252@qq.com>
Signed-off-by: Victor <vctr.y.m@example.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants