
[release] Update Volcano YAML files to Ray 2.41 #2976

Merged: 1 commit, merged Feb 7, 2025

@win5923 (Contributor) commented Feb 7, 2025

Why are these changes needed?

Update the images in the Volcano sample YAML files to Ray 2.41, then follow https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html to verify that the documented steps still work.

1. Test ray-cluster.volcano-scheduler.yaml

$ kubectl apply -f ray-operator/config/samples/ray-cluster.volcano-scheduler.yaml 
raycluster.ray.io/test-cluster-0 created

$ kubectl get pod -l ray.io/cluster=test-cluster-0
NAME                        READY   STATUS    RESTARTS   AGE
test-cluster-0-head-9wvn7   0/1     Running   0          9s
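The head Pod is still 0/1 ready nine seconds after creation, so a scripted version of this check needs to poll. A minimal sketch of a generic polling helper, assuming a POSIX shell; `wait_for_output` is a hypothetical name, and the PodGroup name in the usage comment assumes the `ray-<cluster>-pg` pattern seen in the queue test.

```shell
# wait_for_output ATTEMPTS DELAY_SECONDS EXPECTED CMD [ARGS...]
# Hypothetical helper: runs CMD up to ATTEMPTS times, sleeping DELAY_SECONDS
# between tries, and returns 0 as soon as CMD's stdout equals EXPECTED.
# Returns 1 if EXPECTED is never seen. CMD's stderr is discarded.
wait_for_output() {
  attempts=$1; delay=$2; expected=$3
  shift 3
  i=1
  while :; do
    if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example usage (assumed PodGroup name):
#   wait_for_output 30 2 Running \
#     kubectl get podgroup ray-test-cluster-0-pg -o jsonpath='{.status.phase}'
```

For Pods alone, `kubectl wait --for=condition=Ready pod -l ray.io/cluster=test-cluster-0 --timeout=120s` does the same job without a custom loop; the helper is mainly useful for fields such as a PodGroup's `status.phase`.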

2. Test ray-cluster.volcano-scheduler-queue.yaml

  1. Create a queue with a capacity of 4 CPUs and 6Gi of RAM:
$ kubectl create -f - <<EOF
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: kuberay-test-queue
spec:
  weight: 1
  capability:
    cpu: 4
    memory: 6Gi
EOF
  1. Create a RayCluster with a head node (1 CPU + 2Gi of RAM) and two workers (1 CPU + 1Gi of RAM each), for a total of 3 CPUs and 4Gi of RAM:
$ kubectl apply -f ray-operator/config/samples/ray-cluster.volcano-scheduler-queue.yaml 
raycluster.ray.io/test-cluster-0 created

$ kubectl get podgroup ray-test-cluster-0-pg -o yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  creationTimestamp: "2025-02-07T15:13:13Z"
  generation: 5
  name: ray-test-cluster-0-pg
  namespace: default
  ownerReferences:
  - apiVersion: ray.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: RayCluster
    name: test-cluster-0
    uid: 7c2d6bd5-19da-4146-8025-dd56c5ee06fe
  resourceVersion: "1858"
  uid: 9aec52f2-6f45-4101-a0ca-5645ba0c2edf
spec:
  minMember: 3
  minResources:
    cpu: "3"
    memory: 4Gi
  queue: kuberay-test-queue
status:
  conditions:
  - lastTransitionTime: "2025-02-07T15:15:03Z"
    reason: tasks in gang are ready to be scheduled
    status: "True"
    transitionID: 3ed1d6cd-db50-405b-82a9-baed48ef3a63
    type: Scheduled
  phase: Running
  running: 3
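The PodGroup's `minMember` and `minResources` line up with the cluster described in the previous step: one head plus two workers. A sketch of that arithmetic, assuming (as this output suggests) that the gang counts every head and worker Pod and sums their requests; `gang_totals` is a hypothetical helper, not part of KubeRay or Volcano.

```shell
# gang_totals HEAD_CPU HEAD_MEM_GI WORKERS WORKER_CPU WORKER_MEM_GI
# Hypothetical helper: prints the gang size and summed requests for a
# RayCluster with one head Pod and WORKERS identical worker Pods.
# Uses shell integer arithmetic, so only whole CPUs and whole Gi are supported.
gang_totals() {
  head_cpu=$1; head_mem=$2; workers=$3; worker_cpu=$4; worker_mem=$5
  members=$((1 + workers))
  cpu=$((head_cpu + workers * worker_cpu))
  mem=$((head_mem + workers * worker_mem))
  echo "minMember=$members cpu=$cpu memory=${mem}Gi"
}

# Head: 1 CPU + 2Gi; two workers: 1 CPU + 1Gi each.
gang_totals 1 2 2 1 1
# prints: minMember=3 cpu=3 memory=4Gi
```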
  1. Check the status of the queue to see the resources allocated to the running RayCluster:
$ kubectl get queue kuberay-test-queue -o yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  creationTimestamp: "2025-02-07T15:12:34Z"
  generation: 2
  name: kuberay-test-queue
  resourceVersion: "1861"
  uid: a65b1be2-a2c5-4c2c-afaa-1f088787e21d
spec:
  capability:
    cpu: 4
    memory: 6Gi
  parent: root
  reclaimable: true
  weight: 1
status:
  allocated:
    cpu: "3"
    memory: 4Gi
    pods: "3"
  reservation: {}
  state: Open
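With 3 CPUs and 4Gi allocated out of a 4 CPU / 6Gi capability, the queue has little headroom left. A sketch of the subtraction, using whole CPUs and Gi; `queue_headroom` is a hypothetical helper.

```shell
# queue_headroom CAP_CPU CAP_MEM_GI ALLOC_CPU ALLOC_MEM_GI
# Hypothetical helper: prints how much of a Volcano queue's capability
# is not yet allocated (whole CPUs and Gi only).
queue_headroom() {
  echo "cpu=$(($1 - $3)) memory=$(($2 - $4))Gi"
}

# capability: cpu 4, memory 6Gi; allocated: cpu 3, memory 4Gi
queue_headroom 4 6 3 4
# prints: cpu=1 memory=2Gi
```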
  1. Add a second RayCluster with the same head and worker configuration, but with a different name:
$ sed 's/test-cluster-0/test-cluster-1/' ray-operator/config/samples/ray-cluster.volcano-scheduler-queue.yaml | kubectl apply -f-
raycluster.ray.io/test-cluster-1 created
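The `sed` step only rewrites the cluster name, and the PodGroup queried next is named after the new RayCluster. A sketch of both, assuming the `ray-<cluster>-pg` naming pattern visible in these outputs holds in general; `rename_cluster` and `podgroup_for_cluster` are hypothetical helpers.

```shell
# rename_cluster OLD NEW: hypothetical wrapper around the sed call used above.
# Without a trailing g flag, sed rewrites only the first match on each line,
# which is enough when the cluster name appears at most once per line.
rename_cluster() {
  sed "s/$1/$2/"
}

# podgroup_for_cluster NAME: hypothetical helper that derives the PodGroup
# name from the RayCluster name, following the ray-<cluster>-pg pattern.
podgroup_for_cluster() {
  echo "ray-$1-pg"
}

printf 'metadata:\n  name: test-cluster-0\n' | rename_cluster test-cluster-0 test-cluster-1
podgroup_for_cluster test-cluster-1
```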

# Check its PodGroup: the phase is Pending and the latest condition is Unschedulable:
$ kubectl get podgroup ray-test-cluster-1-pg -o yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  creationTimestamp: "2025-02-07T15:16:35Z"
  generation: 2
  name: ray-test-cluster-1-pg
  namespace: default
  ownerReferences:
  - apiVersion: ray.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: RayCluster
    name: test-cluster-1
    uid: e5008d52-eb6c-4555-9512-e81acb97ba69
  resourceVersion: "2050"
  uid: c458466a-80e1-4f8b-9ec6-65fe350bc4dc
spec:
  minMember: 3
  minResources:
    cpu: "3"
    memory: 4Gi
  queue: kuberay-test-queue
status:
  conditions:
  - lastTransitionTime: "2025-02-07T15:16:35Z"
    message: '3/3 tasks in gang unschedulable: pod group is not ready, 3 Pending,
      3 minAvailable; Pending: 3 Unschedulable'
    reason: NotEnoughResources
    status: "True"
    transitionID: f282a2e0-f113-4346-89cf-6944c0eb12bc
    type: Unschedulable
  phase: Pending

$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
kuberay-operator-685d587695-9xq5l    1/1     Running   0          5m8s
test-cluster-0-head-kdcbz            1/1     Running   0          4m14s
test-cluster-0-worker-worker-b7524   1/1     Running   0          4m14s
test-cluster-0-worker-worker-qcptz   1/1     Running   0          4m14s
test-cluster-1-head-vjpr7            0/1     Pending   0          52s
test-cluster-1-worker-worker-7fdmk   0/1     Pending   0          52s
test-cluster-1-worker-worker-kjsdc   0/1     Pending   0          52s
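test-cluster-1 stays Pending because Volcano schedules its PodGroup as a gang: all 3 Pods are admitted together or none are. A simplified model of that admission check, assuming only the queue capability matters (it ignores node-level fit and Volcano's other plugins); `gang_admission` is a hypothetical helper.

```shell
# gang_admission REQ_CPU REQ_MEM_GI FREE_CPU FREE_MEM_GI
# Hypothetical, simplified all-or-nothing check: the whole gang is admitted
# only if its total request fits in the queue's remaining capability.
gang_admission() {
  if [ "$1" -le "$3" ] && [ "$2" -le "$4" ]; then
    echo "schedulable"
  else
    echo "unschedulable"
  fi
}

# test-cluster-1 needs 3 CPU + 4Gi, but only 1 CPU + 2Gi (4 - 3, 6 - 4) is free.
gang_admission 3 4 1 2   # prints: unschedulable
# Once test-cluster-0 is deleted, the full 4 CPU + 6Gi is free again.
gang_admission 3 4 4 6   # prints: schedulable
```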
  1. Delete the first RayCluster to make space in the queue:
$ kubectl delete raycluster test-cluster-0
raycluster.ray.io "test-cluster-0" deleted

$ kubectl get podgroup ray-test-cluster-1-pg -o yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  creationTimestamp: "2025-02-07T15:16:35Z"
  generation: 5
  name: ray-test-cluster-1-pg
  namespace: default
  ownerReferences:
  - apiVersion: ray.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: RayCluster
    name: test-cluster-1
    uid: e5008d52-eb6c-4555-9512-e81acb97ba69
  resourceVersion: "2293"
  uid: c458466a-80e1-4f8b-9ec6-65fe350bc4dc
spec:
  minMember: 3
  minResources:
    cpu: "3"
    memory: 4Gi
  queue: kuberay-test-queue
status:
  conditions:
  - lastTransitionTime: "2025-02-07T15:16:35Z"
    message: '3/3 tasks in gang unschedulable: pod group is not ready, 3 Pending,
      3 minAvailable; Pending: 3 Unschedulable'
    reason: NotEnoughResources
    status: "True"
    transitionID: f282a2e0-f113-4346-89cf-6944c0eb12bc
    type: Unschedulable
  - lastTransitionTime: "2025-02-07T15:17:59Z"
    reason: tasks in gang are ready to be scheduled
    status: "True"
    transitionID: d35adc53-e669-4ec1-8566-02d56ed79692
    type: Scheduled
  phase: Running
  running: 3

$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
kuberay-operator-685d587695-9xq5l    1/1     Running   0          5m58s
test-cluster-1-head-vjpr7            1/1     Running   0          102s
test-cluster-1-worker-worker-7fdmk   1/1     Running   0          102s
test-cluster-1-worker-worker-kjsdc   1/1     Running   0          102s
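To compare `kubectl get pods` listings like the ones above at a glance, the table can be summarized per RayCluster and status. A sketch with `awk`, assuming the default `kubectl get pods` columns and Pod names of the form `<cluster>-head-<suffix>` or `<cluster>-worker-<group>-<suffix>`, as in this PR; `summarize_pods` is a hypothetical helper.

```shell
# summarize_pods: hypothetical helper that reads `kubectl get pods` table
# output on stdin and prints sorted "<cluster> <STATUS> <count>" lines.
# The header row is skipped; Pods whose names contain neither "-head-" nor
# "-worker-" (e.g. the operator) are ignored.
summarize_pods() {
  awk 'NR > 1 {
         if (match($1, /-(head|worker)-/)) {
           cluster = substr($1, 1, RSTART - 1)
           count[cluster " " $3]++
         }
       }
       END { for (k in count) print k, count[k] }' | sort
}
```

Usage: `kubectl get pods | summarize_pods`. For the last listing above this would print a single line, `test-cluster-1 Running 3`.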

Related issue number

Closes #2967

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: win5923 <ken89@kimo.com>
@win5923 win5923 changed the title [release] Update YuniKorn YAML files to Ray 2.41 [release] Update Vocalno YAML files to Ray 2.41 Feb 7, 2025
@win5923 win5923 changed the title [release] Update Vocalno YAML files to Ray 2.41 [release] Update Volcano YAML files to Ray 2.41 Feb 7, 2025
@kevin85421 kevin85421 merged commit 391099a into ray-project:master Feb 7, 2025
20 checks passed
@win5923 win5923 deleted the volcano-2.41 branch February 8, 2025 00:30
andrewsykim added a commit that referenced this pull request Feb 10, 2025
* [RayService] More envtests that follow the most common scenario in the RayService code path (#2880)

Signed-off-by: Rueian <rueiancsie@gmail.com>

* [RayService] Remove outdated env tests (#2886)

Signed-off-by: kaihsun <kaihsun@anyscale.com>

* [RayService] Refactor envtests (#2888)

* [docs][ray-operator] fix typo in Golang version (#2893)

Project uses Golang 1.22 not 1.20.

Signed-off-by: David Xia <david@davidxia.com>

* [Fix][kubectl-plugin] Fix no context nil error SIGSEGV in tests (#2892)

Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* [CI] Enable testifylint rule (#2896)

Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* [CI] Enable `testifylint` `error-nil` rule (#2907)

* [kubectl-plugin][feat] support specifying number of head GPUs (#2895)

when creating a RayCluster with `kubectl ray create cluster NAME --head-gpu N`.
Similar to the `--worker-gpu` switch.

Signed-off-by: David Xia <david@davidxia.com>

* Use webhook.CustomValidator instead of deprecated webhook.Validator. (#2803)

* Use webhook.CustomValidator instead of deprecated webhook.Validator.

* Move RayClusterWebhook to pkg/webhook/v1.

* [kubectl-plugin] update context error messages (#2891)

to tell user they can use `--context` to set the K8s context.

Add tests

Signed-off-by: David Xia <david@davidxia.com>
Co-authored-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* [Fix][kubectl-plugin] make tests use a temporary kube config (#2894)

file that's not at the default path. Right now tests fail if there's a K8s
current context set with a command like `kubectl config use-context
my-context`.

This change allows tests to pass regardless of current context.

Signed-off-by: David Xia <david@davidxia.com>

* [CI] Enable testifylint formatter rule (#2915)

Signed-off-by: 400Ping <43886578+400Ping@users.noreply.github.com>

* [CI] Enable `testifylint` `empty` rule (#2908)

* [CI] Enable testifylint bool-compare rule (#2911)

Signed-off-by: 400Ping <43886578+400Ping@users.noreply.github.com>

* [CI] Auto download golang tools in pre-commit (#2917)

Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* [RayService] a safeguard for preventing overriding the pending cluster during a upgrade (#2887)

Signed-off-by: Rueian <rueiancsie@gmail.com>

* [RayService] Refactor unit tests for ShouldPrepareNewCluster (#2928)

Signed-off-by: kaihsun <kaihsun@anyscale.com>

* [chore][kubectl-plugin] use consistent capitalization (#2922)

in comments and console messages.

* capitalize "Ray" when used as a proper noun
* capitalize Ray K8s CRDs like "RayJob" and "RayCluster"
* capitalize acronyms like "YAML"
* fix some minor typos

No functional changes.

Signed-off-by: David Xia <david@davidxia.com>

* [CI] Enable `testifylint` `require-error` rule (#2909)

* [kubectl-plugin] support general `kubectl` switches like `--context` (#2883)

* [CI] Fix lint error (require-error) (#2931)

Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* [CI] Enable `testifylint` `float-compare` rule (#2910)

* [docs] move pre-commit instructions to main dev docs (#2921)

* [CI] Enable `testifylint` `expected-actual` rule (#2914)

* [CI] Generate CRD json schema separately in pre-commit (#2930)

* [release][1/N] Update YAMLs from Ray 2.9 to Ray 2.41 (#2934)

* Delete `[raycluster|rayjob|rayservice]_types_test.go` unnecessary tests (#2935)

* [release][2/N] Update RayCluster Helm chart from Ray 2.9 to Ray 2.41 (#2936)

Signed-off-by: kaihsun <kaihsun@anyscale.com>

* [release][3/N] Update RayService e2e tests YAML files from Ray 2.9 to Ray 2.41 (#2937)

* [release][4/N] Update Ray images / versions in kubectl plugin (#2938)

Signed-off-by: kaihsun <kaihsun@anyscale.com>

* [release][5/N] Update some RayJob YAMLs from Ray 2.9 to Ray 2.41 (#2941)

Signed-off-by: kaihsun <kaihsun@anyscale.com>

* [release][6/N] Remove unnecessary YAMLs (#2946)

Signed-off-by: kaihsun <kaihsun@anyscale.com>

* [docs][kubectl-plugin] add dev docs (#2912)

Signed-off-by: David Xia <david@davidxia.com>
Co-authored-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* [CI] Add shellcheck and fix error of it (#2933)

* [chore][kubectl-plugin] use better test assertions (#2955)

Fix two places.

* use `assert.EqualError()` instead of `assert.Error()` to check the error
  string is what we expect
* use testify assertion instead of string checking for better error output

Signed-off-by: David Xia <david@davidxia.com>

* [chore] add Markdown linting pre-commit hook (#2953)

Ignore most rules we violate for now.
Fix these two violations.

```
pre-commit run markdownlint --all-files --show-diff-on-failure

...
clients/python-client/README.md:106 MD001/heading-increment Heading levels should only increment by one level at a time [Expected: h3; Actual: h4]
CHANGELOG.md:420:1 MD030/list-marker-space Spaces after list markers [Expected: 1; Actual: 2]
```

Signed-off-by: David Xia <david@davidxia.com>

* [chore][kubectl-plugin] use consistent capitalization (#2950)

in comments and console messages.

capitalize "Ray" when used as a proper noun. Similar to #2922.

Signed-off-by: David Xia <david@davidxia.com>

* [Fix][RayJob] Invalid quote for RayJob submitter (#2949)

Closes: #2943

Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* [release][7/N] Update RayService YAMLs (#2956)

* [CI] Enable testifylint len rule (#2945)

Signed-off-by: LeoLiao123 <leoyeepaa@gmail.com>

* [docs][kubectl-plugin] improve help messages (#2952)

Signed-off-by: David Xia <david@davidxia.com>

* [kubectl-plugin] Fix panic when GPU resource is not set (#2954)

Signed-off-by: win5923 <ken89@kimo.com>

* [release][8/N] Upgrade Stable Diffusion RayService to Ray 2.41 (#2960)

* [docs][kubectl-plugin] fix incorrect example commands (#2951)

Signed-off-by: David Xia <david@davidxia.com>
Co-authored-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* [fix][kubectl-plugin] set worker group CPU limit (#2958)

when creating a new worker group with `kubectl ray create workergroup`.
Write a unit test.

I noticed we are setting resource limits equal to resource requests everywhere
else but in this command. I have a K8s [LimitRange] that prevented the creation
of these worker Pods that had CPU limit defaulting to less than their CPU
requests. Describing the RayCluster showed this warning event.

`Failed to create worker Pod hyperkube/, Pod
"dxia-test-other-group-worker-pm2sh" is invalid:
spec.containers[0].resources.requests: Invalid value: "2": must be less than or
equal to cpu limit of 250m`

Signed-off-by: David Xia <david@davidxia.com>

[LimitRange]: https://kubernetes.io/docs/concepts/policy/limit-range/

* [RayJob] Deflaky RayJob e2e tests (#2963)

Signed-off-by: kaihsun <kaihsun@anyscale.com>

* [RayService] Deflaky RayService envtest (#2962)

Signed-off-by: kaihsun <kaihsun@anyscale.com>

* [release][9/N] Update text summarizer RayService to Ray 2.41 (#2961)

* [RayService] adapter vllm 0.6.1.post2 (#2823)

* adapter vllm 0.6.1.post2

* fix var define

* Unify the cpu Settings in serve.py in service.yaml, all set to 1

* Maintain the configuration with vllm0.5x

* [Release] Upgrade ray-job.batch-inference.yaml image to 2.41 (#2971)

Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

* [chore][docs] enable Markdownlint rule MD010 (#2975)

* [CI] Change Pre-commit-shellcheck-to-shellcheck-py (#2974)

Signed-off-by: owenowenisme <mses010108@gmail.com>

* [release] Update Yunikorn YAML file to Ray 2.41 (#2969)

Signed-off-by: Cheng-Yeh Chung <kenchung285@gmail.com>

* [release] Update YuniKorn YAML files to Ray 2.41 (#2976)

Signed-off-by: win5923 <ken89@kimo.com>

* [chore][docs] enable Markdownlint rule MD004 (#2973)

[Unordered list style][1]

[1]: https://github.com/DavidAnson/markdownlint/blob/main/doc/Rules.md#md004---unordered-list-style

* [Test] Use GcsFaultToleranceOptions in test and backward compatibility (#2972)

* Update samples to use Ray 2.41.0 images (#2964)

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

* Update TPU Ray CR manifests to use Ray 2.41.0 (#2965)

* [Refactor] Use constants for image tag, image repo, and versions in golang to avoid hard-coded strings (#2978)

---------

Signed-off-by: Rueian <rueiancsie@gmail.com>
Signed-off-by: kaihsun <kaihsun@anyscale.com>
Signed-off-by: David Xia <david@davidxia.com>
Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Signed-off-by: 400Ping <43886578+400Ping@users.noreply.github.com>
Signed-off-by: LeoLiao123 <leoyeepaa@gmail.com>
Signed-off-by: win5923 <ken89@kimo.com>
Signed-off-by: owenowenisme <mses010108@gmail.com>
Signed-off-by: Cheng-Yeh Chung <kenchung285@gmail.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Co-authored-by: Rueian <rueiancsie@gmail.com>
Co-authored-by: Kai-Hsun Chen <kaihsun@anyscale.com>
Co-authored-by: David Xia <david@davidxia.com>
Co-authored-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
Co-authored-by: David Xia <dxia@spotify.com>
Co-authored-by: Mykhailo Bobrovskyi <mikhail.bobrovsky@gmail.com>
Co-authored-by: Ping <43886578+400Ping@users.noreply.github.com>
Co-authored-by: Owen Lin <106612301+owenowenisme@users.noreply.github.com>
Co-authored-by: Leo Liao <93932709+LeoLiao123@users.noreply.github.com>
Co-authored-by: Blocka <ken89@kimo.com>
Co-authored-by: zrant <37032227+pxp531@users.noreply.github.com>
Co-authored-by: kenchung285 <kenchung285@gmail.com>
Co-authored-by: fscnick <6858627+fscnick@users.noreply.github.com>
Co-authored-by: ryanaoleary <113500783+ryanaoleary@users.noreply.github.com>
Ygnas pushed a commit to Ygnas/kuberay that referenced this pull request Mar 20, 2025