[release] Update Ray image to 2.34.0 #2303
Conversation
Signed-off-by: kaihsun <kaihsun@anyscale.com>
@@ -85,7 +85,7 @@ spec:
       spec:
         containers:
         - name: ray-worker
-          image: rayproject/ray-ml:2.4.0
+          image: rayproject/ray-ml:2.9.0
Should we update this to use 2.34.0.fc8721 or 2.34.0.deprecated?
Signed-off-by: kaihsun <kaihsun@anyscale.com>
@@ -312,13 +312,13 @@ jobs:
         run: go test ./pkg/... -race -parallel 4
         working-directory: ${{env.working-directory}}

   test-compatibility-2_7_0:
@@ -38,7 +38,7 @@ applications:
       args:
         num_forwards: 0
       runtime_env:
-        working_dir: https://github.com/ray-project/serve_workloads/archive/a2e2405f3117f1b4134b6924b5f44c4ff0710c00.zip
+        working_dir: https://github.com/ray-project/serve_workloads/archive/a9f184f4d9ddb7f9a578502ae106470f87a702ef.zip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ray Serve introduced some breaking changes after Ray 2.9.0, and I opened ray-project/serve_workloads#4 to fix the issue.
       autoscaling_config:
         metrics_interval_s: 0.2
         min_replicas: 1
         max_replicas: 14
         look_back_period_s: 2
         downscale_delay_s: 5
         upscale_delay_s: 2
-        target_num_ongoing_requests_per_replica: 1
+        target_ongoing_requests: 1
       graceful_shutdown_timeout_s: 5
       max_concurrent_queries: 1000
@@ -285,6 +285,10 @@ def test_zero_downtime_rollout(self, set_up_cluster):
         for cr_event in cr_events:
             cr_event.trigger()

+    @pytest.mark.skip(
Ray Serve introduces some breaking changes (see ray-service.autoscaler.yaml and rayservice_ha_test.go), and the autoscaling test logic no longer works.
(I haven't read the code; this is just my observation and guess, based on the very limited information I can get from the proxy actor's logs.)
Based on that observation and guess, Ray Serve now queues pending requests instead of failing them. The pending requests in the queue delay the scheduling of the "signal" request. When I set max_ongoing_requests to a small number (e.g. 1), the "signal" request can be scheduled correctly and unlocks some "block" requests. However, some pending requests are then dispatched to the replicas immediately, so the replicas can't be scaled down.
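The guess above can be sketched as a toy model. Everything here is illustrative (the function, step counts, and numbers are made up, and this is not Ray's real scheduler); it only shows why a backlog of queued requests prevents scale-down:

```python
# Toy model: each step, every replica completes one request if any are
# pending; a replica is only "idle" (eligible for scale-down) once the
# queue has drained, so queued work keeps replicas busy.
def idle_steps(pending: int, replicas: int, steps: int) -> int:
    """Count the steps during which all work is done and replicas sit idle."""
    idle = 0
    for _ in range(steps):
        if pending == 0:
            idle += 1
        else:
            pending = max(0, pending - replicas)
    return idle

# A backlog of 15 queued requests keeps 1 replica busy for 15 of 20 steps:
assert idle_steps(15, replicas=1, steps=20) == 5
# With no backlog the replica is idle the whole time and can scale down:
assert idle_steps(0, replicas=1, steps=20) == 20
```

Under this model, any nonzero queue shrinks the idle window the autoscaler sees, which matches the observed failure to scale down.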
Signed-off-by: kaihsun <kaihsun@anyscale.com>
       ray_actor_options:
         num_cpus: 0.5
   - name: signal
     import_path: autoscaling.signaling:app
     route_prefix: /signal
     deployments:
     - name: SignalDeployment
       max_ongoing_requests: 1000
Each "block" request will also send a request to the "signal" replica, so max_ongoing_requests should be more than 20 (the number of block requests). The default value changed from 100 to 5 in Ray 2.32.0.
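The arithmetic behind that choice can be checked with a back-of-envelope sketch (the helper function is hypothetical, not a Ray API; the numbers come from this thread):

```python
# Simplified admission model: a deployment dispatches at most
# max_ongoing_requests per replica immediately; the rest wait in a queue.
def admitted(num_requests: int, max_ongoing_requests: int, replicas: int = 1) -> int:
    """Requests dispatched to replicas right away; the remainder queue."""
    return min(num_requests, max_ongoing_requests * replicas)

block_requests = 20  # each "block" request also hits the "signal" replica

assert admitted(block_requests, 5) == 5      # Ray 2.32.0 default: 15 queue up
assert admitted(block_requests, 100) == 20   # pre-2.32.0 default: all admitted
assert admitted(block_requests, 1000) == 20  # value used in this PR: all admitted
```

With the new default of 5, most of the 20 signal-bound requests would queue, which is why the config pins max_ongoing_requests explicitly.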
@@ -31,7 +31,7 @@ spec:
       serviceAccountName: pytorch-distributed-training
       containers:
       - name: ray-head
-        image: rayproject/ray:2.9.0
+        image: rayproject/ray:2.34.0
This example no longer works because the head node is on 2.34 while the worker node is still on 2.9 (using the ray-ml image). Should we revert this back to 2.9, or update the worker image to use the deprecated ray-ml images for now?
This reverts commit 678ec25.
* odh/dev:
  CARRY: Update upstream component_metadata location
  CARRY: Add upstream metadata to Kuberay manifests
  PATCH: CVE fix - Upgrade golang.org/x/net from 0.26.0 to 0.33.0
  CARRY: Updated GO version to 1.22 in odh release workflow
  CARRY: Updated KubeRay image to v1.2.2
  CARRY: Set FS group to MustRunAs for Ray SCC
  PATCH: Raise head pod memory limit to avoid test instability
  PATCH: Add SecurityContext to ray pods to function with restricted pod-security
  CARRY: Add delete patch to remove default namespace (ray-project#16)
  CARRY: Add workflow to release ODH/Kuberay with compiled test binaries
  PATCH: add aggregator role for admin and editor
  PATCH: CVE fix - Replace go-sqlite3 version to upgraded version
  PATCH: openshift kustomize overlay for odh operator
  [Telemetry][v1.2.2] Update KUBERAY_VERSION (ray-project#2417)
  [release v1.2.2] Update tags and versions (ray-project#2416)
  Revert "[release] Update Ray image to 2.34.0 (ray-project#2303)" (ray-project#2413) (ray-project#2415)
Why are these changes needed?

test-job.yaml, and add the test for Ray 2.34.0.
ray-ml:2.9.0 to ray-ml:2.34.0. We will revisit it later. The progress is tracked by "Replace references of rayproject/ray-ml image #2292".

Related issue number

Checks