Search before asking
- I had searched in the issues and found no similar feature requirement.
Description
Currently, there are some autoscaling end-to-end tests for KubeRay in the Ray repository. However, there are some issues with the tests:
- It uses Python.
- To run the tests, you need to build Ray from source to get the test utilities.
- There are some dependencies between the Ray version of the Pods in the K8s cluster and the Ray version in your local environment, because the tests use `kubectl port-forward` and interact with the Ray head via Ray client and `ray job submit`.
- It only tests Ray nightly with the latest stable KubeRay release. It doesn't test the compatibility between Ray nightly and KubeRay nightly.
Here, we plan to build new autoscaling end-to-end tests in the KubeRay repository.
- Use Golang instead of Python so that we can leverage the K8s ecosystem in Golang to add new tests easily.
- Test KubeRay nightly with Ray nightly.
- It doesn't require building Ray.
- It shouldn't have any dependencies between the Ray version in the Ray Pods and the local Ray version.
- Then, we will replace the existing Ray autoscaling e2e tests with this one.
Progress
- [core][autoscaler] Remove local dependencies on Ray for KubeRay autoscaling e2e tests ray#48566
- [Test][Autoscaler][1/n] Add Ray Autoscaler e2e tests #2168
- Fake GPU tests (@rueian)
  - We don't set GPUs in the Pod's resource request/limit, but we set `--num-gpus` in `rayStartParams` to simulate the autoscaling behavior of GPUs.
  - [Test][Autoscaler][2/n] Add Ray Autoscaler e2e tests for GPU workers #2181
- Make the Ray image configurable.
- Fake single-host TPU tests
- Fake multi-host TPU tests
- Self-defined resource tests (example) (@MortalHappiness): [Test][Autoscaling] Add custom resource test #2193
- [Feature] Add e2e Ray v2 Autoscaler Tests with KubeRay idleTimeoutSeconds per worker group #2561
- [Feature] Add E2E Test for Autoscaler Nested Remote Functions #2568
- [Feature] Add e2e tests for Autoscaler V2 #2574
- [Feature] Add e2e tests for inconsistency between worker group's `replicas` and the number of Pods #2575
- [Feature] Add an e2e test for Autoscaler to scale up by manually updating `minReplicas` #2576
- Replace the old Ray autoscaling e2e tests in the Ray repository.
- [Umbrella] Add Autoscaler e2e tests for partial placement groups #3227
- [Umbrella] Autoscaler IPPR E2E test #4028
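The fake GPU approach in the progress list relies on Ray trusting the `--num-gpus` value from `rayStartParams` while Kubernetes never sees a GPU request, so the worker Pods can be scheduled on CPU-only nodes. The sketch below illustrates that split. The `WorkerGroup` struct is a simplified, hypothetical stand-in for a KubeRay worker group spec (only the relevant fields are modeled), and `fakeGPUWorkerGroup` and `advertisedGPUs` are illustrative helpers, not KubeRay APIs. It assumes, as KubeRay does, that `rayStartParams` entries are passed to `ray start` as flags.

```go
package main

import (
	"fmt"
	"strconv"
)

// WorkerGroup is a simplified, hypothetical stand-in for a KubeRay worker
// group spec; only the fields relevant to the fake-GPU idea are modeled.
type WorkerGroup struct {
	GroupName      string
	MinReplicas    int32
	MaxReplicas    int32
	RayStartParams map[string]string // assumed to become `ray start --key=value` flags
	ResourceLimits map[string]string // container resource limits seen by Kubernetes
}

// fakeGPUWorkerGroup advertises gpus logical GPUs to Ray per Pod without
// requesting any GPU from Kubernetes.
func fakeGPUWorkerGroup(name string, gpus int) WorkerGroup {
	return WorkerGroup{
		GroupName:   name,
		MinReplicas: 0,
		MaxReplicas: 3,
		RayStartParams: map[string]string{
			"num-gpus": strconv.Itoa(gpus),
		},
		// No "nvidia.com/gpu" entry, so the Pod can run on CPU-only nodes.
		ResourceLimits: map[string]string{
			"cpu":    "1",
			"memory": "1Gi",
		},
	}
}

// advertisedGPUs returns the GPU count one worker Pod would report to Ray,
// or 0 if num-gpus is not set.
func advertisedGPUs(g WorkerGroup) (int, error) {
	v, ok := g.RayStartParams["num-gpus"]
	if !ok {
		return 0, nil
	}
	return strconv.Atoi(v)
}

func main() {
	g := fakeGPUWorkerGroup("gpu-group", 1)
	n, err := advertisedGPUs(g)
	_, requestsRealGPU := g.ResourceLimits["nvidia.com/gpu"]
	fmt.Println(n, err, requestsRealGPU)
}
```

With such a group at `MinReplicas: 0`, the intended test flow is that a Ray task or actor requesting a GPU stays pending until the autoscaler scales the group up, which exercises the GPU scaling path without real accelerators.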
Use case
No response
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!