# Creating Autoscaling clusters using APIServer

One of Ray's key features is autoscaling. This [document] explains how to set up autoscaling
with the Ray operator. Here, we demonstrate how to configure it using the APIServer and
run an example.

## Setup

Refer to the [README](README.md) for setting up the KubeRay operator and APIServer. Follow the *Install with Helm* instructions there; starting a local APIServer is intended for development only.
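
If you have not installed them yet, a Helm-based installation looks roughly like the following (a sketch; the chart names come from the KubeRay Helm repository, and the `kuberay-apiserver` release name matches the clean-up step at the end of this guide):

```sh
# Add the KubeRay Helm repository
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update

# Install the KubeRay operator and the APIServer
helm install kuberay-operator kuberay/kuberay-operator
helm install kuberay-apiserver kuberay/kuberay-apiserver
```

The examples below send requests to `localhost:31888`, so make sure the APIServer is reachable at that address, for example through the NodePort its service exposes or with `kubectl port-forward`.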

## Example

This example walks through how to trigger scale-up and scale-down for RayCluster.

Before proceeding with the example, delete any existing RayClusters so that the steps below run cleanly:

```sh
kubectl delete raycluster --all
```

> [!IMPORTANT]
> All of the following commands assume that your working directory is the KubeRay `apiserver` directory.

### Install ConfigMap

Install this [ConfigMap], which contains the code for our example, by running:

```sh
kubectl apply -f test/cluster/cluster/detachedactor.yaml
```

Check if the ConfigMap is successfully created. You should see `ray-example` in the list:

```sh
kubectl get configmaps
# NAME DATA AGE
# ray-example 2 8s
```
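
Optionally, inspect the scripts stored in the ConfigMap; you should see `detached_actor.py` and `terminate_detached_actor.py` in its data:

```sh
kubectl describe configmap ray-example
```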

### Deploy RayCluster

Before running the example, deploy a RayCluster with the following command:

```sh
# Create compute template
curl -X POST 'localhost:31888/apis/v1/namespaces/default/compute_templates' \
--header 'Content-Type: application/json' \
--data @docs/api-example/compute_template.json

# Create RayCluster
curl -X POST 'localhost:31888/apis/v1/namespaces/default/clusters' \
--header 'Content-Type: application/json' \
--data @docs/api-example/autoscaling_clusters.json
```

This step performs two main operations:

1. Creates a compute template `default-template` that specifies resources to use during
scale-up (2 CPUs and 4 GiB memory).

2. Deploys a RayCluster (`test-cluster`) with:
   - A head pod that manages the cluster
   - A worker group configured to scale between 0 and 5 replicas

The cluster sets `enableInTreeAutoscaling: true` and uses the following `autoscalerOptions` to control scaling behavior:

- **`upscalingMode: "Default"`**: Default scaling behavior; Ray scales up only as needed.
- **`idleTimeoutSeconds: 30`**: If a worker pod remains idle (i.e., not running any tasks or actors) for 30 seconds, it is automatically removed.
- **`cpu: "500m"`, `memory: "512Mi"`**: Resource requests for the autoscaler sidecar container that KubeRay adds to the head pod. Scale-up itself is driven by the resource requests of pending tasks and actors, not by these values.

> **Note:** These values **do not determine the actual size** of the worker pod. The
> pod size comes from the `computeTemplate` (in this case, 2 CPUs and 4 GiB memory).
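
For reference, the two JSON files passed with `--data @...` can also be supplied inline. The commands below are equivalent and match an earlier revision of this example; the exact contents of `docs/api-example/compute_template.json` and `docs/api-example/autoscaling_clusters.json` may differ slightly:

```sh
# Create compute template (inline payload)
curl -X POST 'localhost:31888/apis/v1/namespaces/default/compute_templates' \
--header 'Content-Type: application/json' \
--data '{
  "name": "default-template",
  "namespace": "default",
  "cpu": 2,
  "memory": 4
}'

# Create RayCluster (inline payload)
curl -X POST 'localhost:31888/apis/v1/namespaces/default/clusters' \
--header 'Content-Type: application/json' \
--data '{
  "name": "test-cluster",
  "namespace": "default",
  "user": "boris",
  "clusterSpec": {
    "enableInTreeAutoscaling": true,
    "autoscalerOptions": {
      "upscalingMode": "Default",
      "idleTimeoutSeconds": 30,
      "cpu": "500m",
      "memory": "512Mi"
    },
    "headGroupSpec": {
      "computeTemplate": "default-template",
      "image": "rayproject/ray:2.9.0-py310",
      "serviceType": "NodePort",
      "rayStartParams": {
        "dashboard-host": "0.0.0.0",
        "metrics-export-port": "8080",
        "num-cpus": "0"
      },
      "volumes": [
        {
          "name": "code-sample",
          "mountPath": "/home/ray/samples",
          "volumeType": "CONFIGMAP",
          "source": "ray-example",
          "items": {
            "detached_actor.py": "detached_actor.py",
            "terminate_detached_actor.py": "terminate_detached_actor.py"
          }
        }
      ]
    },
    "workerGroupSpec": [
      {
        "groupName": "small-wg",
        "computeTemplate": "default-template",
        "image": "rayproject/ray:2.9.0-py310",
        "replicas": 0,
        "minReplicas": 0,
        "maxReplicas": 5,
        "rayStartParams": {
          "node-ip-address": "$MY_POD_IP"
        },
        "volumes": [
          {
            "name": "code-sample",
            "mountPath": "/home/ray/samples",
            "volumeType": "CONFIGMAP",
            "source": "ray-example",
            "items": {
              "detached_actor.py": "detached_actor.py",
              "terminate_detached_actor.py": "terminate_detached_actor.py"
            }
          }
        ]
      }
    ]
  }
}'
```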

### Validate that RayCluster is deployed correctly

Run the following command to list the running pods. You should see output similar to the following:

```sh
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# kuberay-operator-545586d46c-f9grr 1/1 Running 0 49m
# test-cluster-head 2/2 Running 0 3m1s
```

Note that there is no worker pod for `test-cluster`, because its worker group was created with `replicas: 0`. You will only see the head pod, which runs 2 containers (the Ray head and the autoscaler sidecar).
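
You can also inspect the RayCluster resource itself. Depending on your KubeRay version, the printed columns include the desired and available worker counts, which should both be 0 at this point:

```sh
kubectl get raycluster test-cluster
```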

### Trigger RayCluster scale-up

Create a detached actor to trigger scale-up with the following command:

```sh
curl -X POST 'localhost:31888/apis/v1/namespaces/default/jobs' \
  # ... request header and JSON payload collapsed in this diff view; the payload creates a
  #     job named "create-actor" that runs detached_actor.py on test-cluster ...
}'
```

The `detached_actor.py` script is defined in the [ConfigMap] we installed earlier and is
mounted into the head pod; the actor it creates requires `num_cpus=1`. Because the head
node is started with `num-cpus: 0` and there are initially no worker pods, the RayCluster
has to scale up a worker to run this actor.

Check whether a worker has been created. You should see a `test-cluster-small-wg-worker` pod spin up:

```sh
kubectl get pods

# NAME READY STATUS RESTARTS AGE
# create-actor-tsvfc 0/1 Completed 0 99s
# kuberay-operator-545586d46c-f9grr 1/1 Running 0 55m
# test-cluster-head 2/2 Running 0 9m37s
# test-cluster-small-wg-worker-j54xf 1/1 Running 0 88s
```
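
Optionally, you can inspect the autoscaler's view of the cluster from inside the head pod. This is a sketch that assumes the default KubeRay container names (`ray-head` for the Ray container and `autoscaler` for the autoscaler sidecar):

```sh
# Show cluster resources and autoscaler status as Ray sees them
kubectl exec test-cluster-head -c ray-head -- ray status

# Tail the autoscaler sidecar logs to watch its scaling decisions
kubectl logs test-cluster-head -c autoscaler --tail=20
```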

### Trigger RayCluster scale-down

Run the following command to delete the actor we created earlier:

```sh
curl -X POST 'localhost:31888/apis/v1/namespaces/default/jobs' \
  # ... request header and JSON payload collapsed in this diff view; the payload creates a
  #     job named "delete-actor" that runs terminate_detached_actor.py on test-cluster ...
}'
```

Once the actor is deleted, the worker is no longer needed. The worker pod will be deleted
after `idleTimeoutSeconds` (default 60; we specified 30) seconds.
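
If you want to watch the scale-down as it happens, you can watch the pod list (press `Ctrl+C` to stop):

```sh
kubectl get pods --watch
```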

List all pods to verify that the worker pod is deleted:

```sh
kubectl get pods

# NAME READY STATUS RESTARTS AGE
# create-actor-tsvfc 0/1 Completed 0 6m37s
# delete-actor-89z8c 0/1 Completed 0 83s
# kuberay-operator-545586d46c-f9grr 1/1 Running 0 60m
# test-cluster-head 2/2 Running 0 14m
```

### Clean up

```sh
make clean-cluster
# Remove apiserver from helm
helm uninstall kuberay-apiserver
```
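
If you also installed the KubeRay operator with Helm and want to remove it:

```sh
helm uninstall kuberay-operator
```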

[document]: https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/configuring-autoscaling.html