🚀 The feature, motivation and pitch
Currently, distributed inference (TP) in vLLM relies on Ray to orchestrate the GPU workers. I briefly checked the code, and it seems the core distributed communication is provided by torch.distributed
with the NCCL backend; the workers' communication does not go through Ray's own protocol. In this case, Ray just plays the role of orchestration and resource reservation (placement groups). Please correct me if I am wrong.
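To illustrate the point, here is a minimal sketch (not vLLM's actual worker code) of how TP workers join a torch.distributed process group over NCCL. Nothing in it depends on Ray; the launcher only needs to set the standard rendezvous environment variables:

```python
import os

import torch
import torch.distributed as dist


def init_worker() -> None:
    # MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE are expected to be
    # set by whatever launches the workers (Ray today, or anything else).
    dist.init_process_group(backend="nccl", init_method="env://")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # All-reduce is the core collective used by tensor parallelism.
    x = torch.ones(1, device="cuda")
    dist.all_reduce(x)  # x now holds WORLD_SIZE on every rank
    print(f"rank {dist.get_rank()}/{dist.get_world_size()}: {x.item()}")


if __name__ == "__main__":
    init_worker()
```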
We use Ray and KubeRay on Kubernetes, and I've successfully tested vLLM distributed inference on this setup, confirming that it works. However, we serve many users/platforms, and we do not want to lock into Ray, since some teams may not have enough Ray knowledge to operate it. My proposal is to provide a simple orchestration layer on top of GPUExecutor
for users who are familiar with cloud-native tech and would rather use Kubernetes for orchestration (in place of Ray actors) and scheduling (in place of placement groups); a rough sketch is below.
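As a hypothetical sketch of what the Kubernetes side could look like (all resource names and env vars here are illustrative, not an existing vLLM API): if the workers run as a StatefulSet behind a headless Service, the pod ordinal gives each worker a stable rank and pod-0 a stable rendezvous address, so the env-var-based initialization shown above works unchanged.

```python
import os


def configure_rendezvous_from_k8s(tp_size: int) -> None:
    # Illustrative only: assumes a StatefulSet named "vllm" behind a
    # headless Service "vllm". StatefulSet pods are named
    # <statefulset>-<ordinal>; the ordinal maps directly to the rank.
    hostname = os.environ["HOSTNAME"]  # e.g. "vllm-1"
    ordinal = int(hostname.rsplit("-", 1)[-1])

    os.environ.setdefault("RANK", str(ordinal))
    os.environ.setdefault("WORLD_SIZE", str(tp_size))
    # Rank 0's stable DNS name via the headless Service.
    os.environ.setdefault("MASTER_ADDR", "vllm-0.vllm")
    os.environ.setdefault("MASTER_PORT", "29500")
```

The GPU placement that placement groups provide today would instead come from the pod spec (GPU resource requests, node selectors/affinity), which is exactly the machinery cloud-native teams already operate.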
Ideally, we would support both Ray and Kubernetes as orchestrators for vLLM, giving our platform users alternative options for their needs.
Please help check whether this proposal makes sense. I can contribute this feature.
Alternatives
No response
Additional context
No response