Added Ray-Serve Config For LLMs #3517


Merged: 7 commits into ray-project:master, Jun 9, 2025
Conversation

@Blaze-DSP (Contributor):

Added Example Config For Ray-Serve LLM
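(For orientation, a minimal sketch of the general shape of such a config, assuming the ray.serve.llm build_openai_app entrypoint and the Qwen model discussed later in this thread; not the PR's exact file:)

    serveConfigV2: |
      applications:
        - name: llms
          route_prefix: /
          import_path: ray.serve.llm:build_openai_app
          args:
            llm_configs:
              - model_loading_config:
                  model_id: Qwen/Qwen2.5-7B-Instruct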


@kouroshHakha left a comment:


The config looks good to me (though I haven't run the config myself).

@kouroshHakha requested a review from kevin85421, May 1, 2025 05:39
@Blaze-DSP (Contributor, Author):

Should I also add a config for autoscaling?
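(For reference, a Serve LLM autoscaling sketch; the knobs live under deployment_config, and the values here are illustrative, not from this PR:)

    deployment_config:
      autoscaling_config:
        min_replicas: 1              # keep one replica warm
        max_replicas: 4              # upper bound under load
        target_ongoing_requests: 16  # illustrative per-replica target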

@kevin85421 (Member):

Chatted with @Blaze-DSP offline.

@kouroshHakha commented May 6, 2025:

What is the plan? @kevin85421

@kevin85421 (Member):

> What is the plan? @kevin85421

Add a doc in the Ray repo and make this example simpler (e.g., remove LoRA).
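(For context, the LoRA piece being removed is the lora_config block of each LLM config; a rough sketch with a hypothetical S3 path:)

    lora_config:
      dynamic_lora_loading_path: s3://my-bucket/loras   # hypothetical path
      max_num_adapters_per_replica: 16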

@Blaze-DSP (Contributor, Author):

I have updated the Ray Serve LLM config and added a doc for it in the Ray repo.

PR for the doc: ray-serve llm doc

@eicherseiji (Contributor):

Going to give this a shot on my setup in the next week-ish.

@eicherseiji self-requested a review, June 7, 2025 01:24
@eicherseiji (Contributor) left a comment:

Worked great on my setup. Thanks for the PR @Blaze-DSP!

@kevin85421 (Member):

@eicherseiji could you push to this branch directly to fix CI issues so that I can merge this PR? Thanks!


DPatel_7 and others added 7 commits June 8, 2025 08:44
Signed-off-by: DPatel_7 <dpatel@gocommotion.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@kevin85421 merged commit 280902f into ray-project:master, Jun 9, 2025
25 checks passed
MortalHappiness pushed a commit to MortalHappiness/kuberay that referenced this pull request Jun 9, 2025
Co-authored-by: DPatel_7 <dpatel@gocommotion.com>
Co-authored-by: Seiji Eicher <seiji@anyscale.com>
Inline review comment on the worker group's resource limits in the example config:

    limits:
      cpu: 32
      memory: 32Gi
      nvidia.com/gpu: "4"
Member:
I know it will depend on the GPU type, but does Qwen/Qwen2.5-7B-Instruct really need 4 GPUs? What GPUs did you test with?

Member:

Ah, I noticed that tensor parallelism is not set, so it must be using only 1 GPU. I suggest updating this example to request only 1 GPU for each worker.
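(To make the suggestion concrete: the engine's tensor_parallel_size and the worker's GPU request should agree. A sketch with illustrative values, assuming the ray.serve.llm schema:)

    # Serve LLM side: tensor_parallel_size defaults to 1 GPU per replica.
    engine_kwargs:
      tensor_parallel_size: 1   # set >1 only when sharding across GPUs

    # Worker group side: request a matching GPU count.
    limits:
      cpu: 32
      memory: 32Gi
      nvidia.com/gpu: "1"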

pawelpaszki pushed a commit to opendatahub-io/kuberay that referenced this pull request Jun 10, 2025
Co-authored-by: DPatel_7 <dpatel@gocommotion.com>
Co-authored-by: Seiji Eicher <seiji@anyscale.com>
kevin85421 pushed a commit to kevin85421/kuberay that referenced this pull request Jun 12, 2025
Co-authored-by: DPatel_7 <dpatel@gocommotion.com>
Co-authored-by: Seiji Eicher <seiji@anyscale.com>
MortalHappiness pushed a commit that referenced this pull request Jun 13, 2025
Co-authored-by: Blaze-DSP <111348803+Blaze-DSP@users.noreply.github.com>
Co-authored-by: DPatel_7 <dpatel@gocommotion.com>
Co-authored-by: Seiji Eicher <seiji@anyscale.com>