Added Ray-Serve Config For LLMs #3517
Conversation
The config looks good to me (though I haven't run the config myself).
Should I also add config for autoscaling?
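For context, a minimal sketch of what such an autoscaling block could look like under a Serve LLM deployment_config (field names follow Ray Serve's autoscaling_config schema; the values are illustrative placeholders, not tuned recommendations):

```yaml
# Hypothetical autoscaling settings for the LLM deployment; values are
# placeholders and should be sized to the cluster's GPU capacity.
deployment_config:
  autoscaling_config:
    min_replicas: 1              # keep one replica warm at all times
    max_replicas: 4              # cap scale-out at the available GPU count
    target_ongoing_requests: 16  # scale up once per-replica load exceeds this
```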
Chatted with @Blaze-DSP offline
What is the plan? @kevin85421
Add a doc in the Ray repo and make this example simpler (e.g. remove LoRA).
I have updated the Ray Serve LLM config and added a doc for it in the Ray repo. Doc PR: ray-serve llm doc
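For readers following the thread, here is a minimal sketch of the shape a RayService serveConfigV2 for a Serve LLM app generally takes (the model name, route prefix, and replica counts are illustrative; the file in this PR and the linked doc are authoritative):

```yaml
serveConfigV2: |
  applications:
    - name: llms
      route_prefix: "/"
      import_path: ray.serve.llm:build_openai_app
      args:
        llm_configs:
          - model_loading_config:
              model_id: qwen2.5-7b-instruct           # name exposed to clients
              model_source: Qwen/Qwen2.5-7B-Instruct  # Hugging Face model to load
            engine_kwargs:
              tensor_parallel_size: 1                 # shard the model across 1 GPU
            deployment_config:
              autoscaling_config:
                min_replicas: 1
                max_replicas: 2
```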
Going to give this a shot on my setup in the next week-ish. |
Worked great on my setup. Thanks for the PR @Blaze-DSP!
@eicherseiji could you push to this branch directly to fix CI issues so that I can merge this PR? Thanks!
Signed-off-by: DPatel_7 <dpatel@gocommotion.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: DPatel_7 <dpatel@gocommotion.com> Co-authored-by: Seiji Eicher <seiji@anyscale.com>
```yaml
limits:
  cpu: 32
  memory: 32Gi
  nvidia.com/gpu: "4"
```
I know it will depend on the GPU type, but does Qwen/Qwen2.5-7B-Instruct
really need 4 GPUs? What GPUs did you test with?
Ah, I noticed that tensor parallelism is not set, so it must only be using 1 GPU. I suggest updating this example to request only 1 GPU for each worker.
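Concretely, the suggestion amounts to keeping the worker pod's GPU request in step with the engine's parallelism (a sketch; the surrounding pod spec is abbreviated):

```yaml
# With tensor_parallel_size left at its default of 1, each worker only
# ever uses a single GPU, so requesting 4 leaves 3 of them idle.
resources:
  limits:
    cpu: 32
    memory: 32Gi
    nvidia.com/gpu: "1"   # was "4"
```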
Added Example Config For Ray-Serve LLM