Change default model to Qwen3-0.6B #20335
Conversation
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a smaller subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: add 🚀 …
Summary of Changes
Hello @tlrmchlsmth, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request updates the default large language model configured for the `vllm serve` command. The change aims to provide a more modern and efficient default model for users, improving the out-of-the-box experience by switching from an older, smaller model to a more capable yet still compact alternative.
Highlights
- Default Model Configuration: The default model used by `vllm serve` has been updated from `facebook/opt-125m` to `Qwen/Qwen3-0.6B` within the `ModelConfig` class. This change aims to provide a more modern and efficient default model for users; a sketch of the change follows below.
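For reference, the change itself is essentially a one-line default swap. The following is a minimal sketch, not the actual vLLM code: the real `ModelConfig` in `vllm/config.py` has many more fields, and the names here are illustrative.

```python
# Minimal sketch of the default-model change; the field set here is
# illustrative, not the real vLLM ModelConfig.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # Before this PR the default was "facebook/opt-125m".
    model: str = "Qwen/Qwen3-0.6B"
    tokenizer: Optional[str] = None  # assumed to fall back to `model` when unset

    def __post_init__(self) -> None:
        if self.tokenizer is None:
            self.tokenizer = self.model


if __name__ == "__main__":
    # ModelConfig(model='Qwen/Qwen3-0.6B', tokenizer='Qwen/Qwen3-0.6B')
    print(ModelConfig())
```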
Code Review
This pull request changes the default model for `vllm serve` from `facebook/opt-125m` to `Qwen/Qwen3-0.6B`. While using a more modern small model is a good goal, the chosen model introduces a couple of issues for a default setting. Most critically, it appears to require `trust_remote_code=True`, which would break the default `vllm serve` command and require users to enable a security-sensitive flag. Additionally, the new model is significantly larger, which could negatively impact the first-time user experience. I've provided detailed feedback on these points.
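One quick way to check the `trust_remote_code` concern is to try loading each model's config with remote code disabled: if the architecture ships natively in the installed transformers release, the load succeeds without the flag. This is a hedged sketch (it needs network access to the Hugging Face Hub, and the outcome depends on your transformers version):

```python
# Probe whether a model's config loads without trust_remote_code.
# A failure here suggests the architecture is not natively supported by
# the installed transformers release and would need trust_remote_code=True.
from transformers import AutoConfig

for name in ("facebook/opt-125m", "Qwen/Qwen3-0.6B"):
    try:
        cfg = AutoConfig.from_pretrained(name, trust_remote_code=False)
        print(f"{name}: loads natively (model_type={cfg.model_type})")
    except Exception as exc:  # broad catch is fine for a quick probe
        print(f"{name}: failed without trust_remote_code: {exc}")
```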
+1, better to have a usable model by default
TBH I think there should be no model at all, but we've already signed up for this behavior
+1
Yeah, this model could be better
I agree no model at all would be better. I left this alone during the great config refactor because I was already making lots of changes and didn't want too many design decisions that would slow down the maintainability-focused refactor.
Actually, I think we can't do that yet because our transformers version requirement is still at 4.51.1.
The transformers version has now been updated on main, so we should be able to continue with this PR.
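A quick way to confirm this in a given environment is to check whether the installed transformers release already registers the `qwen3` architecture. A hedged sketch, assuming transformers' exported `CONFIG_MAPPING` is available (it is in recent releases):

```python
# Check whether the installed transformers release knows about Qwen3.
# If "qwen3" is not registered, the new default model would need
# trust_remote_code=True, which is what the version-bump discussion is about.
import transformers
from transformers import CONFIG_MAPPING

print("transformers version:", transformers.__version__)
print("qwen3 registered natively:", "qwen3" in CONFIG_MAPPING)
```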
Need to merge from main.
+1
My bigger concern is CI: will this change cause more OOMs there? Maybe I'm worrying too much.
This is a good point -- perhaps we should consider deprecating this feature, or at least adding a …
I think it makes more sense to deprecate the feature, because ideally no users should be relying on this behavior to run their systems. And even if they were, this PR, once merged, is likely to break what they were doing. To be pragmatic, I don't see any reason to introduce a new flag for a feature that should not be relied on in the first place.
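On the CI-memory concern: any test that spins up the new default could bound its footprint explicitly instead of relying on defaults. A hedged sketch using vLLM's offline `LLM` entry point; the parameter values are illustrative guesses, not what vLLM's CI actually uses:

```python
# Illustrative way a test could cap memory when loading the new, larger
# default model; the numbers below are guesses, not vLLM CI settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-0.6B",
    gpu_memory_utilization=0.30,  # claim only ~30% of GPU memory
    max_model_len=2048,           # shrink the KV cache vs. the full context window
    enforce_eager=True,           # skip CUDA graph capture to save memory
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```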
Honestly, until 15 minutes ago I did not know that you could just run `vllm serve` and it would default to serving opt-125m. This changes the default to Qwen3-0.6B since it's good and small.
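For anyone trying the new default end to end: start the server with a bare `vllm serve` and query the OpenAI-compatible endpoint. A hedged sketch assuming the server is already running locally on the default port 8000:

```python
# Query a bare `vllm serve` instance, which after this PR defaults to
# Qwen/Qwen3-0.6B. Assumes the server is running on localhost:8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

served = client.models.list().data[0].id
print("served model:", served)  # expected: Qwen/Qwen3-0.6B

resp = client.chat.completions.create(
    model=served,
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```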