Gemma3n (Text-only) #20134
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
Summary of Changes
Hello @robertgshaw2-redhat, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces comprehensive support for the Gemma3n model, a new large language model. It involves integrating its distinct architectural components, including a novel sparse activation function and specialized layer structures, and registering it within the existing model framework.
Highlights
- New Model Integration: I've added the complete architecture for the Gemma3n model, enabling its use within the system. This includes its unique layer components and overall structure.
- Custom Activation Function: I've introduced `GeluAndMulSparse`, a specialized sparse GELU activation function that is specifically designed for and utilized by the Gemma3n model (see the sketch just after this list).
- Advanced Layer Structures: I've implemented unique components integral to the Gemma3n architecture, such as `Gemma3nAltUp` for 'Alternating Updates' and `Gemma3nLaurelBlock` for 'Learned Augmented Residual Layers' (a Laurel sketch also follows below).
- Model Registry Update: I've updated the model registry to include the new `Gemma3nForConditionalGeneration` model, making it discoverable and loadable by the system.
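For readers unfamiliar with the sparse activation, here is a minimal PyTorch sketch of the statistical top-k idea behind a `GeluAndMulSparse`-style op: the gate half of the fused gate/up projection is thresholded at a per-token Gaussian quantile before the GELU, zeroing roughly a target fraction of activations. The class name matches this PR, but the body is an illustrative reconstruction under stated assumptions, not the vLLM kernel; `activation_sparsity` and the exact cutoff rule are assumptions.

```python
import torch
import torch.nn as nn


class GeluAndMulSparse(nn.Module):
    """Sparse GELU-and-mul (sketch). Input is the fused [gate, up]
    projection of shape (..., 2 * d); output has shape (..., d)."""

    def __init__(self, activation_sparsity: float = 0.95):
        super().__init__()
        # Standard-normal quantile such that roughly `activation_sparsity`
        # of the gate entries fall below the per-token cutoff.
        normal = torch.distributions.Normal(0.0, 1.0)
        self.register_buffer(
            "std_multiplier",
            normal.icdf(torch.tensor(activation_sparsity)))
        self.gelu = nn.GELU(approximate="tanh")

    def _gaussian_topk(self, gate: torch.Tensor) -> torch.Tensor:
        # Cutoff from the empirical mean/std of each token's gate,
        # then ReLU so everything below the cutoff becomes exactly 0.
        mean = gate.mean(dim=-1, keepdim=True)
        std = gate.std(dim=-1, keepdim=True)
        cutoff = mean + std * self.std_multiplier
        return torch.relu(gate - cutoff)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1] // 2
        gate, up = x[..., :d], x[..., d:]
        return self.gelu(self._gaussian_topk(gate)) * up
```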
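Similarly, a Learned Augmented Residual (Laurel) block can be sketched as a learned low-rank correction added alongside the usual residual stream. The rank hyperparameter and the normalization on the correction are assumptions based on the published Gemma3n description, not a copy of this PR's `Gemma3nLaurelBlock`:

```python
import torch
import torch.nn as nn


class LaurelBlock(nn.Module):
    """Learned Augmented Residual Layer (sketch): adds a learned
    low-rank update to the hidden states as an extra residual path."""

    def __init__(self, hidden_size: int, laurel_rank: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, laurel_rank, bias=False)
        self.up = nn.Linear(laurel_rank, hidden_size, bias=False)
        self.norm = nn.RMSNorm(hidden_size)  # assumes torch >= 2.4

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low-rank path: hidden -> rank -> hidden, normalized,
        # then added back onto the input.
        return x + self.norm(self.up(self.down(x)))
```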
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
Code Review
The code changes introduce a new model, Gemma3n, to the vLLM framework. The changes include adding a new activation function, GeluAndMulSparse, and the model implementation itself. There are a few suggestions to improve the code, including correcting a docstring, using f-strings for better readability, and adding a tolerance value when comparing floating-point numbers.
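As an illustration of the tolerance suggestion (this is not code from the PR, just the general pattern being recommended): exact `==` on floats is fragile, so comparisons should carry an explicit tolerance.

```python
import math

import torch

# Scalar comparison with explicit relative/absolute tolerance.
assert math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9, abs_tol=1e-12)

# Tensor comparison: allclose instead of (a == b).all().
a = torch.randn(4, 8)
b = a + 1e-7
assert torch.allclose(a, b, rtol=1e-5, atol=1e-6)

# And the f-string readability suggestion, for completeness:
rank, world_size = 0, 4
print(f"rank {rank} of {world_size}")
```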
Left a few comments so PTAL - we can also add the model to documentation but indicate that only text input is supported for now.
It looks like the models are public: https://huggingface.co/collections/google/gemma-3n-685065323f5984ef315c93f4. Could we add it to the test registry? https://github.com/vllm-project/vllm/blob/04e1642e3251fc575d104c84782fafea348cfbaf/tests/models/registry.py
I would like to get this in now so we can have it in the nightly. I'll add a test tomorrow.
> I would like to get this in now so we can have it in the nightly. I'll add a test tomorrow.
I think that's fine since this PR is a model-specific PR and should be low risk (assuming the `transformers` version bump doesn't break anything).
Basic models test is failing because there isn't a registered model. You can add one and put a skip arg in it: https://buildkite.com/vllm/ci/builds/22772/steps/canvas?jid=0197aea1-2404-4b82-8219-729311822434#0197aea1-2404-4b82-8219-729311822434/214-1806
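For context, here is a self-contained sketch of the "skip arg" idea: a registry entry carries a minimum `transformers` version, and the test harness skips the model when the installed version is older. All names here (`ExampleModelInfo`, `should_skip`, the checkpoint id) are hypothetical illustrations, not the actual vLLM registry API.

```python
# Illustrative sketch only; names assumed, not vLLM's registry API.
from dataclasses import dataclass
from typing import Optional

import transformers
from packaging.version import Version


@dataclass
class ExampleModelInfo:
    hf_repo: str
    min_transformers_version: Optional[str] = None

    def should_skip(self) -> bool:
        # Skip when the installed transformers is too old for the model.
        if self.min_transformers_version is None:
            return False
        return Version(transformers.__version__) < Version(
            self.min_transformers_version)


GEMMA3N = ExampleModelInfo("google/gemma-3n-E2B-it",
                           min_transformers_version="4.53.0")
if GEMMA3N.should_skip():
    print("skipping: installed transformers is too old for Gemma3n")
```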
requirements/common.txt (outdated)

```diff
@@ -7,7 +7,7 @@ requests >= 2.26.0
 tqdm
 blake3
 py-cpuinfo
-transformers >= 4.51.1
+transformers >= 4.53.0
```
Actually wait. We should not force-update transformers yet; it will break Qwen2.5-Omni. cc @Isotr0py
Is that confirmed? If so, we can ask users to upgrade transformers manually.
From the look of it, it seems that the blast radius is more than just Qwen2.5-Omni (see vllm-project/vllm-ascend#1470), so I'll update this PR accordingly.
> If so, we can ask users to upgrade transformers manually.

There is a big refactoring of the config and processor in Transformers v4.53; we need to ask users to upgrade transformers manually for the time being, until we add compatibility for it.
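For users who want to try Gemma3n before the requirement bump lands everywhere, a manual upgrade plus an explicit runtime check is a reasonable pattern. A sketch, with the 4.53.0 floor taken from this thread:

```python
# Check at runtime that the installed transformers is new enough
# for Gemma3n (version floor taken from this PR's discussion).
from importlib.metadata import version

from packaging.version import Version

MIN_TF = Version("4.53.0")
installed = Version(version("transformers"))
if installed < MIN_TF:
    raise RuntimeError(
        f"transformers {installed} is too old for Gemma3n; "
        f"upgrade with: pip install 'transformers>={MIN_TF}'")
```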
Do you have plans to support the multimodal version as well?
Yea it's planned. Stay tuned!
@ywang96 Awesome
@ywang96 Is the multimodal version implemented yet? I don't see any PRs that may have implemented it. I was wondering if I could go data parallel with images.
Essential Elements of an Effective PR Description Checklist
- Update `supported_models.md` and `examples` for a new model.

Purpose
Test Plan
Test Result
(Optional) Documentation Update