[Bugfix] Fix EAGLE vocab embedding for multimodal target model #19570
Conversation
Signed-off-by: qizixi <qizixi@meta.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Summary of Changes
Hello @zixi-qi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request fixes a bug in the EAGLE speculative decoding implementation where, in the case of a multimodal target model, the EAGLE head was not correctly identifying the language model component for sharing vocabulary embeddings and the language model head. The changes ensure that the EAGLE head interacts with the language-specific part of the multimodal model for weight sharing, resolving the issue and improving compatibility with multimodal architectures.
Highlights
- Multimodal Target Model Compatibility: Ensured that when the EAGLE head shares weights (like embeddings and the LM head) with a multimodal target model, it correctly identifies and uses the language model component of the target model.
- Embedding Sharing Logic: Updated the logic for sharing the `embed_tokens` layer to compare shapes and assign the embedding layer from the language model part of the target model when dealing with multimodal models.
- LM Head Sharing Logic: Updated the logic for sharing the `lm_head` layer to consistently use the language model part of the target model, simplifying the conditional logic.
- Image Token Index Handling: Added logic to copy the `image_token_index` from the target model's config to the EAGLE model's config if the target model is multimodal and has this attribute.
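The highlights above boil down to one unwrapping step. A rough sketch of the idea in plain Python (toy stand-in classes, not vLLM's actual ones; `supports_multimodal` here is a hypothetical stand-in for vLLM's real helper):

```python
# Toy sketch of the fix: a multimodal wrapper has no usable top-level
# embed_tokens, so weight sharing must go through its language model part.
class Embedding:
    """Stand-in for nn.Embedding; only the weight shape matters here."""
    def __init__(self, vocab_size, hidden):
        self.weight_shape = (vocab_size, hidden)

class Inner:
    def __init__(self, vocab_size, hidden):
        self.embed_tokens = Embedding(vocab_size, hidden)

class LanguageModel:
    def __init__(self, vocab_size=32, hidden=8):
        self.model = Inner(vocab_size, hidden)

class MultimodalModel:
    """Wrapper holding a language model plus other (e.g. vision) towers."""
    def __init__(self):
        self.language_model = LanguageModel()

def supports_multimodal(m):
    # hypothetical stand-in for vLLM's supports_multimodal()
    return isinstance(m, MultimodalModel)

def share_embed_tokens(draft, target):
    # Resolve the language-model component first, then compare shapes.
    lm = target.language_model if supports_multimodal(target) else target
    if draft.model.embed_tokens.weight_shape == lm.model.embed_tokens.weight_shape:
        draft.model.embed_tokens = lm.model.embed_tokens
        return True
    return False

draft = LanguageModel()
target = MultimodalModel()
shared = share_embed_tokens(draft, target)
```

Before the fix, the equivalent of `share_embed_tokens` looked for `target.model.embed_tokens` directly, which does not exist on the multimodal wrapper.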
Code Review
The pull request fixes an issue where the EAGLE head was not correctly loading vocabulary embeddings for multimodal target models. The changes ensure that the EAGLE head interacts with the language model component of a multimodal target model for shared embeddings and LM head. The solution seems sound and improves compatibility with multimodal models.
vllm/v1/spec_decode/eagle.py
Outdated
```python
# handle multimodality
if hasattr(target_model.config, "image_token_index"):
    self.model.config.image_token_index = (
        target_model.config.image_token_index)
```
Consider extracting this logic into a separate helper function for better readability and maintainability.
```python
def _copy_image_token_index(target_model: nn.Module,
                            model_config: PretrainedConfig) -> None:
    if hasattr(target_model.config, "image_token_index"):
        model_config.image_token_index = target_model.config.image_token_index

if supports_multimodal(target_model):
    _copy_image_token_index(target_model, self.model.config)
```
```diff
 # share embed_tokens with the target model if needed
 if get_pp_group().world_size == 1 \
         and self.model.model.embed_tokens.weight.shape \
-        == target_model.model.embed_tokens.weight.shape:
+        == target_language_model.model.embed_tokens.weight.shape:
```
Consider adding a check to ensure that `target_language_model.model.embed_tokens` is not None before accessing its `weight` attribute, to avoid potential `AttributeError` exceptions.
```python
if target_language_model.model.embed_tokens and \
        self.model.model.embed_tokens.weight.shape \
        == target_language_model.model.embed_tokens.weight.shape:
```
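The suggested guard could be sketched like this (toy objects with hypothetical names, not vLLM's classes), returning False instead of raising when `embed_tokens` is missing or None:

```python
class Embed:
    def __init__(self, shape):
        self.weight_shape = shape  # stand-in for .weight.shape

class Model:
    def __init__(self, embed_tokens=None):
        self.embed_tokens = embed_tokens

class Wrapper:
    def __init__(self, embed_tokens=None):
        self.model = Model(embed_tokens)

def can_share_embed(draft, target_language_model):
    # Guard against a missing/None embed_tokens before touching its weight
    tgt = getattr(target_language_model.model, "embed_tokens", None)
    if tgt is None:
        return False
    return draft.model.embed_tokens.weight_shape == tgt.weight_shape

draft = Wrapper(Embed((32, 8)))
good = Wrapper(Embed((32, 8)))
bad = Wrapper()  # language model without embed_tokens
```

With the guard, the `bad` case degrades gracefully to "don't share" instead of crashing during model load.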
```diff
@@ -349,12 +358,9 @@ def load_model(self, target_model: nn.Module) -> None:
     # some model definition do not define lm_head explicitly
     # and reuse embed_tokens for lm_head, e.g., CohereForCausalLM
     if self.vllm_config.speculative_config.method != "eagle3" and \
-            hasattr(target_model, "lm_head"):
+            hasattr(target_language_model, "lm_head"):
```
cc @benchislett
vllm/v1/spec_decode/eagle.py
Outdated
```diff
@@ -329,16 +329,25 @@ def load_model(self, target_model: nn.Module) -> None:

         self.attn_layer_names = list(draft_attn_layer_names)

+        if supports_multimodal(target_model):
+            # handle multimodality
+            if hasattr(target_model.config, "image_token_index"):
```
Why is it not sufficient to check `supports_multimodal`? If this is just double-checking that the target model indeed supports multimodal, shouldn't it be incorporated into the `supports_multimodal` function?
Makes sense, `supports_multimodal` should be sufficient. Updated the PR, and thanks for the quick review!
LGTM
@benchislett Thanks for reviewing this PR so quickly!
Signed-off-by: qizixi <qizixi@meta.com>
Looks good. If we can add some unit tests, that would be better (it's okay to add them in a follow-up PR though). Please fix the linter as well?
Actually, it looks like an infra error; re-running it now.
Purpose
When the EAGLE head needs to share the vocab embedding with the target model and the target model has multimodal support, the EAGLE head should load the vocab embedding from the language model component of the target model.
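The intended flow might be sketched as follows (plain-Python stand-ins with hypothetical names; `99` is an arbitrary illustrative token index, not any real model's value):

```python
# Sketch: resolve the language-model component once, then share
# embed_tokens and lm_head from it, and copy image_token_index if present.
class Cfg:
    pass

class Layer:
    def __init__(self, shape):
        self.shape = shape

class Inner:
    def __init__(self):
        self.embed_tokens = Layer((32, 8))

class LanguageModel:
    def __init__(self):
        self.config = Cfg()
        self.model = Inner()
        self.lm_head = Layer((8, 32))

class MultimodalTarget:
    def __init__(self):
        self.config = Cfg()
        self.config.image_token_index = 99  # arbitrary illustrative value
        self.language_model = LanguageModel()

def load_shared_weights(draft, target, is_multimodal):
    # Unwrap the multimodal target to its language-model component.
    lm = target.language_model if is_multimodal else target
    if is_multimodal and hasattr(target.config, "image_token_index"):
        draft.config.image_token_index = target.config.image_token_index
    if draft.model.embed_tokens.shape == lm.model.embed_tokens.shape:
        draft.model.embed_tokens = lm.model.embed_tokens
    if hasattr(lm, "lm_head"):
        draft.lm_head = lm.lm_head
    return draft

draft = LanguageModel()
target = MultimodalTarget()
out = load_shared_weights(draft, target, is_multimodal=True)
```

After this, the draft head reuses the target's embedding and LM-head tensors rather than keeping mismatched copies of its own.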
Test Plan & Result
```shell
pytest -v tests/v1/spec_decode/test_eagle.py
pytest -v tests/v1/e2e/test_spec_decode.py
```
(Optional) Documentation Update