
Conversation

@zixi-qi (Collaborator) commented Jun 12, 2025

Purpose

When the EAGLE head needs to share the vocab embedding with the target model and the target model has multimodal support, the EAGLE head should load the vocab embedding from the language-model component of the target model rather than from the top-level multimodal wrapper.
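As an illustration of the intended behavior, here is a minimal sketch with stand-in classes (not the actual vLLM code; `LanguageModel`, `MultimodalModel`, and `get_language_model` are hypothetical names used only for this example):

```python
# Illustrative sketch: how a draft head can resolve the component that owns
# the vocab embedding. A multimodal target wraps its text backbone, while a
# text-only target *is* the text backbone.

class LanguageModel:
    def __init__(self):
        self.embed_tokens = "vocab-embedding"  # stands in for an nn.Embedding


class MultimodalModel:
    """A multimodal target holds a language model next to vision towers."""
    def __init__(self):
        self.language_model = LanguageModel()


def get_language_model(target_model):
    # Multimodal targets expose their text backbone under `language_model`;
    # fall back to the model itself for text-only targets.
    return getattr(target_model, "language_model", target_model)


# The draft head should share embeddings with the language-model component
# in both cases:
assert get_language_model(MultimodalModel()).embed_tokens == "vocab-embedding"
assert get_language_model(LanguageModel()).embed_tokens == "vocab-embedding"
```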

Test Plan & Result

VLLM_USE_V1=1 python examples/offline_inference/eagle.py  --num_spec_tokens 7 --max_num_seqs 1 --num_prompts 5 --method eagle --enforce_eager

--------------------------------------------------
mean acceptance length: 2.62
--------------------------------------------------
acceptance at token 0:0.73
acceptance at token 1:0.44
acceptance at token 2:0.25
acceptance at token 3:0.13
acceptance at token 4:0.07
acceptance at token 5:0.01
acceptance at token 6:0.00
VLLM_USE_V1=1 python examples/offline_inference/eagle.py  --num_spec_tokens 7 --max_num_seqs

--------------------------------------------------
mean acceptance length: 4.15
--------------------------------------------------
acceptance at token 0:0.78
acceptance at token 1:0.63
acceptance at token 2:0.54
acceptance at token 3:0.37
acceptance at token 4:0.36
acceptance at token 5:0.27
acceptance at token 6:0.21

pytest -v tests/v1/spec_decode/test_eagle.py
pytest -v tests/v1/e2e/test_spec_decode.py

(Optional) Documentation Update


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist (Contributor, bot) left a comment

Summary of Changes

Hello @zixi-qi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fixes a bug in the EAGLE speculative decoding implementation where, in the case of a multimodal target model, the EAGLE head was not correctly identifying the language model component for sharing vocabulary embeddings and the language model head. The changes ensure that the EAGLE head interacts with the language-specific part of the multimodal model for weight sharing, resolving the issue and improving compatibility with multimodal architectures.

Highlights

  • Multimodal Target Model Compatibility: Ensured that when the EAGLE head shares weights (like embeddings and the LM head) with a multimodal target model, it correctly identifies and uses the language model component of the target model.
  • Embedding Sharing Logic: Updated the logic for sharing the embed_tokens layer to compare shapes and assign the embedding layer from the language model part of the target model when dealing with multimodal models.
  • LM Head Sharing Logic: Updated the logic for sharing the lm_head layer to consistently use the language model part of the target model, simplifying the conditional logic.
  • Image Token Index Handling: Added logic to copy the image_token_index from the target model's config to the EAGLE model's config if the target model is multimodal and has this attribute.

@mergify mergify bot added the v1 label Jun 12, 2025
@gemini-code-assist (Contributor, bot) left a comment

Code Review

The pull request fixes an issue where the EAGLE head was not correctly loading vocabulary embeddings for multimodal target models. The changes ensure that the EAGLE head interacts with the language model component of a multimodal target model for shared embeddings and LM head. The solution seems sound and improves compatibility with multimodal models.

Comment on lines 333 to 336:

    # handle multimodality
    if hasattr(target_model.config, "image_token_index"):
        self.model.config.image_token_index = (
            target_model.config.image_token_index)
Severity: medium

Consider extracting this logic into a separate helper function for better readability and maintainability.

def _copy_image_token_index(target_model: nn.Module, model_config: PretrainedConfig) -> None:
    if hasattr(target_model.config, "image_token_index"):
        model_config.image_token_index = target_model.config.image_token_index

if supports_multimodal(target_model):
    _copy_image_token_index(target_model, self.model.config)
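The suggested helper can be exercised in isolation. A self-contained version, using `SimpleNamespace` objects as stand-ins for the real model and config classes (illustrative only, not the vLLM types), might look like:

```python
from types import SimpleNamespace


def copy_image_token_index(target_model, draft_config) -> None:
    # Propagate the image token id so the draft model can recognize
    # image placeholder tokens in multimodal prompts.
    if hasattr(target_model.config, "image_token_index"):
        draft_config.image_token_index = target_model.config.image_token_index


# Stand-in objects; 32000 is an arbitrary example token id.
target = SimpleNamespace(config=SimpleNamespace(image_token_index=32000))
draft_config = SimpleNamespace()

copy_image_token_index(target, draft_config)
assert draft_config.image_token_index == 32000

# A target without the attribute leaves the draft config untouched.
text_only_target = SimpleNamespace(config=SimpleNamespace())
other_config = SimpleNamespace()
copy_image_token_index(text_only_target, other_config)
assert not hasattr(other_config, "image_token_index")
```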

# share embed_tokens with the target model if needed
if get_pp_group().world_size == 1 \
        and self.model.model.embed_tokens.weight.shape \
-        == target_model.model.embed_tokens.weight.shape:
+        == target_language_model.model.embed_tokens.weight.shape:
Severity: medium

Consider adding a check to ensure that target_language_model.model.embed_tokens is not None before accessing its weight attribute to avoid potential AttributeError exceptions.

if target_language_model.model.embed_tokens and self.model.model.embed_tokens.weight.shape == target_language_model.model.embed_tokens.weight.shape:

@@ -349,12 +358,9 @@ def load_model(self, target_model: nn.Module) -> None:
# some model definition do not define lm_head explicitly
# and reuse embed_tokens for lm_head, e.g., CohereForCausalLM
if self.vllm_config.speculative_config.method != "eagle3" and \
-        hasattr(target_model, "lm_head"):
+        hasattr(target_language_model, "lm_head"):
Severity: medium

Consider adding a check to ensure that target_language_model has the attribute lm_head and that it is not None before accessing it to avoid potential AttributeError exceptions.

if hasattr(target_language_model, "lm_head") and target_language_model.lm_head is not None:
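The guard suggested here can be sketched with stand-in classes (illustrative names only, not vLLM's real classes):

```python
# Sketch of defensive lm_head sharing: guard against both a missing lm_head
# attribute and one that is explicitly None (some architectures reuse
# embed_tokens instead of defining lm_head).

class TargetWithoutHead:
    lm_head = None  # defined but unset


class TargetWithHead:
    lm_head = "shared-lm-head"


class Draft:
    lm_head = "own-lm-head"


def maybe_share_lm_head(draft, target) -> bool:
    head = getattr(target, "lm_head", None)
    if head is not None:  # covers both missing and None lm_head
        draft.lm_head = head
        return True
    return False  # draft keeps (or builds) its own head


draft = Draft()
assert not maybe_share_lm_head(draft, TargetWithoutHead())
assert draft.lm_head == "own-lm-head"
assert maybe_share_lm_head(draft, TargetWithHead())
assert draft.lm_head == "shared-lm-head"
```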

@zixi-qi zixi-qi requested a review from morgendave June 12, 2025 17:20
@WoosukKwon (Collaborator) commented:

cc @benchislett

@@ -329,16 +329,25 @@ def load_model(self, target_model: nn.Module) -> None:

    self.attn_layer_names = list(draft_attn_layer_names)

    if supports_multimodal(target_model):
        # handle multimodality
        if hasattr(target_model.config, "image_token_index"):
A Collaborator commented:

why is it not sufficient to check "supports_multimodal"? If this is just double-checking that the target model indeed supports multimodal, shouldn't it be incorporated into the supports_multimodal function?

@zixi-qi (Collaborator, Author) replied:

Makes sense, supports_multimodal should be sufficient. Updated the PR, and thanks for the quick review!
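For context, vLLM's supports_multimodal checks whether a model implements the multimodal interface. A much-simplified stand-in for that pattern (illustrative only; the real check lives in vLLM's model interface code) is:

```python
# Simplified sketch of an interface-based multimodal check: models opt in by
# inheriting from a marker class, and the check is a single isinstance test.

class SupportsMultiModal:
    """Marker base class for models that accept image inputs."""


class LlavaLikeModel(SupportsMultiModal):
    pass


class TextOnlyModel:
    pass


def supports_multimodal(model) -> bool:
    return isinstance(model, SupportsMultiModal)


assert supports_multimodal(LlavaLikeModel())
assert not supports_multimodal(TextOnlyModel())
```

Because the interface check already encodes "this target is multimodal," a second hasattr probe on the config becomes redundant, which is the point of the review comment above.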

@benchislett (Collaborator) left a comment:

LGTM

@WoosukKwon WoosukKwon added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 12, 2025
@WoosukKwon (Collaborator) commented:

@benchislett Thanks for reviewing this PR so quickly!

Signed-off-by: qizixi <qizixi@meta.com>
@houseroad (Collaborator) left a comment:

Looks good. It would be better to add some unit tests (it's okay to add them in a follow-up PR, though).

@houseroad (Collaborator) commented:

Please fix the linter as well?

@houseroad (Collaborator) commented:

Actually, it looks like an infra error; re-running it now.

@aarnphm aarnphm merged commit c68698b into vllm-project:main Jun 13, 2025
64 of 65 checks passed

@zixi-qi zixi-qi deleted the eagle-mm-refactor branch June 13, 2025 05:55
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
…project#19570)

Signed-off-by: qizixi <qizixi@meta.com>
Signed-off-by: minpeter <kali2005611@gmail.com>
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 30, 2025
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025
…project#19570)

Signed-off-by: qizixi <qizixi@meta.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025
Labels
ready ONLY add when PR is ready to merge/full CI is needed v1