[Bugfix] Fix handling of Tensorizer arguments for LoadConfig #20643
Conversation
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
Summary of Changes
Hello @sangstar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug and improves the flexibility of how Tensorizer-specific arguments are handled during model loading. It streamlines the process of passing configuration parameters for Tensorizer, making it more intuitive for users to specify custom loading behaviors and resolving potential conflicts in argument precedence.

Highlights

- **Improved Tensorizer Argument Handling**: Refactored the logic within `LoadConfig` creation to correctly parse and apply Tensorizer-specific arguments provided via `--model-loader-extra-config`. This ensures that additional parameters can be used to configure Tensorizer loading without requiring a specific JSON string format or inferring solely from the model path.
- **Centralized Tensorizer Configuration**: Introduced a new `validate_tensorizer_args` method that moves all Tensorizer-specific arguments from `model_loader_extra_config` into a dedicated nested dictionary, `model_loader_extra_config["tensorizer_config"]`. This provides a cleaner and more explicit way to pass arguments to the `TensorizerConfig` constructor (a sketch of this relocation follows the list).
- **Resolved Tensorizer Path Conflict**: Modified `TensorizerConfig` to gracefully handle cases where both `tensorizer_dir` and `tensorizer_uri` are provided. Instead of raising an error, it now logs a warning and prioritizes `tensorizer_uri`, inferring `tensorizer_dir` from it.
- **Logging and Code Simplification**: Removed a potentially misleading log message that would appear when `load_format` was not `tensorizer` or when specific Tensorizer arguments were present. The overall `create_load_config` logic for Tensorizer has been simplified and made more robust.
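A minimal sketch of what this relocation and conflict handling might look like — the method name and the `tensorizer_uri`/`tensorizer_dir`/`tensorizer_config` key names come from the summary above, but the function body and the key set are assumptions, not vLLM's actual implementation:

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical subset of the keys TensorizerConfig accepts; the real set
# lives in vLLM's Tensorizer loader.
_TENSORIZER_KEYS = {"tensorizer_uri", "tensorizer_dir", "num_readers"}


def validate_tensorizer_args(extra_config: dict) -> dict:
    """Move Tensorizer-specific keys into a nested 'tensorizer_config' dict.

    Illustrative only; the real method is defined on vLLM's arg/config
    classes rather than as a free function.
    """
    tensorizer_config = extra_config.setdefault("tensorizer_config", {})
    for key in list(extra_config):
        if key in _TENSORIZER_KEYS:
            tensorizer_config[key] = extra_config.pop(key)

    # Conflict handling per the summary: prefer tensorizer_uri and infer
    # tensorizer_dir from it instead of raising an error.
    uri = tensorizer_config.get("tensorizer_uri")
    if uri and tensorizer_config.get("tensorizer_dir"):
        logger.warning(
            "Both tensorizer_dir and tensorizer_uri were provided; using "
            "tensorizer_uri and inferring tensorizer_dir from it.")
        tensorizer_config["tensorizer_dir"] = os.path.dirname(uri)
    return extra_config


# A flat user-supplied config becomes a nested tensorizer_config dict:
print(validate_tensorizer_args({
    "tensorizer_uri": "s3://bucket/model/model.tensors",
    "num_readers": 4,
}))
# {'tensorizer_config': {'tensorizer_uri': 's3://bucket/model/model.tensors',
#                        'num_readers': 4}}
```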
Code Review
This pull request is a good bugfix that correctly handles Tensorizer arguments for `LoadConfig`. The changes address the incorrect logging issue and make the configuration more flexible and intuitive for users. I've provided a couple of suggestions to improve code clarity by avoiding data duplication and to fix a typo in a log message. Overall, the changes are solid and well-implemented.
Tagged a few reviewers for visibility, although this is a pretty small change.
Looks reasonable, thanks for fixing!

The Tensorizer test is failing, PTAL
This test failure is due to an authentication issue when pulling the tensors from the bucket in the test, not a problem with Tensorizer functionality itself. I went ahead and removed the test, and am currently working on a PR to improve the existing Tensorizer tests.
Validate arguments for Tensorizer when loading

This quick bugfix fixes the incorrect logging of a statement whenever `load_format != tensorizer` here. Additionally, it ensures `TensorizerConfig` can always be instantiated non-intrusively from `LoadConfig.model_loader_extra_config`, and that additional parameters passed to `--model-loader-extra-config` in `vllm serve` can further configure Tensorizer loading, rather than forcing the user to have Tensorizer arguments inferred from a `--model-loader-extra-config` JSON string, or just from the `<path>` in `vllm serve <path>`.
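As a usage illustration, Tensorizer arguments might now be supplied as plain keys of `model_loader_extra_config` with the offline API; the `vllm serve` equivalent passes the same dictionary as a JSON string to `--model-loader-extra-config`. A minimal sketch, assuming a serialized model at a placeholder S3 path — the values here are illustrative, and the accepted keys depend on the vLLM and Tensorizer versions:

```python
from vllm import LLM

# Sketch only: the paths and values below are placeholders, not a tested
# configuration. Tensorizer arguments ride along as plain keys of
# model_loader_extra_config.
llm = LLM(
    model="s3://my-bucket/my-model",  # hypothetical serialized model location
    load_format="tensorizer",
    model_loader_extra_config={
        "tensorizer_uri": "s3://my-bucket/my-model/model.tensors",
        "num_readers": 4,  # parallel streams used by Tensorizer when reading
    },
)
```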