[CI/Build] Add new CI job to validate Hybrid Models for every PR #20147
Conversation
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Summary of Changes
Hello @tdoublep, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new, mandatory CI job to validate vLLM's support for hybrid models (those mixing mamba and attention layers) on every pull request. This aims to significantly improve the reliability and verification of these increasingly important models, which were previously only tested in an optional and often flaky extended CI job.
Highlights
- New CI Job for Hybrid Models: A new `Language Models Test (Hybrid)` CI job has been added to the `.buildkite/test-pipeline.yaml` configuration. This job is designed to run automatically for every pull request, specifically targeting tests for hybrid models that incorporate mamba layers.
- Pytest Marker for Hybrid Models: A new `hybrid_model` marker has been defined in `pyproject.toml`. This marker serves to categorize and select tests specifically related to models that contain mamba layers, facilitating their inclusion in the new dedicated CI job.
- Test Categorization: Existing tests within `tests/models/language/generation/test_hybrid.py` have been updated by applying the new `@pytest.mark.hybrid_model` decorator. This ensures these tests are properly identified and executed as part of the new hybrid model CI job (see the sketch after this list).
- CI Job Refinement: The `Language Models Test (Extended Generation)` job in `.buildkite/test-pipeline.yaml` has been modified to explicitly exclude tests marked with `hybrid_model`. This prevents redundant execution of these tests and helps maintain the efficiency and focus of the existing extended job, now that a dedicated hybrid model job is in place.
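To make the marker-based split concrete, below is a minimal, hypothetical sketch of how tests in `tests/models/language/generation/test_hybrid.py` could be tagged with the new marker. The model names and the test body are illustrative placeholders, not the PR's actual code.

```python
# Hypothetical sketch (not the PR's actual test code): tagging hybrid-model
# tests with the new pytest marker so CI jobs can select or exclude them.
import pytest

# Illustrative checkpoints only; the real test file defines its own model list.
HYBRID_MODELS = [
    "ibm-ai-platform/Bamba-9B",
    "Zyphra/Zamba2-1.2B",
]


@pytest.mark.hybrid_model
@pytest.mark.parametrize("model", HYBRID_MODELS)
def test_hybrid_model_generation(model: str) -> None:
    # In the real suite this would load the model with vLLM, generate text,
    # and compare the output against a HuggingFace reference implementation.
    assert isinstance(model, str) and model
```

With the marker registered under `[tool.pytest.ini_options]` in `pyproject.toml` (as described above), the new job can select these tests with `pytest -m hybrid_model`, while the extended job can exclude them with `-m 'not hybrid_model'`.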
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize your Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces a new CI job to specifically test hybrid models, which is a great addition for improving test coverage and speed. The changes to the Buildkite pipeline and `pyproject.toml` are well-structured. I've provided a couple of suggestions to enhance maintainability and clarity in the test file and the marker definition.
Hybrid models that mix mamba and attention layers (Bamba, Nemotron-H, Granite 4.0, Zamba2, Falcon H1) are becoming increasingly important to multiple different stakeholders. Currently, vLLM's support for these models is not verified as part of PR builds. The only way to test them in CI is to trigger the optional `Language Models Test (Extended Generation)` job. This job takes a long time (more than an hour) and contains a lot of flaky tests (as one would expect for a job that is not run frequently on this codebase). This PR proposes to add a new CI stage that runs by default for every PR and verifies that vLLM works well for the family of hybrid models.
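For illustration only, here is a rough sketch of how the two jobs could divide the test suite by marker once this lands. The actual commands live in `.buildkite/test-pipeline.yaml` and may differ, so the invocation below (via pytest's Python entry point) is an assumption rather than the pipeline's real configuration.

```python
# Minimal sketch, not the actual Buildkite config: splitting the suite by the
# hybrid_model marker so hybrid tests run on every PR and the extended job
# skips them.
import pytest

# New per-PR job: only tests tagged with @pytest.mark.hybrid_model.
hybrid_exit = pytest.main(
    ["-v", "-m", "hybrid_model",
     "tests/models/language/generation/test_hybrid.py"]
)

# Optional extended job: remaining generation tests, excluding hybrid ones.
extended_exit = pytest.main(
    ["-v", "-m", "not hybrid_model", "tests/models/language/generation"]
)
```

In practice each job would run in its own Buildkite step; a single Python driver like this is just a compact way to show the two marker expressions side by side.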
Hopefully this job will run when I open this PR; I will add a comment about how long it takes to run once I have observed it.