build: Fixes for vLLM Blackwell Builds #2020
Conversation
👋 Hi zaristei! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 …
Walkthrough
The installation script for vLLM has been updated to change how PyTorch and its related packages are installed for both ARM64 and AMD64 architectures. Version pinning for PyTorch on ARM64 now targets earlier nightly builds, removes the fallback to stable versions, and enforces strict failure on install errors. For AMD64, explicit PyTorch version installation and the use of the `use_existing_torch.py` script have been introduced.
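A minimal sketch of what the described behavior could look like in the install script; the nightly tag, index URL, and error handling here are illustrative assumptions, not the PR's exact values (the explicit `torch==2.7.1` pin for AMD64 is taken from the review comment further down):

```bash
#!/usr/bin/env bash
set -euo pipefail  # fail hard on install errors instead of falling back

ARCH=$(uname -m)

if [ "$ARCH" = "aarch64" ]; then
    # ARM64: pin to an earlier PyTorch nightly, with no fallback to stable wheels.
    # The nightly tag and index URL below are placeholders for illustration.
    uv pip install --index-url https://download.pytorch.org/whl/nightly/cu128 \
        "torch==2.8.0.dev20250610" torchvision torchaudio \
        || { echo "ERROR: PyTorch nightly install failed" >&2; exit 1; }
else
    # AMD64: install an explicitly pinned PyTorch version.
    uv pip install torch==2.7.1 torchvision torchaudio
fi
```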
Actionable comments posted: 1
🧹 Nitpick comments (1)
container/deps/vllm/install_vllm.sh (1)
125-128: `--no-build-isolation` can leak host-level build deps – confirm you really need it
Disabling build isolation speeds things up but forces the build to rely on whatever happens to be in the invoking environment, which hurts reproducibility and hermeticity. Verify whether vllm genuinely cannot build in an isolated env; if it can, consider removing the flag, and if it cannot, document the justification next to the command.
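If the flag really is required, one way to keep the justification next to the command is a comment plus an explicit pre-install of the build dependencies that isolation would normally provide. The rationale and the dependency list below are assumptions for illustration, not taken from the PR:

```bash
# vllm's build needs to see the torch already installed in this environment;
# an isolated build env could pull in its own, mismatched torch wheel.
# NOTE: this rationale and the build-dep list are illustrative assumptions.
uv pip install setuptools wheel ninja packaging   # build deps isolation would normally supply
uv pip install --no-build-isolation -e .
```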
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
container/deps/vllm/install_vllm.sh (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: krishung5
PR: ai-dynamo/dynamo#1388
File: examples/multimodal/utils/model.py:47-53
Timestamp: 2025-06-09T17:52:06.761Z
Learning: The current get_vision_embeddings_size() function in examples/multimodal/utils/model.py uses a hardcoded fallback of 4096 for hidden_size, which assumes all VLMs follow the LLaVA architecture pattern. This is problematic because different VLMs like Qwen2-VL, MiniCPM-V, and others have different hidden dimensions, making the fallback unreliable and potentially causing tensor shape mismatches.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (1)
container/deps/vllm/install_vllm.sh (1)
131-138: Audit redundant Torch installation in install_vllm.sh

It looks like we explicitly pin and install `torch==2.7.1` (plus torchvision/torchaudio) on lines 131–138, then immediately invoke `python use_existing_torch.py`, which may itself install or override CUDA-/compute-specific wheels:
- container/deps/vllm/install_vllm.sh:130–138

Double-installing PyTorch can waste ~1 GB and risks mixing binaries if one step only partially succeeds.

Options to consider:
- Drop the hard-coded `uv pip install torch==2.7.1…` and let `use_existing_torch.py` handle both CPU and GPU wheel selection.
- Or add a guard around `python use_existing_torch.py` to skip it when the pre-install already provided the correct wheels.

Please review the logic inside `use_existing_torch.py` to confirm its behavior on GB200/B200 images and adjust the script accordingly.
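One possible shape for the suggested guard, sketched under the assumption that the pre-installed wheel can be detected by its version string; the version check and messages are hypothetical, not code from the PR:

```bash
# Skip use_existing_torch.py when the pinned pre-install already provided
# the expected wheel; the version check below is an illustrative assumption.
if python -c "import torch; assert torch.__version__.startswith('2.7.1')" 2>/dev/null; then
    echo "Pinned torch 2.7.1 already present; skipping use_existing_torch.py"
else
    python use_existing_torch.py
fi
```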
82f659b to e45b3ab (Compare)
Tested on GB200. Currently the pinned flashinfer cannot be pip-installed due to a dependency conflict, but the latest stable flashinfer release seems to work.
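Dropping the pin and taking the latest stable release might look roughly like the sketch below; the package name and the absence of an extra wheel index are assumptions, and the script's actual pinned spec may differ:

```bash
# The pinned flashinfer wheel currently hits a dependency conflict on GB200,
# so install the latest stable release without a pin instead.
# NOTE: the package name below is an assumption for illustration.
uv pip install flashinfer-python
```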
Overview:
Fix issues with Blackwell support on both GB200 and B200 with the latest build script.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit