Conversation

mgoin
Member

@mgoin mgoin commented Jun 6, 2025

Purpose

Update to the latest stable release of FlashInfer. This is the first stable release with Blackwell support, so it is fairly important to standardize on. However, there are no pre-built wheels yet. We can wait to see if wheels will be published, or build our own. @huydhn could you help me with this?

I updated the instructions in the Dockerfile to match the new method for building AOT kernels, based on https://docs.flashinfer.ai/installation.html#install-from-source.
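
In short, the relevant Dockerfile step now builds FlashInfer from the v0.2.6 tag instead of installing a pre-built wheel. A simplified sketch (the exact conditionals, arch list, and environment variables are in the diff):

    # Sketch of the new install step: AOT-compile FlashInfer v0.2.6 from source
    export FLASHINFER_ENABLE_AOT=1                            # per the FlashInfer install docs
    export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0a 10.0+PTX'   # illustrative; the list now includes 10.0+PTX for Blackwell
    uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6"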

Test Plan

Test Result

Signed-off-by: mgoin <mgoin64@gmail.com>

github-actions bot commented Jun 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Hello @mgoin, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, authored by @mgoin, aims to update the FlashInfer library dependency to version 0.2.6. According to the description, this version is significant as it's the first stable release with Blackwell support. The changes primarily involve modifying the Dockerfile used for CI builds to fetch and build FlashInfer from the v0.2.6 tag on GitHub, moving away from a conditional installation that previously attempted to use a pre-built wheel for CUDA 12.8. The PR also updates the supported CUDA architecture list in the build process.

Highlights

  • FlashInfer Version Update: Updates the FlashInfer dependency to the latest stable release, v0.2.6, which includes initial support for Blackwell GPUs.
  • CI Build Logic Change: Modifies the docker/Dockerfile to consistently build FlashInfer from source using the v0.2.6 tag from the GitHub repository, removing the previous conditional logic that would attempt to use a pre-built wheel for CUDA 12.8.
  • CUDA Architecture Support: Updates the TORCH_CUDA_ARCH_LIST environment variable in the Dockerfile (line 264) to include 10.0+PTX, likely to enable support for newer architectures like Blackwell (SM 10.0).

Changelog

  • docker/Dockerfile
    • Commented out the conditional block (lines 261-263, 270) that checked for CUDA 12.8 and attempted to install a specific pre-built FlashInfer wheel.
    • Updated the TORCH_CUDA_ARCH_LIST environment variable (line 264) to include 10.0+PTX.
    • Modified the uv pip install command (line 269) to fetch FlashInfer directly from the v0.2.6 tag on GitHub instead of a previous commit hash.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the ci/build label Jun 6, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates FlashInfer to version 0.2.6, which is an important step for Blackwell support. The changes in the Dockerfile correctly target the new version and update the TORCH_CUDA_ARCH_LIST to include compute capability 10.0.

The PR description clearly explains the rationale for building from source (lack of pre-built wheels for v0.2.6 currently).

I have a couple of suggestions to improve the clarity and maintainability of the Dockerfile.

Summary of Findings

  • Outdated/Misleading Comment: A comment on line 260 regarding pre-built wheels for FlashInfer is outdated given the change to build v0.2.6 from source.
  • Clarity of Commented-Out Code: The block of previously active code for FlashInfer installation (lines 261-263, 270) is now commented out. Replacing it with a concise explanatory comment or removing it would improve Dockerfile readability.

Merge Readiness

The core change to update FlashInfer to v0.2.6 and enable Blackwell support is well-implemented. However, there are a few medium severity issues related to comments and commented-out code that affect the Dockerfile's clarity and maintainability.

I recommend addressing these suggestions to improve the codebase. As I am an AI, I am not authorized to approve pull requests. Please ensure these changes are reviewed and approved by a human maintainer before merging.

fi; \
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@21ea1d2545f74782b91eb8c08fd503ac4c0743fc" ; \
fi \
# FlashInfer already has a wheel for PyTorch 2.7.0 and CUDA 12.8. This is enough for CI use
Contributor

Severity: medium

The comment # FlashInfer already has a wheel for PyTorch 2.7.0 and CUDA 12.8. This is enough for CI use now precedes logic that unconditionally builds FlashInfer v0.2.6 from source (for non-arm64 platforms). This comment seems to refer to the previous state with FlashInfer v0.2.5, where a pre-built wheel was used for CUDA 12.8.

Could we update this comment to more accurately reflect the current strategy of building v0.2.6 from source? This would prevent potential confusion for future readers.

    # Building FlashInfer v0.2.6 from source as pre-built wheels for this version are not yet available.

Comment on lines 261 to 263
# if [[ "$CUDA_VERSION" == 12.8* ]]; then \
# uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \
# else \
Contributor

Severity: medium

This block of commented-out code previously handled the conditional installation of FlashInfer v0.2.5 (either via a pre-built wheel for CUDA 12.8 or by building from a specific commit for other versions). The PR description clarifies that v0.2.6 wheels are not yet available, hence the shift to building from the v0.2.6 tag.

To improve Dockerfile readability and reduce clutter, would it be better to replace these commented-out lines with a single, more concise comment explaining the current situation or a TODO for future wheel availability? For example:

-    # if [[ "$CUDA_VERSION" == 12.8* ]]; then \
-    #     uv pip install --system https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl; \
-    # else \
+    # TODO: Re-evaluate using pre-built wheels for FlashInfer v0.2.6 if/when they become available.

This would make the Dockerfile's intent clearer.

    # TODO: Re-evaluate using pre-built wheels for FlashInfer v0.2.6 if/when they become available.

export FLASHINFER_ENABLE_SM90=0; \
fi; \
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6" ; \
# fi \
Contributor

Severity: medium

This commented-out # fi corresponds to the if block (lines 261-263) that is also now commented out.

If the preceding commented block (lines 261-263) is removed or replaced by a more concise comment as suggested, this line should also be removed to maintain consistency and clarity in the Dockerfile.

@huydhn
Contributor

huydhn commented Jun 7, 2025

Yes, I can help build and publish that wheel on download.pytorch.org

@mgoin mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label Jun 9, 2025
@mgoin mgoin changed the title from [CI] Update FlashInfer to 0.2.6 to [CI] Update FlashInfer to 0.2.6.post1 Jun 9, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
@huydhn
Contributor

huydhn commented Jun 10, 2025

Sorry for the delay, I have the wheel built for 0.2.6.post1 ready at https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl. The wheel is built with FLASHINFER_LOCAL_VERSION=cu128torch2.7 TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0a 10.0a' FLASHINFER_ENABLE_AOT=1 python -m build --no-isolation --wheel to match what you have in the PR. Let me know if it works for you.
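
If it works for you, the CUDA 12.8 path in the Dockerfile could presumably go back to installing the pre-built wheel instead of compiling from source, something like this (untested sketch, mirroring the previous conditional block):

    # Sketch: on CUDA 12.8, install the published 0.2.6.post1 wheel; otherwise keep building from source as in this PR
    if [[ "$CUDA_VERSION" == 12.8* ]]; then
        uv pip install --system "https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl"
    fi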

@mgoin
Member Author

mgoin commented Jun 10, 2025

Thank you @huydhn ! Will update now

@davefojtik

Can we please get official Flashinfer AOT wheels for the cu126torch2.7 combination too? It should be supported, right?

@houseroad
Collaborator

Maybe @huydhn could take a look at the CUDA 12.6 + torch 2.7 combination for the FlashInfer wheel.
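
Presumably that would be the same recipe as above with the cu126 toolchain, e.g. (untested, just adapting @huydhn's command; the arch list may need adjusting to what CUDA 12.6 supports):

    # Untested adaptation of the cu128 build command for a CUDA 12.6 / torch 2.7 wheel
    FLASHINFER_LOCAL_VERSION=cu126torch2.7 TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0a' FLASHINFER_ENABLE_AOT=1 \
        python -m build --no-isolation --wheel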

@cyril23

cyril23 commented Jun 18, 2025

This pull request broke SM 120 Blackwell compatibility (RTX 50xx, RTX PRO).

You can no longer use -e VLLM_USE_FLASHINFER_SAMPLER=1 (which is the default); you have to fall back to -e VLLM_USE_FLASHINFER_SAMPLER=0, which gives you less performance and this warning:

WARNING 06-18 08:55:01 [topk_topp_sampler.py:52] FlashInfer is available, but it is not enabled. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please set VLLM_USE_FLASHINFER_SAMPLER=1.

I did 2 builds, both with --build-arg torch_cuda_arch_list='12.0' (SM 120 compatible only), and pushed them to Docker Hub:

  1. wurstdeploy/vllm:azure10thjunesolo120, which is based on the last commit of 10th June (da9b523) and still uses the old FlashInfer version:
git checkout -b 10thjune da9b523ce1fd5c27bfd18921ba0388bf2e8e4618
DOCKER_BUILDKIT=1 sudo docker build --build-arg max_jobs=64   --build-arg USE_SCCACHE=0 --build-arg GIT_REPO_CHECK=1   --build-arg CUDA_VERSION=12.8.1   --build-arg torch_cuda_arch_list='12.0'   --build-arg RUN_WHEEL_CHECK=false   --tag wurstdeploy/vllm:azure10thjunesolo120 --target vllm-openai   --progress plain -f docker/Dockerfile .

# this is still SM 120 compatible, you can run via
sudo docker run --runtime nvidia --gpus all     -v ~/.cache/huggingface:/root/.cache/huggingface     -p 8000:8000 \
  -e VLLM_USE_FLASHINFER_SAMPLER=1 \
  wurstdeploy/vllm:azure10thjunesolo120    --model Qwen/Qwen3-0.6B
  2. wurstdeploy/vllm:azure11thjunesolo120, which is based on the last commit of 11th June (42f52cc) and already includes your commit 497a91e and therefore the updated FlashInfer version:
git checkout -b 11thjune 42f52cc95bf34a2e15f4cdbc8474503a9bcc970f
DOCKER_BUILDKIT=1 sudo docker build --build-arg max_jobs=64   --build-arg USE_SCCACHE=0 --build-arg GIT_REPO_CHECK=1   --build-arg CUDA_VERSION=12.8.1   --build-arg torch_cuda_arch_list='12.0'   --build-arg RUN_WHEEL_CHECK=false   --tag wurstdeploy/vllm:azure11thjunesolo120 --target vllm-openai   --progress plain -f docker/Dockerfile .

# this is not fully SM 120 compatible anymore:
sudo docker run --runtime nvidia --gpus all     -v ~/.cache/huggingface:/root/.cache/huggingface     -p 8000:8000 \
  -e VLLM_USE_FLASHINFER_SAMPLER=1 \
  wurstdeploy/vllm:azure11thjunesolo120    --model Qwen/Qwen3-0.6B

INFO 06-18 08:53:41 [monitor.py:34] torch.compile takes 18.01 s in total
/usr/local/lib/python3.12/dist-packages/torch/utils/cpp_extension.py:2356: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Process EngineCore_0:
ERROR 06-18 08:53:41 [core.py:515] EngineCore failed to start.
ERROR 06-18 08:53:41 [core.py:515] Traceback (most recent call last):
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 506, in run_engine_core
ERROR 06-18 08:53:41 [core.py:515]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 06-18 08:53:41 [core.py:515]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 390, in __init__
ERROR 06-18 08:53:41 [core.py:515]     super().__init__(vllm_config, executor_class, log_stats,
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 83, in __init__
ERROR 06-18 08:53:41 [core.py:515]     self._initialize_kv_caches(vllm_config)
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 141, in _initialize_kv_caches
ERROR 06-18 08:53:41 [core.py:515]     available_gpu_memory = self.model_executor.determine_available_memory()
ERROR 06-18 08:53:41 [core.py:515]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
ERROR 06-18 08:53:41 [core.py:515]     output = self.collective_rpc("determine_available_memory")
ERROR 06-18 08:53:41 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 06-18 08:53:41 [core.py:515]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 06-18 08:53:41 [core.py:515]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2680, in run_method
ERROR 06-18 08:53:41 [core.py:515]     return func(*args, **kwargs)
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 06-18 08:53:41 [core.py:515]     return func(*args, **kwargs)
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 205, in determine_available_memory
ERROR 06-18 08:53:41 [core.py:515]     self.model_runner.profile_run()
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2015, in profile_run
ERROR 06-18 08:53:41 [core.py:515]     sampler_output = self._dummy_sampler_run(hidden_states)
ERROR 06-18 08:53:41 [core.py:515]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 06-18 08:53:41 [core.py:515]     return func(*args, **kwargs)
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1913, in _dummy_sampler_run
ERROR 06-18 08:53:41 [core.py:515]     raise e
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1903, in _dummy_sampler_run
ERROR 06-18 08:53:41 [core.py:515]     sampler_output = self.sampler(logits=logits,
ERROR 06-18 08:53:41 [core.py:515]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 06-18 08:53:41 [core.py:515]     return self._call_impl(*args, **kwargs)
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 06-18 08:53:41 [core.py:515]     return forward_call(*args, **kwargs)
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 52, in forward
ERROR 06-18 08:53:41 [core.py:515]     sampled = self.sample(logits, sampling_metadata)
ERROR 06-18 08:53:41 [core.py:515]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/sampler.py", line 118, in sample
ERROR 06-18 08:53:41 [core.py:515]     random_sampled = self.topk_topp_sampler(
ERROR 06-18 08:53:41 [core.py:515]                      ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 06-18 08:53:41 [core.py:515]     return self._call_impl(*args, **kwargs)
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 06-18 08:53:41 [core.py:515]     return forward_call(*args, **kwargs)
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 104, in forward_cuda
ERROR 06-18 08:53:41 [core.py:515]     return flashinfer_sample(logits, k, p, generators)
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/sample/ops/topk_topp_sampler.py", line 290, in flashinfer_sample
ERROR 06-18 08:53:41 [core.py:515]     next_token_ids = flashinfer.sampling.top_k_top_p_sampling_from_logits(
ERROR 06-18 08:53:41 [core.py:515]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/flashinfer/sampling.py", line 901, in top_k_top_p_sampling_from_logits
ERROR 06-18 08:53:41 [core.py:515]     masked_logits = top_k_mask_logits(logits, top_k)
ERROR 06-18 08:53:41 [core.py:515]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/flashinfer/sampling.py", line 1221, in top_k_mask_logits
ERROR 06-18 08:53:41 [core.py:515]     return get_sampling_module().top_k_mask_logits(
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/flashinfer/sampling.py", line 352, in top_k_mask_logits
ERROR 06-18 08:53:41 [core.py:515]     module.top_k_mask_logits.default(
ERROR 06-18 08:53:41 [core.py:515]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 756, in __call__
ERROR 06-18 08:53:41 [core.py:515]     return self._op(*args, **kwargs)
ERROR 06-18 08:53:41 [core.py:515]            ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-18 08:53:41 [core.py:515] RuntimeError: TopKMaskLogits failed with error code no kernel image is available for execution on the device


# you can only run it without Flashinfer, i.e. -e VLLM_USE_FLASHINFER_SAMPLER=0:
sudo docker run --runtime nvidia --gpus all     -v ~/.cache/huggingface:/root/.cache/huggingface     -p 8000:8000 \
  -e VLLM_USE_FLASHINFER_SAMPLER=0 \
  wurstdeploy/vllm:azure11thjunesolo120    --model Qwen/Qwen3-0.6B
> WARNING 06-18 08:55:01 [topk_topp_sampler.py:52] FlashInfer is available, but it is not enabled. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please set VLLM_USE_FLASHINFER_SAMPLER=1.
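
My guess (completely untested) is that the FlashInfer AOT build baked into the image would need SM 120 in its arch list, e.g. extending the hard-coded list in docker/Dockerfile before the FlashInfer install step:

    # Untested guess: also compile FlashInfer's AOT kernels for SM 120 (RTX 50xx / RTX PRO)
    export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0a 10.0+PTX 12.0'
    uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1"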

minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: minpeter <kali2005611@gmail.com>
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 30, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025
Signed-off-by: mgoin <mgoin64@gmail.com>