
Conversation

gshtras
Collaborator

@gshtras gshtras commented Jul 18, 2025

Fix for the regression introduced in #20330.

Before that change, certain configs relied on undefined behavior on ROCm (using the warpSize compiler builtin in host code); after it, they started crashing due to the value mismatch.

On ROCm, the same compiled binary can run on platforms with different warp sizes (64 on Instinct, 32 on Radeon), so the WARP_SIZE used in host code can't be made constexpr; it has to be queried from the driver instead.

This PR addresses that issue. It also fixes the cuda_compat include paths to be relative, so that the un-hipified version is not pulled in when the hipify script is used (ROCm).
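
For illustration, a minimal sketch of how a cuda_compat-style header could split the definition; the macro layout and the __AMDGCN_WAVEFRONT_SIZE__ / get_warp_size() names here are assumptions for the sketch, not the PR's verbatim code:

// Hypothetical sketch, not the PR's actual code. On CUDA the warp size stays
// a plain compile-time define. On ROCm only device code, compiled once per
// target architecture, can rely on a compile-time value; host code must ask
// the driver, since one binary may run on Instinct (64) or Radeon (32).
#ifndef USE_ROCM
  #define WARP_SIZE 32
#else
  #ifdef __HIP_DEVICE_COMPILE__
    // Device pass: the per-target wavefront size is a genuine constant here.
    #define WARP_SIZE __AMDGCN_WAVEFRONT_SIZE__
  #else
    // Host pass: defer to a runtime driver query, e.g. a get_warp_size()
    // helper like the one discussed in the review below.
    #define WARP_SIZE get_warp_size()
  #endif
#endif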

Affected configs include:

VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 python benchmarks/benchmark_throughput.py --model amd/Llama-3.1-405B-Instruct-FP8-KV --dtype float16 --num-prompts 500 --input-len 2048 --output-len 2048 -tp 8 -O '{"pass_config":{"enable_attn_fusion":true,"enable_noop":true,"enable_fusion":true},"full_cuda_graph":true,"custom_ops":["+silu_and_mul"]}' --no-enable-prefix-caching
VLLM_V1_USE_PREFILL_DECODE_ATTENTION=1 python benchmarks/benchmark_latency.py --model mistralai/Mixtral-8x22B-v0.1 --dtype float16 --batch-size 16 --input-len 1024 --output-len 1024 -tp 8 -O '{"pass_config":{"enable_noop":true,"enable_fusion":true},"full_cuda_graph":true}' --num-iters-warmup 1 --num-iters 3

gshtras added 2 commits July 18, 2025 12:06
Using cuda_compat to define the WARP_SIZE once
On ROCm the constant compile-time warp size cannot be used on the host side that can be shared for multiple architectures with different values

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
@gshtras gshtras force-pushed the host_side_warp_size branch from 3bf10f0 to 88827d2 on July 18, 2025 at 21:26
@mergify mergify bot added the rocm (Related to AMD ROCm) label Jul 18, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a critical bug related to warp_size usage on ROCm by querying the warp size from the driver at runtime. The changes also include fixing paths to cuda_compat to be relative. A potential data race condition has been identified in the caching mechanism of the get_warp_size function, and a thread-safe solution has been proposed.

Comment on lines 8 to 27
static __host__ int get_warp_size() {
#if defined(USE_ROCM)
  static bool is_cached = false;
  static int result;

  if (!is_cached) {
    int device_id;
    cudaDeviceProp deviceProp;
    cudaGetDevice(&device_id);
    cudaGetDeviceProperties(&deviceProp, device_id);

    result = deviceProp.warpSize;
    is_cached = true;
  }

  return result;
#else
  return 32;
#endif
}
Contributor

critical

The current implementation of the get_warp_size function on the host side uses a static boolean is_cached and a static variable result to cache the warp size. This caching mechanism is susceptible to a data race if multiple threads call get_warp_size simultaneously before the value is cached. This can lead to incorrect warp size values or potential crashes. To fix this, use a thread-safe initialization of the static variable using a lambda to ensure that the warp size is initialized only once in a thread-safe manner.

  static __host__ int get_warp_size() {
#if defined(USE_ROCM)
    // C++11 guarantees that static local variable initialization is thread-safe.
    // This avoids a data race on is_cached and result.
    static const int result = [] {
      int device_id;
      cudaDeviceProp deviceProp;
      cudaGetDevice(&device_id);
      cudaGetDeviceProperties(&deviceProp, device_id);
      return deviceProp.warpSize;
    }();
    return result;
#else
    return 32;
#endif
  }
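
As a standalone illustration of the C++11 guarantee this suggestion relies on (the function names below are made up for the demo): initialization of a function-local static is serialized by the runtime, so it runs exactly once even when many threads race to call the function first.

#include <cstdio>
#include <thread>
#include <vector>

// Stand-in for the expensive driver query; the message prints once no matter
// how many threads race into cached_value() below.
static int expensive_query() {
  std::puts("initialized exactly once");
  return 64;
}

static int cached_value() {
  // C++11 "magic statics": this initialization is thread-safe.
  static const int value = expensive_query();
  return value;
}

int main() {
  std::vector<std::thread> threads;
  for (int i = 0; i < 8; ++i) {
    threads.emplace_back([] { (void)cached_value(); });
  }
  for (auto& t : threads) {
    t.join();
  }
  std::printf("value = %d\n", cached_value());
  return 0;
}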

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
@gshtras gshtras added the ready (ONLY add when PR is ready to merge/full CI is needed) label Jul 22, 2025
Contributor

@SageMoore SageMoore left a comment

Looks reasonable, @gshtras

@vllm-bot vllm-bot merged commit 90eeea8 into vllm-project:main Jul 24, 2025
96 of 97 checks passed
DW934 pushed a commit to DW934/vllm that referenced this pull request Jul 28, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: 董巍 <dongwei@U-5XFVDYCF-2058.local>
gshtras added a commit to ROCm/vllm that referenced this pull request Jul 31, 2025
Commits included:

Using cuda_compat to define the WARP_SIZE once

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

On ROCm the constant compile-time warp size cannot be used on the host side that can be shared for multiple architectures with different values

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

Formatting

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

Refactor to use cuda_compat, and not the un-hipified version

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

Leaving CUDA side as just a simple define

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
wenscarl pushed a commit to wenscarl/vllm that referenced this pull request Aug 4, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: shuw <shuw@nvidia.com>
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: x22x22 <wadeking@qq.com>
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
heyselbi pushed a commit to heyselbi/vllm that referenced this pull request Aug 8, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
taneem-ibrahim pushed a commit to taneem-ibrahim/vllm that referenced this pull request Aug 14, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Aug 14, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Boyuan Feng <boyuan@meta.com>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>