"[Bug] The upstream ROCm container lmsysorg/sglang:v0.4.5-rocm630 encounters a runtime error related to Triton's per_token_group_quant_fp8." #5138

@japarada

Description

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

The following error occurs when starting the server:

Error:
  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 1250, in fused_experts
    torch.ops.sglang.inplace_fused_experts(
  File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1122, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 1095, in inplace_fused_experts
    fused_experts_impl(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 1424, in fused_experts_impl
    invoke_fused_moe_kernel(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 789, in invoke_fused_moe_kernel
    A, A_scale = per_token_group_quant_fp8(A, block_k)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
NameError: name 'per_token_group_quant_fp8' is not defined. Did you mean: 'per_token_group_quant_int8'?

[2025-04-07 19:18:42] Received sigquit from a child process. It usually means the child failed.
--- Logging error ---
[2025-04-07 19:18:42] Received sigquit from a child process. It usually means the child failed.
[2025-04-07 19:18:42] Received sigquit from a child process. It usually means the child failed.
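
For triage, a quick check of whether the name from the traceback is bound in the failing module could look like the sketch below. The module name is derived from the file path in the traceback. The helper `symbol_is_defined` is hypothetical and not part of sglang. It only reports whether the name exists at module level. It does not explain why the name is missing.

```python
import importlib


def symbol_is_defined(module_name: str, symbol: str) -> bool:
    """Return True if `symbol` is bound at module level in `module_name`.

    Hypothetical helper for triage. Returns False when the module itself
    cannot be imported, so a False result needs a follow-up look.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, symbol)


if __name__ == "__main__":
    # Module name taken from the file path shown in the traceback.
    module_name = "sglang.srt.layers.moe.fused_moe_triton.fused_moe"
    for symbol in ("per_token_group_quant_fp8", "per_token_group_quant_int8"):
        print(symbol, symbol_is_defined(module_name, symbol))
```

Run inside the container, this would show whether only the fp8 variant is missing from the module while the int8 variant is present, as the `Did you mean` hint in the error suggests.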

Reproduction

python3 -m sglang.launch_server --model /deepseek/DeepSeek-R1 --tp 8 --trust-remote-code --chunked-prefill-size 131072 --enable-torch-compile --torch-compile-max-bs 256

Environment

lmsysorg/sglang:v0.4.5-rocm630
