[Bug] Server crash with dp + attn cutlass_mla when running dsr1 #8518

@lingjiew

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Hi Team,
I am testing DeepSeek-R1 (dsr1) with DP attention and the cutlass_mla attention backend. Prefill works fine, but the server crashes as soon as it enters the decode phase.
The error is:
  File "/sgl-workspace/sglang/python/sglang/srt/layers/attention/cutlass_mla_backend.py", line 274, in forward_decode
    o = cutlass_mla_decode(
  File "/usr/local/lib/python3.10/dist-packages/sgl_kernel/attention.py", line 95, in cutlass_mla_decode
    assert B_block_table == B_q
AssertionError
Can someone take a look at this? Below is the full error log.

sglang_node0.log
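
For readers unfamiliar with the failing check: judging only by the variable names in the traceback, the assertion compares the batch dimension of the block table with the batch dimension of the decode queries. The sketch below mirrors that kind of check in plain Python. The helper name `check_decode_batch` and the example shapes are hypothetical and chosen for illustration; this is not the `sgl_kernel` implementation, and the log does not establish which of the two tensors has the wrong batch size.

```python
def check_decode_batch(q_shape, block_table_shape):
    """Hypothetical helper mirroring the batch-size check named in the traceback."""
    B_q = q_shape[0]                      # leading dim of the decode queries
    B_block_table = block_table_shape[0]  # leading dim of the block table
    assert B_block_table == B_q, (
        f"block_table batch {B_block_table} != q batch {B_q}"
    )
    return B_q


# Illustrative shapes only (assumed, not taken from the log).
check_decode_batch((4, 128, 576), (4, 16))  # batch sizes agree, passes

try:
    # Queries built for 8 requests, block table built for 4.
    check_decode_batch((8, 128, 576), (4, 16))
except AssertionError as exc:
    print("mismatch:", exc)
```

A mismatch of this kind means the two inputs were built for different numbers of requests, so comparing both shapes just before the `cutlass_mla_decode` call would be a reasonable first debugging step.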

Reproduction

python3 -m sglang.launch_server \
  --tokenizer-path nvidia/DeepSeek-R1-0528-FP4 \
  --trust-remote-code \
  --enable-dp-attention \
  --enable-dp-lm-head \
  --disable-radix-cache \
  --enable-flashinfer-cutlass-moe \
  --enable-ep-moe \
  --moe-dense-tp-size 1 \
  --max-running-requests 2048 \
  --chunked-prefill-size 16384 \
  --mem-fraction-static 0.85 \
  --disable-cuda-graph \
  --cuda-graph-bs 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 \
  --quantization modelopt_fp4 \
  --attention-backend cutlass_mla \
  --stream-interval 10 \
  --model-path=nvidia/DeepSeek-R1-0528-FP4 \
  --host 0.0.0.0 \
  --port 8000 \
  --tensor-parallel-size=8 \
  --data-parallel-size=8
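
The command above only starts the server; the report does not say which client request triggered the crash. Since the failure appears in the decode phase, any request that generates at least a few tokens should reach it. A minimal sketch, assuming the server's native `/generate` HTTP endpoint and the port from the command above; the prompt, the token count, and the helper names `build_generate_request` and `send` are illustrative assumptions, not part of the original report:

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=32, host="127.0.0.1", port=8000):
    """Build a POST request for the /generate endpoint (hypothetical helper)."""
    url = f"http://{host}:{port}/generate"
    payload = {
        "text": prompt,
        "sampling_params": {"temperature": 0, "max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, once the server is up:
#   print(send(build_generate_request("The capital of France is")))
```

Generating more than one token ensures the request moves from prefill into decode, which is where the assertion is reported to fire.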

Environment

Latest main branch.
