Conversation

ispobock (Collaborator)
Motivation

For the no-prefix and short-prefix cases, prefill is compute bound, so we should dispatch to MHA instead of MLA to reduce prefill computation.

Benchmark results on DeepSeek-Coder-V2-Lite-Instruct:

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct --trust-remote-code --host 127.0.0.1 --disable-radix
python3 -m sglang.bench_one_batch_server --model None --base-url http://0.0.0.0:30000 --batch-size 128 --input-len 1024 --output-len 1

Before (MLA prefill):
batch size: 128
latency: 2.59 s
output throughput: 49.51 token/s
(input + output) throughput: 50750.40 token/s

After (MHA prefill):
batch size: 128
latency: 2.28 s
output throughput: 56.12 token/s
(input + output) throughput: 57522.51 token/s
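From the numbers above, the size of the improvement can be checked directly (a quick sanity calculation using only the throughput and latency figures reported in this thread):

```python
# Throughput figures copied from the benchmark output above.
baseline_tps = 50750.40  # (input + output) token/s, MLA prefill
mha_tps = 57522.51       # (input + output) token/s, MHA prefill

speedup = mha_tps / baseline_tps
print(f"throughput speedup: {speedup:.3f}x")  # ~1.133x, i.e. about 13%

# Latency tells the same story.
baseline_lat, mha_lat = 2.59, 2.28
print(f"latency reduction: {(1 - mha_lat / baseline_lat) * 100:.1f}%")  # ~12.0%
```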

@@ -561,11 +561,6 @@ def __init__(
"SGLANG_ROCM_FUSED_DECODE_MLA", "false"
)

# TODO: Design a finer way to determine the threshold
self.chunked_prefix_cache_threshold = get_int_env_var(
Member

chunked_prefix_cache_threshold is necessary. You can test input lengths (isl) 128, 256, 512, 1024 with output lengths (osl) 128, 256, 512, 1024 and batch sizes (bs) 1, 2, 4, 8, 16, 24, 32 to verify.

Collaborator

I feel that when there is no prefix, it makes sense to turn on the MHA optimization.
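The dispatch idea being discussed can be sketched as follows. This is a minimal illustration, not SGLang's actual API: the function name, signature, and the threshold value are all hypothetical; only the heuristic itself (MHA when the cached prefix is absent or short, MLA weight absorption otherwise) comes from the thread.

```python
# Illustrative value only; in SGLang this kind of threshold is read from an
# environment variable (see the chunked_prefix_cache_threshold diff above).
CHUNKED_PREFIX_CACHE_THRESHOLD = 8192

def choose_prefill_attention(prefix_len: int, extend_len: int) -> str:
    """Pick an attention path for a prefill batch (hypothetical helper).

    prefix_len: tokens already cached from a prefix hit.
    extend_len: new tokens to prefill in this batch.
    """
    if prefix_len == 0:
        # No prefix: prefill is compute bound, plain MHA avoids the
        # extra matmuls of MLA weight absorption.
        return "MHA"
    if prefix_len < CHUNKED_PREFIX_CACHE_THRESHOLD and prefix_len < extend_len:
        # Short prefix relative to the new tokens: still compute bound.
        return "MHA"
    # Long cached prefix: attention over the cache is memory bound,
    # so the MLA absorbed form wins.
    return "MLA"

print(choose_prefill_attention(0, 1024))     # MHA
print(choose_prefill_attention(65536, 128))  # MLA
```

With `--disable-radix` (as in the benchmark command above) every request has an empty prefix, which is why that flag isolates the MHA prefill path.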

@ispobock ispobock closed this Apr 22, 2025
@ispobock ispobock deleted the attn-dispatch branch April 22, 2025 09:25