I deployed DeepSeek-R1 on a single-node H20-141G and used the vllm benchmark script benchmark_serving.py
to run stress tests at different concurrency levels. With the same parameter configuration, input throughput degraded by 8% and output throughput by 12% after the version upgrade.
- sglang 0.4.3
python3 -m sglang.launch_server \
--model /home/model/DeepSeek-R1/ \
--tp 8 \
--trust-remote-code \
--enable-dp-attention \
--port "9001" \
--host 0.0.0.0 \
--enable-metrics
- sglang 0.4.5
python3 -m sglang.launch_server \
--model /home/model/DeepSeek-R1/ \
--tp 8 \
--dp 8 \
--trust-remote-code \
--enable-dp-attention \
--port "9001" \
--host 0.0.0.0 \
--enable-metrics
- performance test
python3 benchmark_serving.py \
--backend sglang \
--model $model \
--tokenizer 'deepseek-tokenizer' \
--dataset-name "random" \
--host $ip \
--port $port \
--random-input-len 1024 \
--random-output-len 1024 \
--ignore-eos \
--max-concurrency $concurrency \
--num-prompts $prompts \
--seed 12345 \
--trust-remote-code
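To quantify the regression between the two runs, a small helper like the following can compute the relative throughput change from the numbers that benchmark_serving.py reports. The throughput values below are illustrative placeholders consistent with the 8%/12% drops described above, not the actual measurements; substitute your own benchmark output.

```python
def relative_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative = degradation)."""
    return (new - old) / old * 100

# Hypothetical input/output token throughputs (tokens/s) for the two versions,
# chosen only to illustrate the reported ~8% and ~12% degradations.
runs = {
    "input":  {"v0.4.3": 10000.0, "v0.4.5": 9200.0},
    "output": {"v0.4.3": 5000.0,  "v0.4.5": 4400.0},
}

for metric, r in runs.items():
    delta = relative_change(r["v0.4.3"], r["v0.4.5"])
    print(f"{metric} throughput change: {delta:+.1f}%")
```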
- result
I don't understand what causes this performance degradation. Can anyone explain it?