fzyzcjy
Collaborator

@fzyzcjy fzyzcjy commented May 10, 2025

Motivation

test

PYTHONUNBUFFERED=1 SGLANG_TORCH_PROFILER_DIR=/host_home/temp_sglang_server2local python3 -m sglang.launch_server --model-path /dev/shm/DeepSeek-R1 --trust-remote-code --dist-init-addr 192.168.0.55:5757 --nnodes 2 --node-rank ${MY_NODE_RANK} --tp-size ${num_gpu} --dp-size ${num_gpu} --enable-dp-attention --mem-fraction-static 0.8 --chunked-prefill-size $((128*${num_gpu})) --max-running-requests $((${num_gpu}*128)) --context-length 4096 --disable-radix-cache --enable-deepep-moe --deepep-mode low_latency --cuda-graph-bs 128 --decode-log-interval 1

python3 -m sglang.bench_one_batch_server --model-path /dev/shm/DeepSeek-R1 --base-url http://localhost:30000 --batch-size 16 --input-len 1 --output-len 2048 --skip-warmup
  • baseline: 6 tok/s/gpu
  • PR: 29 tok/s/gpu
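
As a rough sanity check on the commands and figures above, the shell arithmetic in the launch flags and the implied speedup can be worked out as below. This is only a sketch: the value `num_gpu=16` is an assumption (for example, 2 nodes with 8 GPUs each); the description does not state it.

```shell
# Assumption: num_gpu is not given in the description; 16 is a hypothetical value.
num_gpu=16

# Same arithmetic expansions as in the launch command.
chunked_prefill_size=$((128 * num_gpu))
max_running_requests=$((num_gpu * 128))
echo "--chunked-prefill-size ${chunked_prefill_size}"
echo "--max-running-requests ${max_running_requests}"

# Speedup implied by the reported throughput (tok/s/gpu).
# Shell arithmetic is integer-only, so scale by 10 to keep one decimal place.
baseline=6
pr=29
speedup_x10=$((pr * 10 / baseline))
speedup="$((speedup_x10 / 10)).$((speedup_x10 % 10))x"
echo "speedup: ${speedup}"
```

With these assumed values both flags expand to the same number, since the two expressions differ only in operand order.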

Modifications

Checklist

@fzyzcjy fzyzcjy marked this pull request as draft May 12, 2025 00:04
@fzyzcjy fzyzcjy force-pushed the feat/padding_moe branch from 8797942 to 3fecc76 Compare May 12, 2025 00:09
@fzyzcjy fzyzcjy marked this pull request as ready for review May 12, 2025 00:09
@zhyncs zhyncs merged commit 2716830 into sgl-project:main May 17, 2025
113 of 128 checks passed
@lambert0312
Contributor

This PR significantly reduces DeepSeek's inference performance (by 15%+). We need to look into the specific cause.

@fzyzcjy
Collaborator Author

fzyzcjy commented May 20, 2025

@lambert0312 That looks bad. Could you please share your commands? It would also be great to have a profile. My first guess is that we need to fuse it.

@lambert0312
Contributor

@lambert0312 That looks bad. Could you please share your commands? It would also be great to have a profile. My first guess is that we need to fuse it.

@fzyzcjy I tried to modify it. You can see the PR I linked above. Thank you.

@fzyzcjy
Collaborator Author

fzyzcjy commented May 21, 2025

Interesting, I thought this line already ensured that no extra kernels are executed.

image

Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025