Closed
Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
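Accuracy on the GSM8K benchmark drops from 0.930 to 0.540 when EP MoE is enabled (--enable-ep-moe); the model and all other launch flags are identical between the two runs.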
cc: @xiaobochen123
Reproduction
EP (with --enable-ep-moe):
python3 -m sglang.launch_server --model-path neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 --disable-radix-cache --trust-remote-code --tp 8 --enable-ep-moe --disable-cuda-graph
python3 benchmark/gsm8k/bench_sglang.py --num-questions 1400 --parallel 1400
Accuracy: 0.540
Invalid: 0.005
Latency: 205.758 s
Output throughput: 1017.681 token/s
TP (baseline, without --enable-ep-moe):
python3 -m sglang.launch_server --model-path neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 --disable-radix-cache --trust-remote-code --tp 8 --disable-cuda-graph
python3 benchmark/gsm8k/bench_sglang.py --num-questions 1400 --parallel 1400
Accuracy: 0.930
Invalid: 0.000
Latency: 196.344 s
Output throughput: 1011.191 token/s
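To make the gap between the two runs easier to read, here is a minimal sketch that parses the benchmark output pasted above and prints the per-metric difference. It is not part of sglang: the helper names `parse_metrics` and `compare` are hypothetical, and the only assumption about the output format is the "Name: value unit" layout shown in this report.

```python
import re

# Benchmark output copied verbatim from the two runs in this report.
EP_OUTPUT = """\
Accuracy: 0.540
Invalid: 0.005
Latency: 205.758 s
Output throughput: 1017.681 token/s
"""

TP_OUTPUT = """\
Accuracy: 0.930
Invalid: 0.000
Latency: 196.344 s
Output throughput: 1011.191 token/s
"""


def parse_metrics(text):
    """Parse 'Name: value [unit]' lines into a {name: float} dict (hypothetical helper)."""
    metrics = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+):\s*([-+]?\d+(?:\.\d+)?)", line)
        if match:
            metrics[match.group(1).strip()] = float(match.group(2))
    return metrics


def compare(ep, tp):
    """Return EP minus TP for every metric present in both runs (hypothetical helper)."""
    return {name: ep[name] - tp[name] for name in ep if name in tp}


ep = parse_metrics(EP_OUTPUT)
tp = parse_metrics(TP_OUTPUT)
delta = compare(ep, tp)

for name, diff in delta.items():
    print(f"{name}: EP={ep[name]:g} TP={tp[name]:g} delta={diff:+.3f}")
```

The comparison shows that latency and throughput are nearly the same in both runs, while accuracy differs by 0.39, which suggests a correctness problem in the EP MoE path rather than a performance one.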
Environment
- sglang: main branch (0.4.0.post1)
- torch: 2.5.1
- triton: 3.1.0