Conversation

BruceXcluding
Contributor

Motivation

Decode performance boost on DeepSeek-R1 (DSR1), about +8% median decode throughput:
Before:

/sgl-workspace/sglang# RCCL_MSCCL_ENABLE=0 CK_MOE=1 python -m sglang.bench_one_batch --batch-size 64 --input 512 --output 32 --model deepseek-ai/DeepSeek-R1/ --tp 8 --trust-remote-code --quantization fp8
Benchmark ...
Prefill. latency: 2.02619 s, throughput:  16172.25 token/s
Decode.  latency: 0.03899 s, throughput:   1641.31 token/s
Decode.  latency: 0.03917 s, throughput:   1633.71 token/s
Decode.  latency: 0.03974 s, throughput:   1610.47 token/s
Decode.  latency: 0.04054 s, throughput:   1578.81 token/s
Decode.  latency: 0.04060 s, throughput:   1576.45 token/s
Decode.  median latency: 0.04177 s, median throughput:   1532.14 token/s
Total. latency:  3.311 s, throughput:  10514.95 token/s

After:

/sgl-workspace/sglang# RCCL_MSCCL_ENABLE=0 CK_MOE=1 AITER_BLOCK_GEMM=1 python -m sglang.bench_one_batch --batch-size 64 --input 512 --output 32 --model deepseek-ai/DeepSeek-R1/ --tp 8 --trust-remote-code --quantization fp8
Benchmark ...
Prefill. latency: 2.35484 s, throughput:  13915.15 token/s
Decode.  latency: 0.03659 s, throughput:   1749.00 token/s
Decode.  latency: 0.03680 s, throughput:   1739.00 token/s
Decode.  latency: 0.03729 s, throughput:   1716.12 token/s
Decode.  latency: 0.03764 s, throughput:   1700.34 token/s
Decode.  latency: 0.03790 s, throughput:   1688.75 token/s
Decode.  median latency: 0.03876 s, median throughput:   1651.03 token/s
Total. latency:  3.549 s, throughput:   9810.84 token/s
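
The headline figure can be checked against the two runs above. The following is a small sketch that recomputes the relative change from the reported throughput values; the helper name pct_change is only for illustration and is not part of sglang.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Throughput values (token/s) copied from the Before and After output above.
median_decode = pct_change(1532.14, 1651.03)
prefill = pct_change(16172.25, 13915.15)
total = pct_change(10514.95, 9810.84)

print(f"median decode throughput: {median_decode:+.1f}%")
print(f"prefill throughput:       {prefill:+.1f}%")
print(f"total throughput:         {total:+.1f}%")
```

For these single runs the script reports about +7.8% for median decode throughput, which matches the stated +8%. Prefill and total throughput come out lower in the After run (about -14.0% and -6.7%); the description does not comment on that, so treat it as an observation from one run rather than a conclusion.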

Modifications

AITER block GEMM (tuned for DSR1/V3 at tp 8):
Set AITER_BLOCK_GEMM=1 on the command line or in the environment to activate the aiter/CK block GEMM.
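
As a rough illustration of how an environment switch like this is commonly read in Python, here is a minimal sketch. Only the variable name AITER_BLOCK_GEMM comes from this PR; env_flag, select_gemm_backend and the returned backend labels are hypothetical and are not sglang or aiter APIs.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Interpret an environment variable as a boolean switch."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")


def select_gemm_backend(flag_name: str = "AITER_BLOCK_GEMM") -> str:
    """Hypothetical dispatch: pick a backend label based on the flag."""
    return "aiter_block_gemm" if env_flag(flag_name) else "default_gemm"


# Equivalent to prefixing the command with AITER_BLOCK_GEMM=1.
os.environ["AITER_BLOCK_GEMM"] = "1"
print(select_gemm_backend())

# With the flag cleared, the sketch falls back to the default path.
del os.environ["AITER_BLOCK_GEMM"]
print(select_gemm_backend())
```

Passing the flag name as a parameter keeps the sketch usable if the switch is later tied to a different variable, for example select_gemm_backend("CK_MOE").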

Checklist

@BruceXcluding
Contributor Author

cc @HaiShaw

Collaborator

@HaiShaw HaiShaw left a comment

Let's just use CK_MOE to trigger it; there's no need for AITER_BLOCK_GEMM. That avoids a deep tree of choices and too many ENVs.

@BruceXcluding
Contributor Author

Let's just use CK_MOE to trigger it; there's no need for AITER_BLOCK_GEMM. That avoids a deep tree of choices and too many ENVs.

OK, modified.

@BruceXcluding BruceXcluding requested a review from HaiShaw March 5, 2025 05:50
Collaborator

@HaiShaw HaiShaw left a comment

LG

@HaiShaw HaiShaw merged commit 5be8f1e into sgl-project:main Mar 5, 2025
32 of 34 checks passed
@HaiShaw HaiShaw removed the wip label Mar 5, 2025
aoshen524 pushed a commit to aoshen524/sglang that referenced this pull request Mar 10, 2025

2 participants