Fix block wise fp8 torch compile #3232

ispobock · 2025-01-31T11:54:54Z

Motivation

Fix torch compile for the block-wise fp8 linear layer. Benchmarks below compare the same run without and with `--enable-torch-compile`.

python3 -m sglang.bench_one_batch --batch-size 1 --input 128 --output 256 --model deepseek-ai/DeepSeek-V3  --trust-remote-code --tp 8

Prefill. latency: 1.82421 s, throughput:     70.17 token/s
Decode.  latency: 1.75760 s, throughput:      0.57 token/s
Decode.  latency: 0.02740 s, throughput:     36.49 token/s
Decode.  latency: 0.02703 s, throughput:     37.00 token/s
Decode.  latency: 0.02711 s, throughput:     36.88 token/s
Decode.  latency: 0.02711 s, throughput:     36.89 token/s
Decode.  median latency: 0.02711 s, median throughput:     36.88 token/s
Total. latency:  3.745 s, throughput:     36.32 token/s
Benchmark ...
Prefill. latency: 0.16921 s, throughput:    756.48 token/s
Decode.  latency: 0.02716 s, throughput:     36.81 token/s
Decode.  latency: 0.02713 s, throughput:     36.86 token/s
Decode.  latency: 0.02713 s, throughput:     36.86 token/s
Decode.  latency: 0.02714 s, throughput:     36.85 token/s
Decode.  latency: 0.02716 s, throughput:     36.82 token/s
Decode.  median latency: 0.02719 s, median throughput:     36.78 token/s
Total. latency:  7.107 s, throughput:     54.03 token/s


python3 -m sglang.bench_one_batch --batch-size 1 --input 128 --output 256 --model deepseek-ai/DeepSeek-V3  --trust-remote-code --tp 8 --enable-torch-compile --torch-compile-max-bs 1
Prefill. latency: 1.85489 s, throughput:     69.01 token/s
Decode.  latency: 0.34461 s, throughput:      2.90 token/s
Decode.  latency: 0.02107 s, throughput:     47.46 token/s
Decode.  latency: 0.02078 s, throughput:     48.13 token/s
Decode.  latency: 0.02073 s, throughput:     48.23 token/s
Decode.  latency: 0.02075 s, throughput:     48.20 token/s
Decode.  median latency: 0.02078 s, median throughput:     48.13 token/s
Total. latency:  2.325 s, throughput:     58.50 token/s
Benchmark ...
Prefill. latency: 0.17728 s, throughput:    722.03 token/s
Decode.  latency: 0.02077 s, throughput:     48.15 token/s
Decode.  latency: 0.02075 s, throughput:     48.19 token/s
Decode.  latency: 0.02075 s, throughput:     48.19 token/s
Decode.  latency: 0.02075 s, throughput:     48.19 token/s
Decode.  latency: 0.02074 s, throughput:     48.22 token/s
Decode.  median latency: 0.02092 s, median throughput:     47.81 token/s
Total. latency:  5.497 s, throughput:     69.86 token/s
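
For a quick read of the two runs, the median decode numbers can be compared directly. The snippet below is only a sketch: the values are copied by hand from the second ("Benchmark ...") pass of each log above, and the variable names are illustrative rather than anything from sglang.

```python
# Median decode figures copied from the "Benchmark ..." pass of each run above.
# Assumption: the second pass is the steady-state measurement (the first pass includes warmup).
baseline = {"latency_s": 0.02719, "throughput_tok_s": 36.78}   # without torch compile
compiled = {"latency_s": 0.02092, "throughput_tok_s": 47.81}   # with --enable-torch-compile

# Ratio of per-token decode latency, baseline over compiled.
latency_speedup = baseline["latency_s"] / compiled["latency_s"]

# Ratio of decode throughput, compiled over baseline.
throughput_gain = compiled["throughput_tok_s"] / baseline["throughput_tok_s"]

# Relative reduction in per-token decode latency, in percent.
latency_reduction_pct = 100 * (1 - compiled["latency_s"] / baseline["latency_s"])

print(f"decode latency speedup:   {latency_speedup:.2f}x")
print(f"decode throughput gain:   {throughput_gain:.2f}x")
print(f"decode latency reduction: {latency_reduction_pct:.1f}%")
```

On these logs that works out to roughly a 1.3x decode speedup at batch size 1. Prefill latency is about the same in both runs, so the gain is on the decode path.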

ispobock · 2025-01-31T11:56:34Z

Accuracy:

python3 benchmark/gsm8k/bench_sglang.py --num-questions 200 --parallel 1

Accuracy: 0.950
Invalid: 0.000
Latency: 452.187 s
Output throughput: 43.175 token/s

Jasmine-up · 2025-02-27T02:58:10Z

I can't replicate your result of 50 tokens/s. Could you please tell me which machine you are using?

fix block wise fp8 torch compile

441e411

ispobock requested review from merrymercy, Ying1123 and zhyncs as code owners January 31, 2025 11:54

zhyncs approved these changes Jan 31, 2025

zhyncs merged commit c02e313 into sgl-project:main Jan 31, 2025
1 of 14 checks passed

zhyncs mentioned this pull request Jan 31, 2025

chore: bump v0.4.2.post1 #3233

Merged

4 tasks

slin1237 mentioned this pull request Feb 6, 2025

Feature/docs deepseek usage and add multi-node #3314

Merged

4 tasks

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025

Fix block wise fp8 torch compile (sgl-project#3232)

45ba65a

zhyncs mentioned this pull request Mar 10, 2025

linear support deepgemm #4199

Merged
