@yizhang2077 commented Mar 16, 2025

Motivation

Fix the accuracy and performance problems with custom allreduce; see #4441.

Modifications

  1. Change how threads_per_block/blocks_per_grid are calculated.
  2. Move the block barrier/sync to the same places vLLM uses (see the sketch below).
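
To make the two bullets concrete, here is a minimal CUDA sketch of both ideas, assuming a simplified one-shot allreduce. kMaxBlocks, RankSignals, block_barrier, one_shot_allreduce, and launch are illustrative names rather than the PR's actual code, and the spin-flag barrier omits the system-scope memory fences a production kernel needs.

#include <cuda_runtime.h>
#include <algorithm>
#include <cstdint>

constexpr int kMaxBlocks = 36;  // assumed cap on blocks_per_grid

struct RankSignals {
  volatile uint32_t* signals[8];  // one P2P-mapped flag array per rank
                                  // (kMaxBlocks * world_size slots each)
};

// Spin-flag barrier across all ranks for one thread block: each block
// publishes "arrived" to every peer, then waits for every peer's flag.
__device__ void block_barrier(const RankSignals& sg, uint32_t flag, int rank,
                              int world_size) {
  if (threadIdx.x < world_size) {
    // mark my arrival in peer threadIdx.x's flag array
    sg.signals[threadIdx.x][blockIdx.x * world_size + rank] = flag;
    // wait until peer threadIdx.x has marked its arrival in my flag array
    while (sg.signals[rank][blockIdx.x * world_size + threadIdx.x] != flag) {
    }
  }
  __syncthreads();
}

__global__ void one_shot_allreduce(float* out, float* const* peer_in,
                                   RankSignals sg, uint32_t flag, int rank,
                                   int world_size, size_t n) {
  // Barrier placement (as vLLM does): sync BEFORE reading any peer buffer...
  block_barrier(sg, flag, rank, world_size);
  // ...reduce with a grid-stride loop so a clamped grid still covers n...
  for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += (size_t)gridDim.x * blockDim.x) {
    float acc = 0.f;
    for (int r = 0; r < world_size; ++r) acc += peer_in[r][i];
    out[i] = acc;
  }
  // ...and sync AFTER the reduction so no rank reuses its buffer too early.
  block_barrier(sg, flag + 1, rank, world_size);
}

// Launch-config style: fixed threads_per_block, blocks_per_grid derived from
// the element count and clamped to kMaxBlocks.
void launch(float* out, float* const* peer_in, RankSignals sg, uint32_t flag,
            int rank, int world_size, size_t n, cudaStream_t stream) {
  constexpr int threads = 512;
  const int blocks = static_cast<int>(
      std::min((n + threads - 1) / (size_t)threads, (size_t)kMaxBlocks));
  one_shot_allreduce<<<blocks, threads, 0, stream>>>(out, peer_in, sg, flag,
                                                     rank, world_size, n);
}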

Performance

python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --disable-radix-cache --tp 8
python3 -m sglang.bench_serving --backend sglang --dataset-name sharegpt --num-prompts 5000

Custom allreduce

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Max request concurrency:                 not set
Successful requests:                     5000
Benchmark duration (s):                  49.19
Total input tokens:                      1553911
Total generated tokens:                  944949
Total generated tokens (retokenized):    944635
Request throughput (req/s):              101.65
Input token throughput (tok/s):          31591.81
Output token throughput (tok/s):         19211.30
Total token throughput (tok/s):          50803.11
Concurrency:                             3143.75

vLLM allreduce

Backend:                                 sglang
Traffic request rate:                    inf
Max request concurrency:                 not set
Successful requests:                     5000
Benchmark duration (s):                  50.39
Total input tokens:                      1553911
Total generated tokens:                  944949 
Total generated tokens (retokenized):    944661
Request throughput (req/s):              99.23
Input token throughput (tok/s):          30839.09
Output token throughput (tok/s):         18753.56
Total token throughput (tok/s):          49592.65
Concurrency:                             3051.93

Accuracy

  1. Unit tests.
  2. test_verl_engine.py compares logprobs with HuggingFace; this test passes locally.
  3. gsm8k/mmlu:
python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --disable-radix-cache --tp 8
# gsm8k
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319
# mmlu
bash benchmark/mmlu/download_data.sh
python3 benchmark/mmlu/bench_sglang.py --nsub 100 --ntrain 5 --parallel 2000

Custom allreduce

# env
export USE_VLLM_CUSTOM_ALLREDUCE=0
# gsm8k
Accuracy: 0.788
Invalid: 0.001
Latency: 23.048 s
Output throughput: 5825.797 token/s

# mmlu
subject: abstract_algebra, #q:100, acc: 0.320
subject: anatomy, #q:135, acc: 0.681
subject: astronomy, #q:152, acc: 0.770
subject: business_ethics, #q:100, acc: 0.740
subject: clinical_knowledge, #q:265, acc: 0.743
subject: college_biology, #q:144, acc: 0.812
subject: college_chemistry, #q:100, acc: 0.470
subject: college_computer_science, #q:100, acc: 0.580
subject: college_mathematics, #q:100, acc: 0.400
subject: college_medicine, #q:173, acc: 0.665
subject: college_physics, #q:102, acc: 0.441
subject: computer_security, #q:100, acc: 0.800
subject: conceptual_physics, #q:235, acc: 0.621
subject: econometrics, #q:114, acc: 0.518
subject: electrical_engineering, #q:145, acc: 0.703
subject: elementary_mathematics, #q:378, acc: 0.497
subject: formal_logic, #q:126, acc: 0.587
subject: global_facts, #q:100, acc: 0.340
subject: high_school_biology, #q:310, acc: 0.813
subject: high_school_chemistry, #q:203, acc: 0.626
subject: high_school_computer_science, #q:100, acc: 0.740
subject: high_school_european_history, #q:165, acc: 0.752
subject: high_school_geography, #q:198, acc: 0.848
subject: high_school_government_and_politics, #q:193, acc: 0.917
subject: high_school_macroeconomics, #q:390, acc: 0.685
subject: high_school_mathematics, #q:270, acc: 0.456
subject: high_school_microeconomics, #q:238, acc: 0.773
subject: high_school_physics, #q:151, acc: 0.457
subject: high_school_psychology, #q:545, acc: 0.864
subject: high_school_statistics, #q:216, acc: 0.588
subject: high_school_us_history, #q:204, acc: 0.833
subject: high_school_world_history, #q:237, acc: 0.852
subject: human_aging, #q:223, acc: 0.686
subject: human_sexuality, #q:131, acc: 0.794
subject: international_law, #q:121, acc: 0.835
subject: jurisprudence, #q:108, acc: 0.750
subject: logical_fallacies, #q:163, acc: 0.798
subject: machine_learning, #q:112, acc: 0.580
subject: management, #q:103, acc: 0.816
subject: marketing, #q:234, acc: 0.880
subject: medical_genetics, #q:100, acc: 0.820
subject: miscellaneous, #q:783, acc: 0.831
subject: moral_disputes, #q:346, acc: 0.757
subject: moral_scenarios, #q:895, acc: 0.555
subject: nutrition, #q:306, acc: 0.788
subject: philosophy, #q:311, acc: 0.717
subject: prehistory, #q:324, acc: 0.756
subject: professional_accounting, #q:282, acc: 0.521
subject: professional_law, #q:1534, acc: 0.510
subject: professional_medicine, #q:272, acc: 0.750
subject: professional_psychology, #q:612, acc: 0.721
subject: public_relations, #q:110, acc: 0.691
subject: security_studies, #q:245, acc: 0.727
subject: sociology, #q:201, acc: 0.856
subject: us_foreign_policy, #q:100, acc: 0.860
subject: virology, #q:166, acc: 0.506
subject: world_religions, #q:171, acc: 0.842
Total latency: 64.293
Average accuracy: 0.683

vLLM allreduce

# env
export USE_VLLM_CUSTOM_ALLREDUCE=1
# gsm8k
Accuracy: 0.794
Invalid: 0.000
Latency: 23.143 s
Output throughput: 5781.865 token/s

# mmlu
subject: abstract_algebra, #q:100, acc: 0.350
subject: anatomy, #q:135, acc: 0.689
subject: astronomy, #q:152, acc: 0.770
subject: business_ethics, #q:100, acc: 0.750
subject: clinical_knowledge, #q:265, acc: 0.747
subject: college_biology, #q:144, acc: 0.812
subject: college_chemistry, #q:100, acc: 0.480
subject: college_computer_science, #q:100, acc: 0.570
subject: college_mathematics, #q:100, acc: 0.420
subject: college_medicine, #q:173, acc: 0.665
subject: college_physics, #q:102, acc: 0.441
subject: computer_security, #q:100, acc: 0.810
subject: conceptual_physics, #q:235, acc: 0.626
subject: econometrics, #q:114, acc: 0.535
subject: electrical_engineering, #q:145, acc: 0.717
subject: elementary_mathematics, #q:378, acc: 0.503
subject: formal_logic, #q:126, acc: 0.556
subject: global_facts, #q:100, acc: 0.360
subject: high_school_biology, #q:310, acc: 0.810
subject: high_school_chemistry, #q:203, acc: 0.621
subject: high_school_computer_science, #q:100, acc: 0.750
subject: high_school_european_history, #q:165, acc: 0.745
subject: high_school_geography, #q:198, acc: 0.843
subject: high_school_government_and_politics, #q:193, acc: 0.917
subject: high_school_macroeconomics, #q:390, acc: 0.677
subject: high_school_mathematics, #q:270, acc: 0.463
subject: high_school_microeconomics, #q:238, acc: 0.773
subject: high_school_physics, #q:151, acc: 0.470
subject: high_school_psychology, #q:545, acc: 0.864
subject: high_school_statistics, #q:216, acc: 0.597
subject: high_school_us_history, #q:204, acc: 0.838
subject: high_school_world_history, #q:237, acc: 0.848
subject: human_aging, #q:223, acc: 0.686
subject: human_sexuality, #q:131, acc: 0.794
subject: international_law, #q:121, acc: 0.835
subject: jurisprudence, #q:108, acc: 0.750
subject: logical_fallacies, #q:163, acc: 0.798
subject: machine_learning, #q:112, acc: 0.580
subject: management, #q:103, acc: 0.816
subject: marketing, #q:234, acc: 0.872
subject: medical_genetics, #q:100, acc: 0.820
subject: miscellaneous, #q:783, acc: 0.833
subject: moral_disputes, #q:346, acc: 0.757
subject: moral_scenarios, #q:895, acc: 0.554
subject: nutrition, #q:306, acc: 0.794
subject: philosophy, #q:311, acc: 0.720
subject: prehistory, #q:324, acc: 0.756
subject: professional_accounting, #q:282, acc: 0.500
subject: professional_law, #q:1534, acc: 0.512
subject: professional_medicine, #q:272, acc: 0.739
subject: professional_psychology, #q:612, acc: 0.722
subject: public_relations, #q:110, acc: 0.691
subject: security_studies, #q:245, acc: 0.735
subject: sociology, #q:201, acc: 0.851
subject: us_foreign_policy, #q:100, acc: 0.860
subject: virology, #q:166, acc: 0.506
subject: world_religions, #q:171, acc: 0.842
Total latency: 64.397
Average accuracy: 0.684

@zhyncs merged commit 25e1816 into main Mar 16, 2025
11 checks passed
@zhyncs deleted the fix-allreduce branch March 16, 2025 19:16