@yizhang2077 commented Mar 16, 2025

Motivation

Fix the accuracy and performance problems with custom allreduce; see #4441.

Modifications

  1. Change how threads_per_block/blocks_per_grid are calculated.
  2. Move the block barrier/sync to the same places vLLM uses (see the sketch below).
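
To make the two bullets concrete, here is a minimal CUDA sketch of both ideas, assuming a simplified one-shot allreduce. kMaxBlocks, RankSignals, block_barrier, one_shot_allreduce, and launch are illustrative names rather than the PR's actual code, and the spin-flag barrier omits the system-scope memory fences a production kernel needs.

#include <cuda_runtime.h>
#include <algorithm>
#include <cstdint>

constexpr int kMaxBlocks = 36;  // assumed cap on blocks_per_grid

struct RankSignals {
  volatile uint32_t* signals[8];  // one P2P-mapped flag array per rank
                                  // (kMaxBlocks * world_size slots each)
};

// Spin-flag barrier across all ranks for one thread block: each block
// publishes "arrived" to every peer, then waits for every peer's flag.
__device__ void block_barrier(const RankSignals& sg, uint32_t flag, int rank,
                              int world_size) {
  if (threadIdx.x < world_size) {
    // mark my arrival in peer threadIdx.x's flag array
    sg.signals[threadIdx.x][blockIdx.x * world_size + rank] = flag;
    // wait until peer threadIdx.x has marked its arrival in my flag array
    while (sg.signals[rank][blockIdx.x * world_size + threadIdx.x] != flag) {
    }
  }
  __syncthreads();
}

__global__ void one_shot_allreduce(float* out, float* const* peer_in,
                                   RankSignals sg, uint32_t flag, int rank,
                                   int world_size, size_t n) {
  // Barrier placement (as vLLM does): sync BEFORE reading any peer buffer...
  block_barrier(sg, flag, rank, world_size);
  // ...reduce with a grid-stride loop so a clamped grid still covers n...
  for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += (size_t)gridDim.x * blockDim.x) {
    float acc = 0.f;
    for (int r = 0; r < world_size; ++r) acc += peer_in[r][i];
    out[i] = acc;
  }
  // ...and sync AFTER the reduction so no rank reuses its buffer too early.
  block_barrier(sg, flag + 1, rank, world_size);
}

// Launch-config style: fixed threads_per_block, blocks_per_grid derived from
// the element count and clamped to kMaxBlocks.
void launch(float* out, float* const* peer_in, RankSignals sg, uint32_t flag,
            int rank, int world_size, size_t n, cudaStream_t stream) {
  constexpr int threads = 512;
  const int blocks = static_cast<int>(
      std::min((n + threads - 1) / (size_t)threads, (size_t)kMaxBlocks));
  one_shot_allreduce<<<blocks, threads, 0, stream>>>(out, peer_in, sg, flag,
                                                     rank, world_size, n);
}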

Performance

python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --disable-radix-cache --tp 8
python3 -m sglang.bench_serving --backend sglang --dataset-name sharegpt --num-prompts 5000

Custom allreduce

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Max request concurrency:                 not set
Successful requests:                     5000
Benchmark duration (s):                  49.19
Total input tokens:                      1553911
Total generated tokens:                  944949
Total generated tokens (retokenized):    944635
Request throughput (req/s):              101.65
Input token throughput (tok/s):          31591.81
Output token throughput (tok/s):         19211.30
Total token throughput (tok/s):          50803.11
Concurrency:                             3143.75

vLLM allreduce

Backend:                                 sglang
Traffic request rate:                    inf
Max request concurrency:                 not set
Successful requests:                     5000
Benchmark duration (s):                  50.39
Total input tokens:                      1553911
Total generated tokens:                  944949 
Total generated tokens (retokenized):    944661
Request throughput (req/s):              99.23
Input token throughput (tok/s):          30839.09
Output token throughput (tok/s):         18753.56
Total token throughput (tok/s):          49592.65
Concurrency:                             3051.93

Accuracy

  1. Unit tests.
  2. test_verl_engine.py compares logprobs with HuggingFace; this test passes locally.
  3. gsm8k/mmlu:
python3 -m sglang.launch_server --model meta-llama/Llama-3.1-8B-Instruct --disable-radix-cache --tp 8
# gsm8k
python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 1319 --parallel 1319
# mmlu
bash benchmark/mmlu/download_data.sh
python3 benchmark/mmlu/bench_sglang.py --nsub 100 --ntrain 5 --parallel 2000

Custom allreduce

# env
export USE_VLLM_CUSTOM_ALLREDUCE=0
# gsm8k
Accuracy: 0.788
Invalid: 0.001
Latency: 23.048 s
Output throughput: 5825.797 token/s

# mmlu
subject: abstract_algebra, #q:100, acc: 0.320
subject: anatomy, #q:135, acc: 0.681
subject: astronomy, #q:152, acc: 0.770
subject: business_ethics, #q:100, acc: 0.740
subject: clinical_knowledge, #q:265, acc: 0.743
subject: college_biology, #q:144, acc: 0.812
subject: college_chemistry, #q:100, acc: 0.470
subject: college_computer_science, #q:100, acc: 0.580
subject: college_mathematics, #q:100, acc: 0.400
subject: college_medicine, #q:173, acc: 0.665
subject: college_physics, #q:102, acc: 0.441
subject: computer_security, #q:100, acc: 0.800
subject: conceptual_physics, #q:235, acc: 0.621
subject: econometrics, #q:114, acc: 0.518
subject: electrical_engineering, #q:145, acc: 0.703
subject: elementary_mathematics, #q:378, acc: 0.497
subject: formal_logic, #q:126, acc: 0.587
subject: global_facts, #q:100, acc: 0.340
subject: high_school_biology, #q:310, acc: 0.813
subject: high_school_chemistry, #q:203, acc: 0.626
subject: high_school_computer_science, #q:100, acc: 0.740
subject: high_school_european_history, #q:165, acc: 0.752
subject: high_school_geography, #q:198, acc: 0.848
subject: high_school_government_and_politics, #q:193, acc: 0.917
subject: high_school_macroeconomics, #q:390, acc: 0.685
subject: high_school_mathematics, #q:270, acc: 0.456
subject: high_school_microeconomics, #q:238, acc: 0.773
subject: high_school_physics, #q:151, acc: 0.457
subject: high_school_psychology, #q:545, acc: 0.864
subject: high_school_statistics, #q:216, acc: 0.588
subject: high_school_us_history, #q:204, acc: 0.833
subject: high_school_world_history, #q:237, acc: 0.852
subject: human_aging, #q:223, acc: 0.686
subject: human_sexuality, #q:131, acc: 0.794
subject: international_law, #q:121, acc: 0.835
subject: jurisprudence, #q:108, acc: 0.750
subject: logical_fallacies, #q:163, acc: 0.798
subject: machine_learning, #q:112, acc: 0.580
subject: management, #q:103, acc: 0.816
subject: marketing, #q:234, acc: 0.880
subject: medical_genetics, #q:100, acc: 0.820
subject: miscellaneous, #q:783, acc: 0.831
subject: moral_disputes, #q:346, acc: 0.757
subject: moral_scenarios, #q:895, acc: 0.555
subject: nutrition, #q:306, acc: 0.788
subject: philosophy, #q:311, acc: 0.717
subject: prehistory, #q:324, acc: 0.756
subject: professional_accounting, #q:282, acc: 0.521
subject: professional_law, #q:1534, acc: 0.510
subject: professional_medicine, #q:272, acc: 0.750
subject: professional_psychology, #q:612, acc: 0.721
subject: public_relations, #q:110, acc: 0.691
subject: security_studies, #q:245, acc: 0.727
subject: sociology, #q:201, acc: 0.856
subject: us_foreign_policy, #q:100, acc: 0.860
subject: virology, #q:166, acc: 0.506
subject: world_religions, #q:171, acc: 0.842
Total latency: 64.293
Average accuracy: 0.683

vLLM allreduce

# env
export USE_VLLM_CUSTOM_ALLREDUCE=1
# gsm8k
Accuracy: 0.794
Invalid: 0.000
Latency: 23.143 s
Output throughput: 5781.865 token/s

# mmlu
subject: abstract_algebra, #q:100, acc: 0.350
subject: anatomy, #q:135, acc: 0.689
subject: astronomy, #q:152, acc: 0.770
subject: business_ethics, #q:100, acc: 0.750
subject: clinical_knowledge, #q:265, acc: 0.747
subject: college_biology, #q:144, acc: 0.812
subject: college_chemistry, #q:100, acc: 0.480
subject: college_computer_science, #q:100, acc: 0.570
subject: college_mathematics, #q:100, acc: 0.420
subject: college_medicine, #q:173, acc: 0.665
subject: college_physics, #q:102, acc: 0.441
subject: computer_security, #q:100, acc: 0.810
subject: conceptual_physics, #q:235, acc: 0.626
subject: econometrics, #q:114, acc: 0.535
subject: electrical_engineering, #q:145, acc: 0.717
subject: elementary_mathematics, #q:378, acc: 0.503
subject: formal_logic, #q:126, acc: 0.556
subject: global_facts, #q:100, acc: 0.360
subject: high_school_biology, #q:310, acc: 0.810
subject: high_school_chemistry, #q:203, acc: 0.621
subject: high_school_computer_science, #q:100, acc: 0.750
subject: high_school_european_history, #q:165, acc: 0.745
subject: high_school_geography, #q:198, acc: 0.843
subject: high_school_government_and_politics, #q:193, acc: 0.917
subject: high_school_macroeconomics, #q:390, acc: 0.677
subject: high_school_mathematics, #q:270, acc: 0.463
subject: high_school_microeconomics, #q:238, acc: 0.773
subject: high_school_physics, #q:151, acc: 0.470
subject: high_school_psychology, #q:545, acc: 0.864
subject: high_school_statistics, #q:216, acc: 0.597
subject: high_school_us_history, #q:204, acc: 0.838
subject: high_school_world_history, #q:237, acc: 0.848
subject: human_aging, #q:223, acc: 0.686
subject: human_sexuality, #q:131, acc: 0.794
subject: international_law, #q:121, acc: 0.835
subject: jurisprudence, #q:108, acc: 0.750
subject: logical_fallacies, #q:163, acc: 0.798
subject: machine_learning, #q:112, acc: 0.580
subject: management, #q:103, acc: 0.816
subject: marketing, #q:234, acc: 0.872
subject: medical_genetics, #q:100, acc: 0.820
subject: miscellaneous, #q:783, acc: 0.833
subject: moral_disputes, #q:346, acc: 0.757
subject: moral_scenarios, #q:895, acc: 0.554
subject: nutrition, #q:306, acc: 0.794
subject: philosophy, #q:311, acc: 0.720
subject: prehistory, #q:324, acc: 0.756
subject: professional_accounting, #q:282, acc: 0.500
subject: professional_law, #q:1534, acc: 0.512
subject: professional_medicine, #q:272, acc: 0.739
subject: professional_psychology, #q:612, acc: 0.722
subject: public_relations, #q:110, acc: 0.691
subject: security_studies, #q:245, acc: 0.735
subject: sociology, #q:201, acc: 0.851
subject: us_foreign_policy, #q:100, acc: 0.860
subject: virology, #q:166, acc: 0.506
subject: world_religions, #q:171, acc: 0.842
Total latency: 64.397
Average accuracy: 0.684

@zhyncs merged commit 25e1816 into main Mar 16, 2025
11 checks passed
@zhyncs deleted the fix-allreduce branch March 16, 2025 19:16