Add fused MOE config for Qwen3 30B A3B on B200 #19455
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI, which runs only a small and essential subset of CI tests to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Junhao Li force-pushed the branch from f380155 to 90c2947 (Signed-off-by: Junhao Li <junhao@ubicloud.com>).
Thanks for submitting the PR. If you can add the concrete command to the test plan, that would be great.
@houseroad Updated.

Thanks!
Essential Elements of an Effective PR Description Checklist

- (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose
Improve the performance of Qwen3 30B A3B on B200.
The config is produced with:

```bash
python3 benchmarks/kernels/benchmark_moe.py --tune --model Qwen/Qwen3-30B-A3B-FP8 --dtype fp8_w8a8 -tp 1
```
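For context, here is a minimal sketch of what a tuned fused-MoE config looks like and where it conventionally lives, assuming the usual vLLM config-file naming (expert count E, intermediate size N, device name, dtype). The path and the parameter values below are illustrative assumptions, not the tuned values from this PR:

```python
import json

# Hypothetical path; fused MoE configs conventionally live under
# vllm/model_executor/layers/fused_moe/configs/ with names like
# "E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8.json".
path = (
    "vllm/model_executor/layers/fused_moe/configs/"
    "E=128,N=768,device_name=NVIDIA_B200,dtype=fp8_w8a8.json"
)

with open(path) as f:
    cfg = json.load(f)

# Top-level keys are batch sizes; each value holds the Triton kernel
# launch parameters that benchmark_moe.py found fastest for that size.
print(cfg["1"])
# e.g. {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 128,
#       "GROUP_SIZE_M": 1, "num_warps": 4, "num_stages": 3}
# (values above are illustrative)
```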
Test Plan
I used

```bash
vllm serve Qwen/Qwen3-30B-A3B-FP8 --served-model-name foo
```

to start the server and

```bash
python3 benchmarks/benchmark_serving.py --backend vllm --dataset-name sharegpt --dataset-path ../ShareGPT_V3_unfiltered_cleaned_split.json --model foo --base-url http://localhost:8000 --endpoint /v1/completions --tokenizer Qwen/Qwen3-30B-A3B-FP8 --num-prompts 2000 --max-concurrency 200 --request-rate 100
```

to run the benchmark.
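Before benchmarking, a quick sanity check against the served endpoint confirms the model responds under its alias. A minimal sketch using the standard `openai` client, with the URL and model name taken from the commands above:

```python
from openai import OpenAI

# The server exposes an OpenAI-compatible API; "foo" is the alias
# set via --served-model-name, and vLLM ignores the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="foo",
    prompt="The capital of France is",
    max_tokens=8,
)
print(resp.choices[0].text)
```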
Test Result

Before: 30.33 req/s
After: 40.53 req/s

That is roughly a 1.34x (~34%) request-throughput improvement.
(Optional) Documentation Update