
Conversation

Collaborator

@yeqcharlotte yeqcharlotte commented Jun 22, 2025

Purpose

I found the existing implementation over-complicated:

  • every module has to include cmd_init();
  • the bench base subcommand has to keep track of BENCHMARK_CMD_MODULES and iterate over it manually:
BENCHMARK_CMD_MODULES = [
    vllm.entrypoints.cli.benchmark.latency,
    vllm.entrypoints.cli.benchmark.serve,
    vllm.entrypoints.cli.benchmark.throughput,
]
  • the bench command has to implement its own subparser_init().

This PR instead iterates directly over the subclasses of BenchmarkSubcommandBase, using their name, help, add_cli_args, and cmd to initialize the new parsers.
If we'd like to add a new subcommand for bench, we only need to

  1. copy-paste ~20 lines, as in vllm/entrypoints/cli/benchmark/throughput.py (see the sketch below);
  2. add the new module to __init__.py.

There's no need to change vllm/entrypoints/cli/benchmark/main.py, and the imports become cleaner.
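
To make the "~20 lines" concrete, here is a minimal sketch of what a new subcommand could look like under this scheme. The base-class interface and the example argument are assumptions based on this description, not code copied from the repository.

import argparse


class BenchmarkSubcommandBase:
    """Stand-in for the real base class; only the interface described above."""

    name: str
    help: str

    @classmethod
    def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
        raise NotImplementedError

    @staticmethod
    def cmd(args: argparse.Namespace) -> None:
        raise NotImplementedError


class BenchmarkThroughputSubcommand(BenchmarkSubcommandBase):
    """Roughly the shape of a new subcommand: a name, a help string, two hooks."""

    name = "throughput"
    help = "Benchmark offline inference throughput."

    @classmethod
    def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
        # The real module delegates to the benchmark's own argument helper;
        # this placeholder flag is only for illustration.
        parser.add_argument("--num-prompts", type=int, default=1000)

    @staticmethod
    def cmd(args: argparse.Namespace) -> None:
        # The real module calls the benchmark's main(args).
        print(f"throughput benchmark with {args.num_prompts} prompts")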

Test Plan

  1. Check help message
vllm bench --help
vllm bench latency --help
vllm bench throughput --help
vllm bench serve --help
  2. Unit tests
pytest -vv tests/benchmarks
  3. Ran a few commands E2E
vllm bench latency --model Qwen/Qwen3-235B-A22B-FP8 --tensor-parallel-size 4 --trust-remote-code --max_num_batched_tokens 40000 --input_len 2000 --output_len 150 --gpu-memory-utilization 0.9

Test Result

  1. Help messages
➜ vllm bench --help

INFO 06-22 00:34:25 [__init__.py:244] Automatically detected platform cuda.
usage: vllm bench <bench_type> [options]

vLLM bench subcommand.

positional arguments:
  {latency,serve,throughput}
    latency             Benchmark the latency of a single batch of requests.
    serve               Benchmark the online serving throughput.
    throughput          Benchmark offline inference throughput.

options:
  -h, --help            show this help message and exit

serve: https://gist.github.com/yeqcharlotte/e82fce865397fa985606fd9ff452a89b
latency: https://gist.github.com/yeqcharlotte/1ab09fb1371eb3bde62e67946c5cabe1
throughput: https://gist.github.com/yeqcharlotte/d3037681ac2b8649b54e77e497f771ba
  2. Unit tests

tests/benchmarks/test_latency_cli.py::test_bench_latency PASSED                                                     [ 33%]
tests/benchmarks/test_serve_cli.py::test_bench_serve PASSED                                                         [ 66%]
tests/benchmarks/test_throughput_cli.py::test_bench_throughput PASSED                                               [100%]
  3. E2E latency benchmark runs: it still functions
Warmup iterations: 100%|███████████████████████████████████████████████████████████████████| 10/10 [00:39<00:00,  3.97s/it]
Profiling iterations: 100%|████████████████████████████████████████████████████████████████| 30/30 [01:58<00:00,  3.96s/it]
Avg latency: 3.9597734543960543 seconds
10% percentile latency: 3.9077892663422973 seconds
25% percentile latency: 3.937694176507648 seconds
50% percentile latency: 3.9521696044830605 seconds
75% percentile latency: 3.9949043814558536 seconds
90% percentile latency: 4.012286452157423 seconds
99% percentile latency: 4.026482834450435 seconds

(Optional) Documentation Update

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
@yeqcharlotte yeqcharlotte requested a review from aarnphm as a code owner June 22, 2025 07:51
@yeqcharlotte yeqcharlotte requested review from mgoin and ywang96 and removed request for aarnphm and mgoin June 22, 2025 07:51

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@yeqcharlotte yeqcharlotte requested a review from mgoin June 22, 2025 07:51
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @yeqcharlotte, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the implementation of the vllm bench CLI subcommands. The primary goal is to simplify the existing command-line interface structure, making it more maintainable and extensible. By transitioning from a manual, module-based registration system to a dynamic, class-based discovery mechanism, the changes reduce boilerplate code and streamline the process of adding new benchmark functionalities in the future.

Highlights

  • Refactor CLI Subcommand Registration: The vllm bench CLI now dynamically discovers its subcommands (latency, serve, throughput) by inspecting subclasses of BenchmarkSubcommandBase. This eliminates the need for a manually maintained list (BENCHMARK_CMD_MODULES) and explicit cmd_init() functions in individual subcommand modules.
  • Simplified Subcommand Definitions: Benchmark subcommand classes (BenchmarkLatencySubcommand, BenchmarkServingSubcommand, BenchmarkThroughputSubcommand) are streamlined. Their name and help attributes are now class-level properties, and add_cli_args is a class method, reducing boilerplate __init__ methods and @property decorators.
  • Centralized Subparser Initialization: The logic for initializing argparse subparsers for benchmark commands has been moved from individual subcommand classes to vllm/entrypoints/cli/benchmark/main.py, making the BenchmarkSubcommandBase cleaner and reducing duplication across subcommand implementations.
  • Improved Extensibility: Adding new benchmark subcommands is now simpler, requiring only the creation of a new class inheriting from BenchmarkSubcommandBase and adding it to vllm/entrypoints/cli/__init__.py, without needing to modify the main vllm/entrypoints/cli/benchmark/main.py file.
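
As an illustration of the centralized registration described above, here is a rough sketch of how main.py could walk the subclasses to build the bench subparsers. It assumes the BenchmarkSubcommandBase stand-in from the earlier sketch is in scope and is not the actual vLLM code.

import argparse


def build_bench_parser() -> argparse.ArgumentParser:
    # Assumes BenchmarkSubcommandBase and its subclasses from the earlier
    # sketch have been imported, e.g. via the package's __init__.py.
    parser = argparse.ArgumentParser(prog="vllm bench",
                                     description="vLLM bench subcommand.")
    subparsers = parser.add_subparsers(required=True, dest="bench_type")

    # Dynamic discovery: every subclass registers itself just by existing.
    for cmd_cls in BenchmarkSubcommandBase.__subclasses__():
        cmd_parser = subparsers.add_parser(cmd_cls.name, help=cmd_cls.help)
        cmd_cls.add_cli_args(cmd_parser)
        cmd_parser.set_defaults(dispatch_function=cmd_cls.cmd)
    return parser


# Example usage with the hypothetical subcommand from the earlier sketch.
args = build_bench_parser().parse_args(["throughput", "--num-prompts", "8"])
args.dispatch_function(args)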

@mergify mergify bot added frontend qwen Related to Qwen models labels Jun 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request simplifies the implementation of the vllm bench CLI subcommand by iterating over subclasses of BenchmarkSubcommandBase and using their attributes to initialize the parser. This approach reduces code duplication and makes it easier to add new subcommands.

Comment on lines +11 to 14
@classmethod
def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
    """Add the CLI arguments to the parser."""
    raise NotImplementedError
Contributor


medium

Consider adding a @staticmethod decorator since this method doesn't use cls or instance attributes.

    @staticmethod
    def add_cli_args(parser: argparse.ArgumentParser) -> None:
        """Add the CLI arguments to the parser."""
        raise NotImplementedError

Comment on lines +15 to 17
@classmethod
def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
    add_cli_args(parser)
Contributor


medium

Consider adding a @staticmethod decorator since this method doesn't use cls or instance attributes.

Suggested change
-    @classmethod
-    def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
-        add_cli_args(parser)
+    @staticmethod
+    def add_cli_args(parser: argparse.ArgumentParser) -> None:
+        add_cli_args(parser)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Member

@ywang96 ywang96 left a comment


LGTM and thanks for the cleanup! Feel free to also address Gemini's reviews but I don't think they're a big deal.

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 22, 2025
@ywang96
Member

ywang96 commented Jun 22, 2025

Hmm interestingly the bot added Qwen label when it shouldn't...

nvm - in your PR description you used Qwen as an example, so it's indeed working as expected. #19949 should make this better.

@aarnphm aarnphm removed the qwen Related to Qwen models label Jun 22, 2025
@aarnphm aarnphm merged commit 2c11a29 into vllm-project:main Jun 22, 2025
77 checks passed
juncheoll pushed a commit to juncheoll/vllm that referenced this pull request Jun 23, 2025
fhl2000 pushed a commit to fhl2000/vllm that referenced this pull request Jun 25, 2025
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025
wwl2755-google pushed a commit to wwl2755-google/vllm that referenced this pull request Jul 1, 2025
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025
…t#19948)

Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025