
Conversation

Collaborator

@yeqcharlotte yeqcharlotte commented Jun 22, 2025

Purpose

I found the existing implementation over-complicated:

  • every module has to include cmd_init();
  • the bench base subcommand has to keep track of BENCHMARK_CMD_MODULES and iterate over it manually:
BENCHMARK_CMD_MODULES = [
    vllm.entrypoints.cli.benchmark.latency,
    vllm.entrypoints.cli.benchmark.serve,
    vllm.entrypoints.cli.benchmark.throughput,
]
  • the bench command has to implement its own subparser_init().

This PR instead iterates directly over the subclasses of BenchmarkSubcommandBase, using their name, help, add_cli_args, and cmd to initialize the new parsers.
If we'd like to add a new subcommand for bench, we only need to

  1. copy-paste ~20 lines, as in vllm/entrypoints/cli/benchmark/throughput.py (see the sketch below);
  2. add the new module to __init__.py.

There's no need to change vllm/entrypoints/cli/benchmark/main.py, and the imports become cleaner.
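
To make the "~20 lines" concrete, here is a minimal sketch of what a new subcommand could look like under this scheme. The base-class interface and the example argument are assumptions based on this description, not code copied from the repository.

import argparse


class BenchmarkSubcommandBase:
    """Stand-in for the real base class; only the interface described above."""

    name: str
    help: str

    @classmethod
    def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
        raise NotImplementedError

    @staticmethod
    def cmd(args: argparse.Namespace) -> None:
        raise NotImplementedError


class BenchmarkThroughputSubcommand(BenchmarkSubcommandBase):
    """Roughly the shape of a new subcommand: a name, a help string, two hooks."""

    name = "throughput"
    help = "Benchmark offline inference throughput."

    @classmethod
    def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
        # The real module delegates to the benchmark's own argument helper;
        # this placeholder flag is only for illustration.
        parser.add_argument("--num-prompts", type=int, default=1000)

    @staticmethod
    def cmd(args: argparse.Namespace) -> None:
        # The real module calls the benchmark's main(args).
        print(f"throughput benchmark with {args.num_prompts} prompts")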

Test Plan

  1. Check help message
vllm bench --help
vllm bench latency --help
vllm bench throughput --help
vllm bench serve --help
  2. Unit tests
pytest -vv tests/benchmarks
  3. Ran a few commands E2E
vllm bench latency --model Qwen/Qwen3-235B-A22B-FP8 --tensor-parallel-size 4 --trust-remote-code --max_num_batched_tokens 40000 --input_len 2000 --output_len 150 --gpu-memory-utilization 0.9

Test Result

  1. Help messages
➜ vllm bench --help

INFO 06-22 00:34:25 [__init__.py:244] Automatically detected platform cuda.
usage: vllm bench <bench_type> [options]

vLLM bench subcommand.

positional arguments:
  {latency,serve,throughput}
    latency             Benchmark the latency of a single batch of requests.
    serve               Benchmark the online serving throughput.
    throughput          Benchmark offline inference throughput.

options:
  -h, --help            show this help message and exit

serve: https://gist.github.com/yeqcharlotte/e82fce865397fa985606fd9ff452a89b
latency: https://gist.github.com/yeqcharlotte/1ab09fb1371eb3bde62e67946c5cabe1
throughput: https://gist.github.com/yeqcharlotte/d3037681ac2b8649b54e77e497f771ba
  2. Unit tests

tests/benchmarks/test_latency_cli.py::test_bench_latency PASSED                                                     [ 33%]
tests/benchmarks/test_serve_cli.py::test_bench_serve PASSED                                                         [ 66%]
tests/benchmarks/test_throughput_cli.py::test_bench_throughput PASSED                                               [100%]
  3. E2E latency benchmark runs: it still functions
Warmup iterations: 100%|███████████████████████████████████████████████████████████████████| 10/10 [00:39<00:00,  3.97s/it]
Profiling iterations: 100%|████████████████████████████████████████████████████████████████| 30/30 [01:58<00:00,  3.96s/it]
Avg latency: 3.9597734543960543 seconds
10% percentile latency: 3.9077892663422973 seconds
25% percentile latency: 3.937694176507648 seconds
50% percentile latency: 3.9521696044830605 seconds
75% percentile latency: 3.9949043814558536 seconds
90% percentile latency: 4.012286452157423 seconds
99% percentile latency: 4.026482834450435 seconds

(Optional) Documentation Update

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
@yeqcharlotte yeqcharlotte requested a review from aarnphm as a code owner June 22, 2025 07:51
@yeqcharlotte yeqcharlotte requested review from mgoin and ywang96 and removed request for aarnphm and mgoin June 22, 2025 07:51

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@yeqcharlotte yeqcharlotte requested a review from mgoin June 22, 2025 07:51
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @yeqcharlotte, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the implementation of the vllm bench CLI subcommands. The primary goal is to simplify the existing command-line interface structure, making it more maintainable and extensible. By transitioning from a manual, module-based registration system to a dynamic, class-based discovery mechanism, the changes reduce boilerplate code and streamline the process of adding new benchmark functionalities in the future.

Highlights

  • Refactor CLI Subcommand Registration: The vllm bench CLI now dynamically discovers its subcommands (latency, serve, throughput) by inspecting subclasses of BenchmarkSubcommandBase. This eliminates the need for a manually maintained list (BENCHMARK_CMD_MODULES) and explicit cmd_init() functions in individual subcommand modules.
  • Simplified Subcommand Definitions: Benchmark subcommand classes (BenchmarkLatencySubcommand, BenchmarkServingSubcommand, BenchmarkThroughputSubcommand) are streamlined. Their name and help attributes are now class-level properties, and add_cli_args is a class method, reducing boilerplate __init__ methods and @property decorators.
  • Centralized Subparser Initialization: The logic for initializing argparse subparsers for benchmark commands has been moved from individual subcommand classes to vllm/entrypoints/cli/benchmark/main.py, making the BenchmarkSubcommandBase cleaner and reducing duplication across subcommand implementations.
  • Improved Extensibility: Adding new benchmark subcommands is now simpler, requiring only the creation of a new class inheriting from BenchmarkSubcommandBase and adding it to vllm/entrypoints/cli/__init__.py, without needing to modify the main vllm/entrypoints/cli/benchmark/main.py file.
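
As an illustration of the centralized registration described above, here is a rough sketch of how main.py could walk the subclasses to build the bench subparsers. It assumes the BenchmarkSubcommandBase stand-in from the earlier sketch is in scope and is not the actual vLLM code.

import argparse


def build_bench_parser() -> argparse.ArgumentParser:
    # Assumes BenchmarkSubcommandBase and its subclasses from the earlier
    # sketch have been imported, e.g. via the package's __init__.py.
    parser = argparse.ArgumentParser(prog="vllm bench",
                                     description="vLLM bench subcommand.")
    subparsers = parser.add_subparsers(required=True, dest="bench_type")

    # Dynamic discovery: every subclass registers itself just by existing.
    for cmd_cls in BenchmarkSubcommandBase.__subclasses__():
        cmd_parser = subparsers.add_parser(cmd_cls.name, help=cmd_cls.help)
        cmd_cls.add_cli_args(cmd_parser)
        cmd_parser.set_defaults(dispatch_function=cmd_cls.cmd)
    return parser


# Example usage with the hypothetical subcommand from the earlier sketch.
args = build_bench_parser().parse_args(["throughput", "--num-prompts", "8"])
args.dispatch_function(args)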

@mergify mergify bot added frontend qwen Related to Qwen models labels Jun 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request simplifies the implementation of the vllm bench CLI subcommand by iterating over subclasses of BenchmarkSubcommandBase and using their attributes to initialize the parser. This approach reduces code duplication and makes it easier to add new subcommands.

Comment on lines +11 to 14
@classmethod
def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
    """Add the CLI arguments to the parser."""
    raise NotImplementedError
Contributor


medium

Consider adding a @staticmethod decorator since this method doesn't use cls or instance attributes.

    @staticmethod
    def add_cli_args(parser: argparse.ArgumentParser) -> None:
        """Add the CLI arguments to the parser."""
        raise NotImplementedError

Comment on lines +15 to 17
@classmethod
def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
    add_cli_args(parser)
Contributor


medium

Consider adding a @staticmethod decorator since this method doesn't use cls or instance attributes.

Suggested change
-    @classmethod
-    def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
-        add_cli_args(parser)
+    @staticmethod
+    def add_cli_args(parser: argparse.ArgumentParser) -> None:
+        add_cli_args(parser)

Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Member

@ywang96 ywang96 left a comment


LGTM and thanks for the cleanup! Feel free to also address Gemini's reviews but I don't think they're a big deal.

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 22, 2025
@ywang96
Member

ywang96 commented Jun 22, 2025

Hmm interestingly the bot added Qwen label when it shouldn't...

nvm - in your PR description you used Qwen as an example, so it's indeed working as expected. #19949 should make this better.

@aarnphm aarnphm removed the qwen Related to Qwen models label Jun 22, 2025
@aarnphm aarnphm merged commit 2c11a29 into vllm-project:main Jun 22, 2025
77 checks passed
juncheoll pushed a commit to juncheoll/vllm that referenced this pull request Jun 23, 2025
fhl2000 pushed a commit to fhl2000/vllm that referenced this pull request Jun 25, 2025
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025
wwl2755-google pushed a commit to wwl2755-google/vllm that referenced this pull request Jul 1, 2025
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025
…t#19948)

Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025