feat: Enable vLLM cudagraphs #498
Conversation
Force-pushed from de91c35 to e84ff82
@jiemingz can you also add a timing plot to the MR description showing the benefits of enabling CUDA graphs vs. not?
@jiemingz Unit test failure here: the eager key is missing.
Force-pushed from 3c08662 to 7bb9f3d
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
Force-pushed from 2610899 to e545a48
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Jimmy Zhang <133159885+jiemingz@users.noreply.github.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Jialei Chen <jialeic@google.com>
Addresses: !186
Generation throughput shows a ~3% speedup for Llama-8B on 4 nodes.
What does this PR do?
Enables CUDA graphs for vLLM generation, yielding a ~3% generation throughput speedup for Llama-8B on 4 nodes.
Issues
List issues that this PR closes:
Usage
# Add a code snippet demonstrating how to use this
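As a placeholder for the usage snippet above, here is a minimal sketch of toggling CUDA graphs for vLLM. It assumes vLLM's standard `enforce_eager` engine argument (when True, vLLM skips CUDA graph capture and runs eagerly); the helper name `build_vllm_kwargs` is hypothetical and not part of this PR.

```python
def build_vllm_kwargs(enable_cudagraphs: bool) -> dict:
    """Translate a cudagraphs toggle into vLLM engine kwargs.

    vLLM's `enforce_eager` flag disables CUDA graph capture when True,
    so enabling cudagraphs means passing enforce_eager=False.
    """
    return {"enforce_eager": not enable_cudagraphs}


# The kwargs would then be forwarded to the engine, e.g.:
#   from vllm import LLM
#   llm = LLM(model="meta-llama/Meta-Llama-3-8B",
#             **build_vllm_kwargs(enable_cudagraphs=True))
print(build_vllm_kwargs(True))   # {'enforce_eager': False}
print(build_vllm_kwargs(False))  # {'enforce_eager': True}
```

Keeping the toggle in one helper makes it easy to plumb a single config flag through to the engine without scattering `enforce_eager` logic across call sites.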
Before your PR is "Ready for review"
Pre-checks:
Additional Information