
Conversation

ahmadki
Member

@ahmadki ahmadki commented Apr 30, 2025

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

[image: convergence plot]
Convergence looks good between Megatron with sequence packing (gray) and Megatron without.

Additional runs can be found in this wandb project.
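For readers unfamiliar with the feature under review: sequence packing concatenates several variable-length examples into one fixed-length row to avoid padding waste, with position ids restarting at each packed-sequence boundary. A minimal sketch of the idea (hypothetical helper, not this PR's API):

```python
def pack_sequences(sequences, pack_size, pad_id=0):
    """Greedily pack variable-length token sequences into rows of pack_size.

    Returns (rows, position_ids): position ids restart at 0 for each packed
    sequence so downstream attention can tell the sequences apart.
    Assumes every individual sequence fits within pack_size.
    """
    rows, pos_rows = [], []
    row, pos = [], []
    for seq in sequences:
        if len(row) + len(seq) > pack_size:
            # Flush the current row, padding it out to pack_size.
            row += [pad_id] * (pack_size - len(row))
            pos += [0] * (pack_size - len(pos))
            rows.append(row)
            pos_rows.append(pos)
            row, pos = [], []
        row += seq
        pos += list(range(len(seq)))  # position ids restart per sequence
    if row:
        row += [pad_id] * (pack_size - len(row))
        pos += [0] * (pack_size - len(pos))
        rows.append(row)
        pos_rows.append(pos)
    return rows, pos_rows

rows, pos = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], pack_size=6)
# rows → [[1, 2, 3, 4, 5, 0], [6, 7, 8, 9, 0, 0]]
# pos  → [[0, 1, 2, 0, 1, 0], [0, 1, 2, 3, 0, 0]]
```

Compared with padding each example to the max length, packing the three examples above needs 12 slots instead of 12-per-row padded batches of 3 rows.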

@terrykong terrykong requested review from ashors1 and jiemingz May 2, 2025 23:37
@terrykong
Contributor

@ashors1 @jiemingz to do an initial review

@ashors1
Contributor

ashors1 commented May 3, 2025

Thanks for the PR! The general approach LGTM, but I do have one question. Have you verified that the attention mask is correct here? When I step through the forward method, it looks like we end up calling scaled dot product attention here, and from what I can see, is_causal is True and causal_mask is None, so I wonder if the position_ids are actually being respected. It's possible I'm missing something here, so please let me know if that is the case!
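To illustrate the concern above (a sketch, not this PR's code): with packed sequences, a plain causal mask (is_causal=True, no explicit mask) lets tokens attend across sequence boundaries. The mask actually required is block-diagonal causal, derived from the packed boundaries:

```python
def block_diagonal_causal_mask(seq_lens):
    """Allowed-attention mask for sequences packed into one row.

    mask[q][k] is True when query position q may attend to key position k:
    k <= q (causal) AND q and k belong to the same packed sequence.
    A plain causal mask would also be True when q and k are in different
    packed sequences, which leaks attention across examples.
    """
    total = sum(seq_lens)
    # seq_id[i] = index of the packed sequence that position i belongs to
    seq_id = []
    for s, n in enumerate(seq_lens):
        seq_id.extend([s] * n)
    return [[k <= q and seq_id[q] == seq_id[k] for k in range(total)]
            for q in range(total)]

mask = block_diagonal_causal_mask([3, 2])
# Position 3 (first token of the second sequence) must not attend to
# positions 0-2 of the first sequence, even though they precede it.
```

Position ids alone do not enforce this separation; the attention kernel has to receive either this mask or per-sequence length metadata (e.g. cu_seqlens in varlen attention kernels).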

SahilJain314 and others added 26 commits May 11, 2025 18:39
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
…{Dict, List, Tuple} to primitive dict, list tuple

Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>

wip

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix it

Signed-off-by: Terry Kong <terryk@nvidia.com>

patching fix

Signed-off-by: Terry Kong <terryk@nvidia.com>

wip

Signed-off-by: Terry Kong <terryk@nvidia.com>

doesn't look like i needed that

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix

Signed-off-by: Terry Kong <terryk@nvidia.com>

revert stuff

Signed-off-by: Terry Kong <terryk@nvidia.com>

make it better

Signed-off-by: Terry Kong <terryk@nvidia.com>

go

Signed-off-by: Terry Kong <terryk@nvidia.com>

cleanup

Signed-off-by: Terry Kong <terryk@nvidia.com>

mix it up

Signed-off-by: Terry Kong <terryk@nvidia.com>

touch up

Signed-off-by: Terry Kong <terryk@nvidia.com>

clean

Signed-off-by: Terry Kong <terryk@nvidia.com>

better

Signed-off-by: Terry Kong <terryk@nvidia.com>

clean up

Signed-off-by: Terry Kong <terryk@nvidia.com>

add it in

Signed-off-by: Terry Kong <terryk@nvidia.com>

mcore extra

Signed-off-by: Terry Kong <terryk@nvidia.com>

instructions

Signed-off-by: Terry Kong <terryk@nvidia.com>

works

Signed-off-by: Terry Kong <terryk@nvidia.com>

revert to 3.10, 3.12 didn't seem necessary

Signed-off-by: Terry Kong <terryk@nvidia.com>

ci has to recursively clone

Signed-off-by: Terry Kong <terryk@nvidia.com>

bump build workflow

Signed-off-by: Terry Kong <terryk@nvidia.com>

add megatron.core import

Signed-off-by: Terry Kong <terryk@nvidia.com>

potential fix for unit test on CI

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix the test

Signed-off-by: Terry Kong <terryk@nvidia.com>

this should fix test (it was a collision of namespace)

Signed-off-by: Terry Kong <terryk@nvidia.com>

remove fp8 from test

Signed-off-by: Terry Kong <terryk@nvidia.com>

add shallow

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix base build

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix instructions

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix the messed up indenting

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix

Signed-off-by: Terry Kong <terryk@nvidia.com>

try nesting

Signed-off-by: Terry Kong <terryk@nvidia.com>

okay, got it to work

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix up the readme

Signed-off-by: Terry Kong <terryk@nvidia.com>

ok

Signed-off-by: Terry Kong <terryk@nvidia.com>

touchup

Signed-off-by: Terry Kong <terryk@nvidia.com>

add the lock file back

Signed-off-by: Terry Kong <terryk@nvidia.com>

got

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
@ahmadki ahmadki force-pushed the ahmadki/sequence_packing branch from 88cefb4 to e7e4038 Compare June 29, 2025 18:41
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
@ahmadki
Member Author

ahmadki commented Jun 29, 2025

@SahilJain314

  1. I fixed the merge conflicts, especially with uv.lock
  2. Fixed commit signing
  3. Removed PackedDataset from the PR

Convergence graphs and logs are pending some runs.

@SahilJain314 SahilJain314 changed the title Added sequence packing feat: Added sequence packing Jun 30, 2025
@ahmadki
Member Author

ahmadki commented Jul 1, 2025

Here you can check convergence graphs and run logs comparing the main branch, f615a9c (seq packing enabled), and f615a9c (seq packing disabled) for the following configurations:

SFT:

  • 1B and 8B
  • dtensor and megatron

GRPO:

  • 1B and 8B
  • dtensor and megatron

@ahmadki ahmadki force-pushed the ahmadki/sequence_packing branch from aeb36e9 to f2aa89b Compare July 1, 2025 22:56
@eagle705

eagle705 commented Jul 2, 2025

@ahmadki
would it be possible to check the DCO as well?

ahmadki and others added 5 commits July 2, 2025 05:40
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
@ahmadki ahmadki force-pushed the ahmadki/sequence_packing branch from f2aa89b to e078be1 Compare July 2, 2025 02:40
@ahmadki
Member Author

ahmadki commented Jul 2, 2025

@ahmadki would it be possible to check the DCO as well?

resolved

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
@ahmadki ahmadki force-pushed the ahmadki/sequence_packing branch from 363c0bc to 0d2f2c0 Compare July 7, 2025 18:36
pad_packed_seq_to=pad_full_seq_to,
)
)
input_ids = input_ids
Member Author


@SahilJain314 Do you mind taking another look at this code block?

input_ids = input_ids doesn't make sense.

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
ahmadki added 3 commits July 14, 2025 23:39
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
@terrykong
Contributor

Closing since a newer PR, #704, makes this one obsolete.

@terrykong terrykong closed this Jul 23, 2025
7 participants