
Conversation

ahmadki
Member

@ahmadki ahmadki commented Apr 30, 2025

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests.
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

[image: convergence plot]
Convergence looks good between Megatron with sequence packing (gray) and Megatron without.

Additional runs can be found in this wandb project.
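For readers unfamiliar with the feature under review: sequence packing concatenates several variable-length examples into one fixed-length row to avoid padding waste, with position ids restarting at each packed-sequence boundary. A minimal sketch of the idea (hypothetical helper, not this PR's API):

```python
def pack_sequences(sequences, pack_size, pad_id=0):
    """Greedily pack variable-length token sequences into rows of pack_size.

    Returns (rows, position_ids): position ids restart at 0 for each packed
    sequence so downstream attention can tell the sequences apart.
    Assumes every individual sequence fits within pack_size.
    """
    rows, pos_rows = [], []
    row, pos = [], []
    for seq in sequences:
        if len(row) + len(seq) > pack_size:
            # Flush the current row, padding it out to pack_size.
            row += [pad_id] * (pack_size - len(row))
            pos += [0] * (pack_size - len(pos))
            rows.append(row)
            pos_rows.append(pos)
            row, pos = [], []
        row += seq
        pos += list(range(len(seq)))  # position ids restart per sequence
    if row:
        row += [pad_id] * (pack_size - len(row))
        pos += [0] * (pack_size - len(pos))
        rows.append(row)
        pos_rows.append(pos)
    return rows, pos_rows

rows, pos = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], pack_size=6)
# rows → [[1, 2, 3, 4, 5, 0], [6, 7, 8, 9, 0, 0]]
# pos  → [[0, 1, 2, 0, 1, 0], [0, 1, 2, 3, 0, 0]]
```

Compared with padding each example to the max length, packing the three examples above needs 12 slots instead of 12-per-row padded batches of 3 rows.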

@terrykong terrykong requested review from ashors1 and jiemingz May 2, 2025 23:37
@terrykong
Contributor

@ashors1 @jiemingz to do an initial review

@ashors1
Contributor

ashors1 commented May 3, 2025

Thanks for the PR! The general approach LGTM, but I do have one question. Have you verified that the attention mask is correct here? When I step through the forward method, it looks like we end up calling scaled dot product attention here, and from what I can see, is_causal is True and causal_mask is None, so I wonder if the position_ids are actually being respected. It's possible I'm missing something here, so please let me know if that is the case!
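To illustrate the concern above (a sketch, not this PR's code): with packed sequences, a plain causal mask (is_causal=True, no explicit mask) lets tokens attend across sequence boundaries. The mask actually required is block-diagonal causal, derived from the packed boundaries:

```python
def block_diagonal_causal_mask(seq_lens):
    """Allowed-attention mask for sequences packed into one row.

    mask[q][k] is True when query position q may attend to key position k:
    k <= q (causal) AND q and k belong to the same packed sequence.
    A plain causal mask would also be True when q and k are in different
    packed sequences, which leaks attention across examples.
    """
    total = sum(seq_lens)
    # seq_id[i] = index of the packed sequence that position i belongs to
    seq_id = []
    for s, n in enumerate(seq_lens):
        seq_id.extend([s] * n)
    return [[k <= q and seq_id[q] == seq_id[k] for k in range(total)]
            for q in range(total)]

mask = block_diagonal_causal_mask([3, 2])
# Position 3 (first token of the second sequence) must not attend to
# positions 0-2 of the first sequence, even though they precede it.
```

Position ids alone do not enforce this separation; the attention kernel has to receive either this mask or per-sequence length metadata (e.g. cu_seqlens in varlen attention kernels).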

SahilJain314 and others added 26 commits May 11, 2025 18:39
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
…{Dict, List, Tuple} to primitive dict, list tuple

Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>

wip

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix it

Signed-off-by: Terry Kong <terryk@nvidia.com>

patching fix

Signed-off-by: Terry Kong <terryk@nvidia.com>

wip

Signed-off-by: Terry Kong <terryk@nvidia.com>

doesn't look like i needed that

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix

Signed-off-by: Terry Kong <terryk@nvidia.com>

revert stuff

Signed-off-by: Terry Kong <terryk@nvidia.com>

make it better

Signed-off-by: Terry Kong <terryk@nvidia.com>

go

Signed-off-by: Terry Kong <terryk@nvidia.com>

cleanup

Signed-off-by: Terry Kong <terryk@nvidia.com>

mix it up

Signed-off-by: Terry Kong <terryk@nvidia.com>

touch up

Signed-off-by: Terry Kong <terryk@nvidia.com>

clean

Signed-off-by: Terry Kong <terryk@nvidia.com>

better

Signed-off-by: Terry Kong <terryk@nvidia.com>

clean up

Signed-off-by: Terry Kong <terryk@nvidia.com>

add it in

Signed-off-by: Terry Kong <terryk@nvidia.com>

mcore extra

Signed-off-by: Terry Kong <terryk@nvidia.com>

instructions

Signed-off-by: Terry Kong <terryk@nvidia.com>

works

Signed-off-by: Terry Kong <terryk@nvidia.com>

revert to 3.10, 3.12 didn't seem necessary

Signed-off-by: Terry Kong <terryk@nvidia.com>

ci has to recursively clone

Signed-off-by: Terry Kong <terryk@nvidia.com>

bump build workflow

Signed-off-by: Terry Kong <terryk@nvidia.com>

add megatron.core import

Signed-off-by: Terry Kong <terryk@nvidia.com>

potential fix for unit test on CI

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix the test

Signed-off-by: Terry Kong <terryk@nvidia.com>

this should fix test (it was a collision of namespace)

Signed-off-by: Terry Kong <terryk@nvidia.com>

remove fp8 from test

Signed-off-by: Terry Kong <terryk@nvidia.com>

add shallow

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix base build

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix instructions

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix the messed up indenting

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix

Signed-off-by: Terry Kong <terryk@nvidia.com>

try nesting

Signed-off-by: Terry Kong <terryk@nvidia.com>

okay, got it to work

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix up the readme

Signed-off-by: Terry Kong <terryk@nvidia.com>

ok

Signed-off-by: Terry Kong <terryk@nvidia.com>

touchup

Signed-off-by: Terry Kong <terryk@nvidia.com>

add the lock file back

Signed-off-by: Terry Kong <terryk@nvidia.com>

got

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
@ahmadki ahmadki force-pushed the ahmadki/sequence_packing branch from 88cefb4 to e7e4038 Compare June 29, 2025 18:41
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
@ahmadki
Member Author

ahmadki commented Jun 29, 2025

@SahilJain314

  1. I fixed the merge conflicts, especially with uv.lock
  2. Fixed commit signing
  3. Removed PackedDataset from the PR

Convergence graphs and logs are pending some runs.

@SahilJain314 SahilJain314 changed the title Added sequence packing feat: Added sequence packing Jun 30, 2025
@ahmadki
Member Author

ahmadki commented Jul 1, 2025

Here you can check convergence graphs and run logs comparing the main branch, f615a9c (seq packing enabled), and f615a9c (seq packing disabled) for the following configurations:

SFT:

  • 1B and 8B
  • dtensor and megatron

GRPO:

  • 1B and 8B
  • dtensor and megatron

@ahmadki ahmadki force-pushed the ahmadki/sequence_packing branch from aeb36e9 to f2aa89b Compare July 1, 2025 22:56
@eagle705

eagle705 commented Jul 2, 2025

@ahmadki
would it be possible to check the DCO as well?

ahmadki and others added 5 commits July 2, 2025 05:40
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
@ahmadki ahmadki force-pushed the ahmadki/sequence_packing branch from f2aa89b to e078be1 Compare July 2, 2025 02:40
@ahmadki
Member Author

ahmadki commented Jul 2, 2025

@ahmadki would it be possible to check the DCO as well?

resolved

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
@ahmadki ahmadki force-pushed the ahmadki/sequence_packing branch from 363c0bc to 0d2f2c0 Compare July 7, 2025 18:36
pad_packed_seq_to=pad_full_seq_to,
)
)
input_ids = input_ids
Member Author


@SahilJain314 Do you mind taking another look at this code block?

input_ids = input_ids doesn't make sense.

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
ahmadki added 3 commits July 14, 2025 23:39
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
@terrykong
Contributor

Closing since a newer PR, #704, makes this one obsolete.

@terrykong terrykong closed this Jul 23, 2025
7 participants