-
Notifications
You must be signed in to change notification settings - Fork 120
feat: Added sequence packing #300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the PR! The general approach LGTM, but I do have one question. Have you verified that the attention mask is correct here? When I step through the forward method, it looks like we end up calling scaled dot product attention here, and from what I can see, |
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
…{Dict, List, Tuple} to primitive dict, list tuple Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Sahil Jain <sahilj@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com> wip Signed-off-by: Terry Kong <terryk@nvidia.com> fix it Signed-off-by: Terry Kong <terryk@nvidia.com> patthing fix Signed-off-by: Terry Kong <terryk@nvidia.com> wip Signed-off-by: Terry Kong <terryk@nvidia.com> doesn't look like i needed that Signed-off-by: Terry Kong <terryk@nvidia.com> fix Signed-off-by: Terry Kong <terryk@nvidia.com> revert stuff Signed-off-by: Terry Kong <terryk@nvidia.com> make it better Signed-off-by: Terry Kong <terryk@nvidia.com> go Signed-off-by: Terry Kong <terryk@nvidia.com> cleanup Signed-off-by: Terry Kong <terryk@nvidia.com> mix it up Signed-off-by: Terry Kong <terryk@nvidia.com> touch up Signed-off-by: Terry Kong <terryk@nvidia.com> clean Signed-off-by: Terry Kong <terryk@nvidia.com> better Signed-off-by: Terry Kong <terryk@nvidia.com> clean up Signed-off-by: Terry Kong <terryk@nvidia.com> add it in Signed-off-by: Terry Kong <terryk@nvidia.com> mcore extra Signed-off-by: Terry Kong <terryk@nvidia.com> instructions Signed-off-by: Terry Kong <terryk@nvidia.com> works Signed-off-by: Terry Kong <terryk@nvidia.com> revert to 3.10, 3.12 didn't seem necessary Signed-off-by: Terry Kong <terryk@nvidia.com> ci has to recursively clone Signed-off-by: Terry Kong <terryk@nvidia.com> bump build workflow Signed-off-by: Terry Kong <terryk@nvidia.com> add megatron.core import Signed-off-by: Terry Kong <terryk@nvidia.com> potential fix for unit test on CI Signed-off-by: Terry Kong <terryk@nvidia.com> fix the test Signed-off-by: Terry Kong <terryk@nvidia.com> this should fix test (it was a collision of namespace) Signed-off-by: Terry Kong <terryk@nvidia.com> remove fp8 from test Signed-off-by: Terry Kong <terryk@nvidia.com> add shallow Signed-off-by: Terry Kong <terryk@nvidia.com> fix base build Signed-off-by: Terry Kong <terryk@nvidia.com> fix instructions Signed-off-by: Terry Kong <terryk@nvidia.com> fix the messed up indenting Signed-off-by: Terry Kong <terryk@nvidia.com> fix Signed-off-by: Terry Kong <terryk@nvidia.com> try nesting Signed-off-by: Terry Kong <terryk@nvidia.com> okay, got it to work Signed-off-by: Terry Kong <terryk@nvidia.com> fix up the readme Signed-off-by: Terry Kong <terryk@nvidia.com> ok Signed-off-by: Terry Kong <terryk@nvidia.com> touchup Signed-off-by: Terry Kong <terryk@nvidia.com> add the lock file back Signed-off-by: Terry Kong <terryk@nvidia.com> got Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
88cefb4
to
e7e4038
Compare
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Convergence graphs and logs are pending some runs. |
aeb36e9
to
f2aa89b
Compare
@ahmadki |
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
f2aa89b
to
e078be1
Compare
resolved |
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
363c0bc
to
0d2f2c0
Compare
pad_packed_seq_to=pad_full_seq_to, | ||
) | ||
) | ||
input_ids = input_ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SahilJain314 Do you mind taking another look at this code block.
input_ids = input_ids
doesn't make sense
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
929c9bb
to
344275c
Compare
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
closing since newer PR makes this one obsolete #704 |
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information
convergence looks good between megatron+sequence packing (gray) and megatron without
Additional runs can be found in this wandb project.