Conversation

Contributor

@yfw yfw commented Jul 1, 2025


yfw and others added 15 commits June 25, 2025 12:44
Revert of
e01017a

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Run get_weights_for_ipc and get key map once for all

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Tag version that seems to be working

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Fix: use new weight param info every step

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Cleanup

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

bugfix

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Add more time logs

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyue/wip

fix aggregated all gather objects

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Delete self._held_gather_buffer

Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@yfw yfw mentioned this pull request Jul 1, 2025
4 tasks
@yfw yfw marked this pull request as ready for review July 1, 2025 23:52
yfw added 4 commits July 1, 2025 17:19
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@yfw yfw added the CI:L0 Run doctests and unit tests label Jul 2, 2025
yfw added 2 commits July 1, 2025 17:36
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@yfw yfw removed the CI:L0 Run doctests and unit tests label Jul 2, 2025
yfw and others added 3 commits July 3, 2025 12:37
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
ashors1 previously approved these changes Jul 3, 2025
@parthchadha parthchadha enabled auto-merge July 3, 2025 23:09
@parthchadha parthchadha added this pull request to the merge queue Jul 3, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 4, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
@yfw yfw added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Jul 7, 2025
terrykong previously approved these changes Jul 8, 2025
@terrykong terrykong added this pull request to the merge queue Jul 8, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 8, 2025
@terrykong terrykong added this pull request to the merge queue Jul 9, 2025
Merged via the queue into main with commit a08829b Jul 9, 2025
13 of 14 checks passed
@terrykong terrykong deleted the yifu/megatron_ep branch July 9, 2025 06:47
RayenTian pushed a commit that referenced this pull request Jul 10, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
RayenTian pushed a commit that referenced this pull request Jul 10, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
jialei777 pushed a commit to jialei777/nemo-rl that referenced this pull request Jul 23, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Jialei Chen <jialeic@google.com>
KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Labels
CI:L0 Run doctests and unit tests
6 participants