Skip to content

Conversation

akoumpa
Copy link
Member

@akoumpa akoumpa commented May 1, 2025

Important

The Update branch button must only be pressed in very rare occassions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Avoids race-condition among ranks.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

akoumpa added 2 commits May 1, 2025 03:20
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from 174b305 to dbe763a Compare May 1, 2025 10:22
akoumpa added 2 commits May 1, 2025 03:22
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from df19c61 to 178e025 Compare May 1, 2025 15:16
@akoumpa akoumpa changed the title add FirstRankPerNode [automodel] add FirstRankPerNode May 1, 2025
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from e3c206d to 2f6ac01 Compare May 1, 2025 15:19
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from 700fd9c to 31f1bb6 Compare May 1, 2025 15:39
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from 3caa56f to f2e1c05 Compare May 1, 2025 16:09
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from d3e25df to 93d5ac0 Compare May 1, 2025 16:12
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from 875bed6 to 41fe2a6 Compare May 1, 2025 16:16
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from d066b6b to ab4f889 Compare May 1, 2025 16:18
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from 556c1c9 to 7b66e44 Compare May 1, 2025 16:25
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
@akoumpa akoumpa force-pushed the akoumparouli/automodel_prioritize_rank0 branch from 1c91506 to 9865ded Compare May 1, 2025 16:35
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Copy link
Contributor

github-actions bot commented May 3, 2025

[🤖]: Hi @akoumpa 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

Due to a major CI change, merges are currently handled by the automation team.
We will reach out to you quickly to merge this PR, but you can always reach us with the following handles:

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

@akoumpa akoumpa marked this pull request as ready for review May 4, 2025 18:46
@akoumpa akoumpa requested a review from abhinavg4 May 4, 2025 18:47
Copy link
Collaborator

@abhinavg4 abhinavg4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks

@chtruong814 chtruong814 merged commit 1f511fd into main May 4, 2025
252 checks passed
@chtruong814 chtruong814 deleted the akoumparouli/automodel_prioritize_rank0 branch May 4, 2025 20:07
@chtruong814 chtruong814 added the r2.3.0 Pick this label for auto-cherrypicking into v2.3.0 label May 13, 2025
ko3n1g pushed a commit that referenced this pull request May 13, 2025
* add FirstRankPerNode

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cleanup

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cleanup

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cleanup

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* simplify

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* simployf

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix_import

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove unused option

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
chtruong814 pushed a commit that referenced this pull request May 15, 2025
…13559)

* [automodel] add FirstRankPerNode (#13373)

* add FirstRankPerNode

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cleanup

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cleanup

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* cleanup

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* simplify

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* simployf

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix_import

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove unused option

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Update dist_utils.py

dist.abort -> dist.destroy_process_group

Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
r2.3.0 Pick this label for auto-cherrypicking into v2.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants