joyang-nv
Member

@joyang-nv joyang-nv commented Jul 16, 2025

What does this PR do?

Enable CP for DTensor during get_logprobs, so that the DTensor path aligns with the mcore path.
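
For readers new to the function touched here, the sketch below illustrates the general idea of computing a log-probability from logits that are sharded along the vocab axis. It is not the code in this PR: it runs in a single process, simulates the all-reduce steps with plain Python reductions, and does not model the sequence sharding that CP adds. The names `sharded_logprob` and `full_logprob` are hypothetical and exist only for this illustration.

```python
import math


def sharded_logprob(logit_shards, target):
    """Log-probability of `target` from logits split along the vocab axis.

    logit_shards: list of lists; shard i holds a contiguous vocab slice,
    standing in for what one tensor-parallel rank would hold.
    """
    # Step 1 (stands in for an all-reduce MAX): global max for stability.
    global_max = max(max(shard) for shard in logit_shards)

    # Step 2 (stands in for an all-reduce SUM): each shard sums
    # exp(logit - max) over its own vocab slice.
    sum_exp = sum(
        sum(math.exp(x - global_max) for x in shard) for shard in logit_shards
    )

    # Step 3 (stands in for an all-reduce SUM): only the shard that owns
    # the target id contributes its logit; every other shard contributes 0.
    target_logit = 0.0
    offset = 0
    for shard in logit_shards:
        if offset <= target < offset + len(shard):
            target_logit += shard[target - offset]
        offset += len(shard)

    return target_logit - global_max - math.log(sum_exp)


def full_logprob(logits, target):
    """Reference log-softmax over the unsharded logits."""
    m = max(logits)
    return logits[target] - m - math.log(sum(math.exp(x - m) for x in logits))


logits = [2.0, -1.0, 0.5, 3.0, 0.0, 1.5]
shards = [logits[:3], logits[3:]]  # two simulated vocab shards
```

Calling `sharded_logprob(shards, t)` for any target id `t` should agree with `full_logprob(logits, t)` up to floating-point error, which is the property a sharded implementation has to preserve.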

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

[Two images attached]

@joyang-nv joyang-nv added the CI:L1 Run doctests, unit tests, and functional tests label Jul 16, 2025
@joyang-nv joyang-nv removed the CI:L1 Run doctests, unit tests, and functional tests label Jul 16, 2025
@joyang-nv joyang-nv force-pushed the joyang/get_logprobs_with_cp branch from 96480bc to 03d8333 Compare July 16, 2025 10:14
@joyang-nv joyang-nv marked this pull request as ready for review July 16, 2025 10:14
@joyang-nv joyang-nv added the CI:L1 Run doctests, unit tests, and functional tests label Jul 16, 2025
@joyang-nv joyang-nv force-pushed the joyang/get_logprobs_with_cp branch from 03d8333 to 6835949 Compare July 17, 2025 15:38
Signed-off-by: Jonas yang <joyang@nvidia.com>
@joyang-nv joyang-nv force-pushed the joyang/get_logprobs_with_cp branch from 6835949 to 7187c2d Compare July 17, 2025 15:51
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Jul 17, 2025
@SahilJain314 SahilJain314 changed the base branch from main to sahilj/cp-rebase July 17, 2025 23:05
@@ -132,7 +132,7 @@ def dtensor_from_parallel_logits_to_logprobs(
"""Get log probabilities from TP+CP sharded vocab logits.

Args:
- vocab_parallel_logits (DTensor): Logits distributed across tensor parallel workers,
+ vocab_parallel_logits (orch.Tensor): Logits distributed across tensor parallel workers,
Contributor

Suggested change
- vocab_parallel_logits (orch.Tensor): Logits distributed across tensor parallel workers,
+ vocab_parallel_logits (torch.Tensor): Logits distributed across tensor parallel workers,

@SahilJain314 SahilJain314 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jul 18, 2025
@SahilJain314 SahilJain314 merged commit f508b13 into sahilj/cp-rebase Jul 18, 2025
15 of 16 checks passed
@SahilJain314 SahilJain314 deleted the joyang/get_logprobs_with_cp branch July 18, 2025 21:10
SahilJain314 pushed a commit that referenced this pull request Jul 21, 2025
Signed-off-by: Jonas yang <joyang@nvidia.com>
SahilJain314 pushed a commit that referenced this pull request Jul 21, 2025
Signed-off-by: Jonas yang <joyang@nvidia.com>
Labels
CI:L1 Run doctests, unit tests, and functional tests documentation Improvements or additions to documentation