Skip to content

Conversation

jubick1337
Copy link
Contributor

What does this PR do ?

This PR fixes a training instability bug by conditionally applying temperature scaling to logits based on the active vLLM engine version.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • This fix addresses an issue where logits were being scaled by temperature twice when using vLLM's V1 engine. The V1 engine returns raw, unscaled logits, while the V0 engine returned logits that were already scaled.

  • The fix checks the VLLM_USE_V1 environment variable to determine if the V1 engine is active and only applies manual temperature division if it is not, ensuring the scaling is performed exactly once.

@jubick1337 jubick1337 force-pushed the mnovikov/vllm_temp_fix branch from 407dd3a to ff8eff7 Compare July 12, 2025 20:01
parthchadha
parthchadha previously approved these changes Jul 14, 2025
Copy link
Contributor

@parthchadha parthchadha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix! One of our unit tests should have failed when v0=>v1 switch happened but I guess our unit tests were not robust enough.

@terrykong terrykong added the r0.3.0 Release r0.3.0 label Jul 14, 2025
@parthchadha parthchadha added this pull request to the merge queue Jul 15, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 15, 2025
@parthchadha parthchadha added this pull request to the merge queue Jul 15, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 15, 2025
@parthchadha
Copy link
Contributor

@jubick1337 looks like lint check are failing, can you please run pre-commit as suggested at https://github.com/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md.

Signed-off-by: jubick1337 <mattyson.so@gmail.com>
@jubick1337
Copy link
Contributor Author

Hi @parthchadha,

Thanks for your help. I fixed the linting.

Now mypy check is failing on files I did not change. What should I do next?

@terrykong
Copy link
Contributor

Hi @jubick1337 . Thanks for helping make NeMo RL better!

The mypy job is expected to fail since we haven't 100% fixed our static typing issues, so you are a.o.k.

I've enqueued your PR

@terrykong terrykong enabled auto-merge July 16, 2025 04:38
@terrykong terrykong added this pull request to the merge queue Jul 16, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 16, 2025
@parthchadha parthchadha added this pull request to the merge queue Jul 16, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 16, 2025
@terrykong terrykong added this pull request to the merge queue Jul 16, 2025
Merged via the queue into NVIDIA-NeMo:main with commit 08b4a60 Jul 16, 2025
13 of 14 checks passed
@jubick1337 jubick1337 deleted the mnovikov/vllm_temp_fix branch July 16, 2025 12:39
ZhiyuLi-Nvidia pushed a commit that referenced this pull request Jul 21, 2025
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
jialei777 pushed a commit to jialei777/nemo-rl that referenced this pull request Jul 23, 2025
…-NeMo#660)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>
Signed-off-by: Jialei Chen <jialeic@google.com>
KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jul 30, 2025
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025
soodoshll pushed a commit to soodoshll/RL that referenced this pull request Aug 13, 2025
…-NeMo#660)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>
Signed-off-by: Qidong Su <qidongs@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
r0.3.0 Release r0.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants