fix: adjust temperature scaling logic based on engine version #660

jubick1337 · 2025-07-12T20:01:01Z

What does this PR do ?

This PR fixes a training instability bug by conditionally applying temperature scaling to logits based on the active vLLM engine version.

Issues

List issues that this PR closes (syntax):

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

This fix addresses an issue where logits were being scaled by temperature twice when using vLLM's V1 engine. The V1 engine returns raw, unscaled logits, while the V0 engine returned logits that were already scaled.
The fix checks the VLLM_USE_V1 environment variable to determine if the V1 engine is active and only applies manual temperature division if it is not, ensuring the scaling is performed exactly once.

parthchadha

LGTM, thanks for the fix! One of our unit tests should have failed when v0=>v1 switch happened but I guess our unit tests were not robust enough.

parthchadha · 2025-07-15T20:37:44Z

@jubick1337 looks like lint check are failing, can you please run pre-commit as suggested at https://github.com/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md.

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

jubick1337 · 2025-07-16T02:08:40Z

Hi @parthchadha,

Thanks for your help. I fixed the linting.

Now mypy check is failing on files I did not change. What should I do next?

terrykong · 2025-07-16T04:38:13Z

Hi @jubick1337 . Thanks for helping make NeMo RL better!

The mypy job is expected to fail since we haven't 100% fixed our static typing issues, so you are a.o.k.

I've enqueued your PR

Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Jialei Chen <jialeic@google.com>

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com>

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Qidong Su <qidongs@nvidia.com>

jubick1337 force-pushed the mnovikov/vllm_temp_fix branch from 407dd3a to ff8eff7 Compare July 12, 2025 20:01

parthchadha previously approved these changes Jul 14, 2025

View reviewed changes

terrykong added the r0.3.0 Release r0.3.0 label Jul 14, 2025

parthchadha added this pull request to the merge queue Jul 15, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 15, 2025

parthchadha added this pull request to the merge queue Jul 15, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 15, 2025

fix: adjust temperature scaling logic based on engine version

c4a0414

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

jubick1337 dismissed parthchadha’s stale review via c4a0414 July 16, 2025 01:58

jubick1337 force-pushed the mnovikov/vllm_temp_fix branch from 96b0b7d to c4a0414 Compare July 16, 2025 01:58

Merge branch 'main' into mnovikov/vllm_temp_fix

08c4d24

terrykong enabled auto-merge July 16, 2025 04:38

terrykong approved these changes Jul 16, 2025

View reviewed changes

terrykong added this pull request to the merge queue Jul 16, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 16, 2025

parthchadha added this pull request to the merge queue Jul 16, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 16, 2025

terrykong added this pull request to the merge queue Jul 16, 2025

Merged via the queue into NVIDIA-NeMo:main with commit 08b4a60 Jul 16, 2025
13 of 14 checks passed

jubick1337 deleted the mnovikov/vllm_temp_fix branch July 16, 2025 12:39

ZhiyuLi-Nvidia pushed a commit that referenced this pull request Jul 21, 2025

fix: adjust temperature scaling logic based on engine version (#660)

7941826

Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>

jialei777 pushed a commit to jialei777/nemo-rl that referenced this pull request Jul 23, 2025

fix: adjust temperature scaling logic based on engine version (NVIDIA…

d0424cf

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Jialei Chen <jialeic@google.com>

KiddoZhu pushed a commit that referenced this pull request Jul 28, 2025

fix: adjust temperature scaling logic based on engine version (#660)

b18e945

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

xxman-google pushed a commit to xxman-google/NeMo-RL that referenced this pull request Jul 30, 2025

fix: adjust temperature scaling logic based on engine version (NVIDIA…

a821d67

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com>

FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025

fix: adjust temperature scaling logic based on engine version (NVIDIA…

47a3d11

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com>

FannYYW pushed a commit to xxman-google/NeMo-RL that referenced this pull request Aug 5, 2025

fix: adjust temperature scaling logic based on engine version (NVIDIA…

9cd2354

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com>

soodoshll pushed a commit to soodoshll/RL that referenced this pull request Aug 13, 2025

fix: adjust temperature scaling logic based on engine version (NVIDIA…

0a8b200

…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Qidong Su <qidongs@nvidia.com>

zhandaz mentioned this pull request Aug 18, 2025

fix: fix temperature-related issues #935

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: adjust temperature scaling logic based on engine version #660

fix: adjust temperature scaling logic based on engine version #660

Uh oh!

jubick1337 commented Jul 12, 2025

Uh oh!

parthchadha left a comment

Uh oh!

Uh oh!

Uh oh!

parthchadha commented Jul 15, 2025

Uh oh!

jubick1337 commented Jul 16, 2025

Uh oh!

terrykong commented Jul 16, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fix: adjust temperature scaling logic based on engine version #660

fix: adjust temperature scaling logic based on engine version #660

Uh oh!

Conversation

jubick1337 commented Jul 12, 2025

What does this PR do ?

Issues

Usage

Before your PR is "Ready for review"

Additional Information

Uh oh!

parthchadha left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

parthchadha commented Jul 15, 2025

Uh oh!

jubick1337 commented Jul 16, 2025

Uh oh!

terrykong commented Jul 16, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!