-
Notifications
You must be signed in to change notification settings - Fork 117
fix: adjust temperature scaling logic based on engine version #660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: adjust temperature scaling logic based on engine version #660
Conversation
407dd3a
to
ff8eff7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the fix! One of our unit tests should have failed when v0=>v1 switch happened but I guess our unit tests were not robust enough.
@jubick1337 looks like lint check are failing, can you please run pre-commit as suggested at https://github.com/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md. |
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
96b0b7d
to
c4a0414
Compare
Hi @parthchadha, Thanks for your help. I fixed the linting. Now mypy check is failing on files I did not change. What should I do next? |
Hi @jubick1337 . Thanks for helping make NeMo RL better! The mypy job is expected to fail since we haven't 100% fixed our static typing issues, so you are a.o.k. I've enqueued your PR |
Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Jialei Chen <jialeic@google.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com>
…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com>
…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com>
…-NeMo#660) Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Qidong Su <qidongs@nvidia.com>
What does this PR do ?
This PR fixes a training instability bug by conditionally applying temperature scaling to logits based on the active vLLM engine version.
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information
This fix addresses an issue where logits were being scaled by temperature twice when using vLLM's V1 engine. The V1 engine returns raw, unscaled logits, while the V0 engine returned logits that were already scaled.
The fix checks the VLLM_USE_V1 environment variable to determine if the V1 engine is active and only applies manual temperature division if it is not, ensuring the scaling is performed exactly once.