chore: upgrade vllm to v0.10.0 #766
Conversation
Since v0.10.0 is out, are we able to jump straight to that one?
The release notes from 0.10.0 look good. Particularly this:
We need to be careful to select the post-processing (processed) logprobs option for our case. The RPC methods should be useful as well.
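For context, a minimal sketch of what selecting the post-processed logprobs could look like, assuming the `logprobs_mode` engine argument added around v0.10.0 (with values such as "raw_logprobs" and "processed_logprobs"); the model name and prompt are placeholders, not taken from this PR:

```python
# Sketch only: `logprobs_mode` and its values are an assumption about the
# vLLM v0.10.0 API; model name and prompt are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-1.5B",
    logprobs_mode="processed_logprobs",  # logprobs after temperature/top-k/top-p processing
)
params = SamplingParams(temperature=1.0, top_p=0.9, logprobs=1)
out = llm.generate(["Hello, world"], params)
print(out[0].outputs[0].logprobs)
```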
Blocking the merge of #543 until vllm is updated to 0.10.0 here and merged.
@yuki-666 let's force the init to always use raw_logprobs in case the default in vllm changes in future releases.
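A hedged sketch of pinning the mode explicitly (same caveat as the earlier sketch: `logprobs_mode` and its default are assumptions about the v0.10.0 API, and this is illustrative init code, not the actual worker code):

```python
from vllm import LLM

# Pass logprobs_mode explicitly so a future change to vLLM's default cannot
# silently switch us away from raw logprobs.
llm = LLM(
    model="Qwen/Qwen2.5-1.5B",      # placeholder model
    logprobs_mode="raw_logprobs",   # pin the mode on purpose, don't rely on the default
)
```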
Raw logprobs are fine for now (same as v1), but they are actually incorrect for top-k and top-p sampling (even though it appears as though they would work correctly). Both the logprobs from vllm and those from the training framework are incorrect since they haven't been processed, which introduces error. (I believe the claim from #773 that v1 is fine is incorrect.) We have 2 options for how to mitigate this:
I'm leaning strongly towards the latter as it leaves less room for us to make mistakes and not notice. We should track this in #69.
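To make the raw-vs-processed distinction concrete, here is a small illustrative sketch (plain PyTorch, not vLLM internals): with top-p sampling the distribution actually sampled from is the renormalized truncated one, so a token's raw log-softmax value differs from its logprob under the distribution that produced it.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(32)              # toy vocabulary of 32 tokens
top_p = 0.9

raw_logprobs = F.log_softmax(logits, dim=-1)

# Emulate top-p (nucleus) processing: keep the smallest set of most-likely
# tokens whose cumulative probability reaches top_p, drop the rest, renormalize.
probs = raw_logprobs.exp()
sorted_probs, sorted_idx = probs.sort(descending=True)
cumulative = sorted_probs.cumsum(dim=-1)
keep_sorted = (cumulative - sorted_probs) < top_p   # always keeps the top token
keep = torch.zeros_like(probs, dtype=torch.bool)
keep[sorted_idx[keep_sorted]] = True

processed_logits = logits.masked_fill(~keep, float("-inf"))
processed_logprobs = F.log_softmax(processed_logits, dim=-1)

tok = sorted_idx[0].item()            # look at the most likely token
print(raw_logprobs[tok].item(), processed_logprobs[tok].item())
# The processed value is higher (less negative): probability mass from the
# dropped tail is renormalized onto the kept tokens, so reporting the raw
# value misstates the logprob of what was actually sampled.
```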
@yuki-666 I see in some of the plots that the new vllm version is much more 'spiky' in logprob error. This is fine (spikes are occasional) but would you mind also adding plots with the logprob error clamped to a max value of 1.2? I'd like to confirm that the typical value has not increased at all.
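For the clamped plots, something along these lines should do (numpy/matplotlib sketch; the error series here is synthetic placeholder data standing in for however the real per-step values are loaded):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic placeholder for the real per-step logprob-error series.
steps = np.arange(1000)
logprob_error = 1.0 + np.abs(np.random.randn(1000)) * 0.02
logprob_error[::97] += 3.0                 # a few artificial spikes

clamped = np.minimum(logprob_error, 1.2)   # clamp to a max value of 1.2

plt.plot(steps, clamped)
plt.xlabel("step")
plt.ylabel("logprob error (clamped at 1.2)")
plt.show()
```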
Also want to leave a comment that the 0.10.0 version may fix this issue I've observed in gemma runs.
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Qidong Su <qidongs@nvidia.com>
What does this PR do?
Upgrade vllm from v0.9.0 to v0.10.0.
This could help:
Test Result
Llama3 (FSDP2 and mcore backend)
configs of FSDP2 and mcore have some differences.
gemma3 1b and 27b (FSDP2 backend)
DSV3 (mcore backend)
VLM (FSDP2 backend)
llava-hf/llava-1.5-7b-hf and HuggingFaceTB/SmolVLM2-2.2B-Instruct work fine.
Qwen/Qwen2.5-VL-3B-Instruct and google/gemma-3-4b-it have fine reward, but show an unexpected token_mult_prob_error. This won't block upgrading vllm to v0.10.0, since v0.9.2 has similar logprob behavior, traced in #793.
non-colocated on single-node (FSDP2 backend)
Test Qwen2.5-1.5B using 4 GPUs for train and 4 GPUs for inference.
non-colocated w/ async vllm on multi-nodes (FSDP2 backend)
Test Llama-3.1-8B-Instruct using 2 nodes for train and 1 node for inference.
Issues
Closes #722.