Closed
Labels
bug: Something isn't working
Description
Your current environment
The output of `python collect_env.py`
Your output of `python collect_env.py` here
Model Input Dumps
No response
🐛 Describe the bug
It seems that pixtral_hf accuracy has regressed since the last known-good result on vLLM 0.6.4.post1.
For reference, the HF model card reports MMMU (CoT) ≈ 51%, which is the metric we look at here. Evals were run using mistral-evals.
vLLM 0.6.4.post1, server and eval:
> uv pip install vllm==0.6.4.post1
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000
> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
"explicit_prompt_relaxed_correctness": 0.5044444444444445,
"anywhere_in_answer_relaxed_correctness": 0.5044444444444445
}
================================================================================
vLLM 0.6.5, server and eval:
> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000
> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
"explicit_prompt_relaxed_correctness": 0.0011111111111111111,
"anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================
vLLM using #11741, server and eval:
> uv pip install vllm==0.6.5
> vllm serve nm-testing/pixtral-12b-FP8-dynamic --max-num-seqs 30 --max-model-len 30000 --limit-mm-per-prompt image=5 --port 9000
> python -m eval.run eval_vllm --model_name nm-testing/pixtral-12b-FP8-dynamic --url http://0.0.0.0:9000 --output_dir output/ --eval_name "mmmu"
...
================================================================================
Metrics:
{
"explicit_prompt_relaxed_correctness": 0.0011111111111111111,
"anywhere_in_answer_relaxed_correctness": 0.3466666666666667
}
================================================================================
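To make the size of the regression explicit, here is a minimal sketch that compares the 0.6.4.post1 and 0.6.5 metrics reported above. The metric values are copied verbatim from those two runs; the `find_regressions` helper and the 0.02 tolerance are hypothetical, chosen only for illustration, and are not part of mistral-evals or vLLM.

```python
# Metrics copied verbatim from the runs reported in this issue.
baseline = {  # vLLM 0.6.4.post1
    "explicit_prompt_relaxed_correctness": 0.5044444444444445,
    "anywhere_in_answer_relaxed_correctness": 0.5044444444444445,
}
candidate = {  # vLLM 0.6.5
    "explicit_prompt_relaxed_correctness": 0.0011111111111111111,
    "anywhere_in_answer_relaxed_correctness": 0.3466666666666667,
}


def find_regressions(baseline, candidate, tolerance=0.02):
    """Return {metric: drop} for every metric that fell by more than `tolerance`.

    `tolerance` is an assumed noise threshold, not a value taken from the evals.
    """
    regressions = {}
    for name, base_value in baseline.items():
        drop = base_value - candidate[name]
        if drop > tolerance:
            regressions[name] = drop
    return regressions


regressions = find_regressions(baseline, candidate)
for name, drop in regressions.items():
    print(f"{name}: dropped by {drop:.4f}")
```

Both metrics are flagged. The explicit-prompt score falls much further than the anywhere-in-answer score, so the two no longer agree the way they did on 0.6.4.post1.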