Description
This issue tracks the accuracy of all VLM models on the MMMU benchmark. It is kept up to date.
```
python benchmark/mmmu/bench_sglang.py
python benchmark/mmmu/bench_hf.py --model-path <model>
```
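When filling in several rows of the table, it can help to drive the two scripts above from a small wrapper. A minimal dry-run sketch is below; the HF model IDs and the assumption that `bench_sglang.py` also accepts `--model-path` are illustrative, so check each script's `--help` before executing the printed commands.

```shell
#!/bin/sh
# print_bench_cmds: print the two MMMU benchmark invocations for one model
# (dry run only; pipe the output to `sh` to actually execute them).
print_bench_cmds() {
    model="$1"
    # Flag on bench_sglang.py is an assumption; bench_hf.py's flag is from the issue.
    echo "python benchmark/mmmu/bench_sglang.py --model-path $model"
    echo "python benchmark/mmmu/bench_hf.py --model-path $model"
}

# Hypothetical list of checkpoints still missing from the table.
for m in Qwen/Qwen2-VL-7B-Instruct deepseek-ai/deepseek-vl2; do
    print_bench_cmds "$m"
done
```

Printing rather than executing keeps the loop safe to run anywhere; redirect the output to a file to build a job list for a cluster queue.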
Model | sglang | hf
---|---|---
Qwen2-VL-7B-Instruct | 0.485 | 0.255
Qwen2.5-VL-7B-Instruct | 0.477 | 0.242
MiniCPM-V-2_6 | 0.426 |
MiniCPM-O-2_6 | 0.481 | 0.49
Deepseek-vl2 | 0.496 | 0.499
Deepseek-vl2-small | 0.464 | 0.453
Deepseek-vl2-tiny | 0.382 | 0.369
Deepseek-Janus-Pro-7B | |
Llava + Llama | |
Llava + Qwen | |
Llava + Mistral | |
Mllama | |
Gemma-3-it-4B | 0.409 | 0.403
InternVL2.5-38B | 0.61 |