
[Track] VLM accuracy in MMMU benchmark #4456

@yizhang2077

Description


This issue tracks the accuracy of all VLM models on the MMMU benchmark. It will be kept up to date.

```
python benchmark/mmmu/bench_sglang.py
python benchmark/mmmu/bench_hf.py --model-path model
```
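A minimal sketch of reproducing one row of the table, using Qwen2-VL-7B-Instruct as the example; the Hugging Face repo id, the server launch step, and the port are assumptions, not spelled out in this issue:

```bash
# HF baseline: run the model directly through transformers
# (the repo id below is an assumption for illustration).
python benchmark/mmmu/bench_hf.py --model-path Qwen/Qwen2-VL-7B-Instruct

# SGLang: serve the same model first, then run the benchmark
# against the local server (launch flags and port are assumptions).
python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
python benchmark/mmmu/bench_sglang.py
```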
| Model | sglang | hf |
|---|---|---|
| Qwen2-VL-7B-Instruct | 0.485 | 0.255 |
| Qwen2.5-VL-7B-Instruct | 0.477 | 0.242 |
| MiniCPM-V-2_6 | 0.426 | |
| MiniCPM-O-2_6 | 0.481 | 0.49 |
| Deepseek-vl2 | 0.496 | 0.499 |
| Deepseek-vl2-small | 0.464 | 0.453 |
| Deepseek-vl2-tiny | 0.382 | 0.369 |
| Deepseek-Janus-Pro-7B | | |
| Llava + Llama | | |
| Llava + Qwen | | |
| Llava + Mistral | | |
| Mlama | | |
| Gemma-3-it-4B | 0.409 | 0.403 |
| InternVL2.5-38B | 0.61 | |
