
[Track] VLM accuracy in MMMU benchmark #4456

@yizhang2077

Description


This issue tracks the accuracy of all VLM models on the MMMU benchmark. It will be kept up to date.

```
python benchmark/mmmu/bench_sglang.py
python benchmark/mmmu/bench_hf.py --model-path model
```
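A minimal sketch of reproducing one row of the table, using Qwen2-VL-7B-Instruct as the example; the Hugging Face repo id, the server launch step, and the port are assumptions, not spelled out in this issue:

```bash
# HF baseline: run the model directly through transformers
# (the repo id below is an assumption for illustration).
python benchmark/mmmu/bench_hf.py --model-path Qwen/Qwen2-VL-7B-Instruct

# SGLang: serve the same model first, then run the benchmark
# against the local server (launch flags and port are assumptions).
python -m sglang.launch_server --model-path Qwen/Qwen2-VL-7B-Instruct --port 30000
python benchmark/mmmu/bench_sglang.py
```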
| Model | sglang | hf |
|---|---|---|
| Qwen2-VL-7B-Instruct | 0.485 | 0.255 |
| Qwen2.5-VL-7B-Instruct | 0.477 | 0.242 |
| MiniCPM-V-2_6 | 0.426 | |
| MiniCPM-O-2_6 | 0.481 | 0.49 |
| Deepseek-vl2 | 0.496 | 0.499 |
| Deepseek-vl2-small | 0.464 | 0.453 |
| Deepseek-vl2-tiny | 0.382 | 0.369 |
| Deepseek-Janus-Pro-7B | | |
| Llava + Llama | | |
| Llava + Qwen | | |
| Llava + Mistral | | |
| Mlama | | |
| Gemma-3-it-4B | 0.409 | 0.403 |
| InternVL2.5-38B | 0.61 | |
