This repository open-sources the training code used for the online RL stage of InternVL3.5, which is built upon the PR in verl. Compared to the original PR, we have corrected the dialogue template for InternVL and updated a monkey patch for InternVL to enable sequence parallelism. For training details, please refer to the provided scripts.
We use MMPR-Tiny as the training dataset and initialize the model from the InternVL3.5 checkpoint obtained after MPO training. We also provide a packaged conda environment for easy reproduction.
For the original README of verl, please refer to this file.
Based on this codebase, the InternVL3.5 series achieves significant improvements in reasoning performance across all model scales.
We open-source our training data (i.e., MMPR-Tiny) on HuggingFace. To reproduce our training results, download this dataset and move it into this folder. Additionally, since verl requires a validation dataset to be loaded, please prepare this data using this script. The expected directory structure is as follows:
├── MMPR-Tiny
│   ├── images
│   └── mmpr_tiny.parquet
├── verl_data
│   └── geo3k
│       └── test.parquet
├── verl
└── README.md
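If the referenced preparation script is not at hand, the snippet below is a minimal sketch of how a compatible validation parquet could be built. The dataset name (hiyouga/geometry3k) and the column schema are assumptions borrowed from verl's geo3k preprocessing example; the script shipped with this repository may use different fields.

# Hypothetical sketch: write verl_data/geo3k/test.parquet in the layout shown above.
# The dataset name and column names below are assumptions, not this repo's exact script.
import os
from datasets import load_dataset

local_dir = "verl_data/geo3k"
os.makedirs(local_dir, exist_ok=True)

test_split = load_dataset("hiyouga/geometry3k", split="test")

def to_verl_row(example, idx):
    # verl expects a chat-style prompt plus bookkeeping fields such as data_source
    # and reward_model; the exact keys expected by this repo's script may differ.
    return {
        "data_source": "hiyouga/geometry3k",
        "prompt": [{"role": "user", "content": example["problem"]}],
        "images": example["images"],
        "ability": "math",
        "reward_model": {"style": "rule", "ground_truth": example["answer"]},
        "extra_info": {"split": "test", "index": idx},
    }

test_split = test_split.map(to_verl_row, with_indices=True)
test_split.to_parquet(os.path.join(local_dir, "test.parquet"))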
After preparing the dataset and the packaged conda environment, you can launch training with the following command:
sh shell/internvl3_5_8b.sh
We mainly use VLMEvalKit to evaluate our models. Please refer to their documentation and our model configs for more details. As an example, you can define the model config as follows:
"InternVL3_5-8B-Thinking": partial(
InternVLChat,
model_path="/path/to/your/moel",
version="V2.0",
cot_prompt_version="r1",
max_new_tokens=32768,
do_sample=True,
use_lmdeploy=True,
),
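This entry is typically registered in VLMEvalKit's model config (e.g., the supported_VLM mapping); once registered, evaluation can be launched through VLMEvalKit's standard entry point, for example python run.py --data MMMU_DEV_VAL --model InternVL3_5-8B-Thinking, where the benchmark name is only an illustration.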
If you find this project useful in your research, please consider citing:
@article{wang2025internvl3_5,
  title={InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency},
  author={Wang, Weiyun and Gao, Zhangwei and Gu, Lixin and Pu, Hengjun and Cui, Long and Wei, Xingguang and Liu, Zhaoyang and Jing, Linglin and Ye, Shenglong and Shao, Jie and others},
  journal={arXiv preprint arXiv:2508.18265},
  year={2025}
}
@article{wang2024mpo,
  title={Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization},
  author={Wang, Weiyun and Chen, Zhe and Wang, Wenhai and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Zhu, Jinguo and Zhu, Xizhou and Lu, Lewei and Qiao, Yu and Dai, Jifeng},
  journal={arXiv preprint arXiv:2411.10442},
  year={2024}
}