Xiangxin Zhou*, Zichen Liu*, Anya Sims*, Haonan Wang, Tianyu Pang
Chongxuan Li, Liang Wang, Min Lin, Chao Du†
*Equal contribution, †Correspondence
📚 [Paper] | 🤗 [Checkpoints]
The code has been tested in the following environment:
conda create -n VeriFree python=3.10 -y
conda activate VeriFree
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install oat-llm==0.1.3
# Replace {PATH_TO_YOUR_USER_DIRECTORY} in the command below with the path to your user directory
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{PATH_TO_YOUR_USER_DIRECTORY}/.conda/envs/VeriFree/lib/
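As an alternative to editing the path by hand, the export above can be derived from `CONDA_PREFIX`, which conda sets when an environment is activated. This is a minimal sketch, not part of the VeriFree repository: `append_path` is a hypothetical helper, and the fallback to `$HOME/.conda/envs/VeriFree` assumes the directory layout implied by the command above. The helper avoids a leading colon when `LD_LIBRARY_PATH` starts out empty, since the dynamic linker treats an empty entry as the current directory.

```shell
# Hypothetical helper (not from the VeriFree repo): append a directory to a
# colon-separated path list without producing an empty leading entry.
append_path() {
    # $1: existing path list (may be empty), $2: directory to append
    if [ -z "$1" ]; then
        printf '%s\n' "$2"
    else
        printf '%s:%s\n' "$1" "$2"
    fi
}

# CONDA_PREFIX is set by `conda activate VeriFree`. The fallback path is an
# assumption based on the default per-user conda layout shown above.
ENV_DIR="${CONDA_PREFIX:-$HOME/.conda/envs/VeriFree}"

LD_LIBRARY_PATH="$(append_path "${LD_LIBRARY_PATH:-}" "$ENV_DIR/lib")"
export LD_LIBRARY_PATH
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

Run this after `conda activate VeriFree` so that `CONDA_PREFIX` points at the environment created above. If your conda installation lives elsewhere, the fallback path will not apply and should be adjusted.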
The following command shows an example of fine-tuning Qwen3 base models with VeriFree policy optimization:
bash run.sh
If you find our work useful for your research, please consider citing:
@article{zhou2025verifree,
  title={Reinforcing General Reasoning without Verifiers},
  author={Zhou, Xiangxin and Liu, Zichen and Sims, Anya and Wang, Haonan and Pang, Tianyu and Li, Chongxuan and Wang, Liang and Lin, Min and Du, Chao},
  journal={arXiv preprint arXiv:2505.21493},
  year={2025}
}