
sail-sg/VeriFree

VeriFree: Reinforcing General Reasoning without Verifiers

Xiangxin Zhou*, Zichen Liu*, Anya Sims*, Haonan Wang, Tianyu Pang

Chongxuan Li, Liang Wang, Min Lin, Chao Du†

*Equal contribution, †Correspondence

📚 [Paper] | 🤗 [Checkpoints]

Overview

Usage

Dependency

The code has been tested in the following environment:

conda create -n VeriFree python=3.10 -y
conda activate VeriFree

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install oat-llm==0.1.3

# Replace {PATH_TO_YOUR_USER_DIRECTORY} in the following command with the path to your user directory.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{PATH_TO_YOUR_USER_DIRECTORY}/.conda/envs/VeriFree/lib/
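As an alternative to editing the path by hand, the export above can be sketched using `CONDA_PREFIX`, which `conda activate` sets to the root of the active environment. This is a minimal sketch, not part of the repository: it assumes the `VeriFree` environment is already activated, and `append_ld_path` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper (not from the repository): append a directory to
# LD_LIBRARY_PATH, skipping it if already present and avoiding a stray
# leading colon when the variable is empty or unset.
append_ld_path() {
  dir="$1"
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$dir:"*) ;;  # already listed, nothing to do
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
  esac
  export LD_LIBRARY_PATH
}

# CONDA_PREFIX is set by `conda activate`; do nothing if no environment is active.
if [ -n "${CONDA_PREFIX:-}" ]; then
  append_ld_path "$CONDA_PREFIX/lib"
fi
```

If the environment lives under `{PATH_TO_YOUR_USER_DIRECTORY}/.conda/envs/VeriFree`, this should resolve to the same `lib/` directory as the explicit path.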

Training

The following command is an example of fine-tuning Qwen3 base models with VeriFree policy optimization:

bash run.sh
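Before launching `run.sh`, it may help to confirm that the installed packages match the versions listed under Dependency. The following is a rough pre-flight sketch, not part of the repository: the helper names (`version_matches`, `installed_version`, `preflight`) are hypothetical, and it assumes `python` on the `PATH` is the interpreter from the `VeriFree` environment. The pins are copied from the install commands above; a local build suffix such as `+cu121` is ignored when comparing.

```shell
# Pins copied from the install commands in the Dependency section.
EXPECTED_TORCH="2.5.1"
EXPECTED_OAT="0.1.3"

# Hypothetical helper: succeed if ACTUAL equals EXPECTED once any local
# build suffix (for example "+cu121") has been stripped from ACTUAL.
version_matches() {
  expected="$1"
  actual="${2%%+*}"
  [ "$expected" = "$actual" ]
}

# Hypothetical helper: print the installed version of a distribution,
# or "missing" if Python or the package is unavailable.
installed_version() {
  python -c "import importlib.metadata as m; print(m.version('$1'))" 2>/dev/null || echo "missing"
}

# Compare each pinned package against what is installed; return non-zero
# if any package is missing or has a different version.
preflight() {
  rc=0
  for pin in "torch=$EXPECTED_TORCH" "oat-llm=$EXPECTED_OAT"; do
    name="${pin%%=*}"
    want="${pin#*=}"
    have="$(installed_version "$name")"
    if version_matches "$want" "$have"; then
      echo "ok: $name $have"
    else
      echo "mismatch: $name expected $want, found $have"
      rc=1
    fi
  done
  return "$rc"
}

# Launch training only if the check passes:
# preflight && bash run.sh
```

The final line is left commented out so that sourcing the snippet only defines the helpers; uncomment it to gate training on the check.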

Acknowledgements

  • We use oat as the training framework.
  • Our model is trained on top of Qwen3.

Citation

If you find our work useful for your research, please consider citing:

@article{zhou2025verifree,
  title={Reinforcing General Reasoning without Verifiers},
  author={Zhou, Xiangxin and Liu, Zichen and Sims, Anya and Wang, Haonan and Pang, Tianyu and Li, Chongxuan and Wang, Liang and Lin, Min and Du, Chao},
  journal={arXiv preprint arXiv:2505.21493},
  year={2025}
}
