RUCAIBox/R1-Searcher-plus

R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

If you like our project, please give us a star ⭐ on GitHub for the latest update.

✨ News

  • [22 May 2025] ⚡️⚡️ R1-Searcher++: We propose R1-Searcher++, a framework for training LLMs to adaptively use internal and external knowledge. It uses a two-stage strategy: an initial SFT cold-start phase for basic format learning, followed by an RL phase for dynamic knowledge acquisition. In the RL phase, we introduce a reward mechanism that encourages the use of internal knowledge and integrate a memorization mechanism that continuously assimilates retrieved information, thereby enriching the model's internal knowledge. The paper can be found here: arxiv.org/abs/2505.17005
  • [22 May 2025] ⚡️⚡️ SimpleDeepSearcher paper: We release the paper for SimpleDeepSearcher, which also explores the impact of using a distilled model as the backbone for continued reinforcement learning training, as well as the effects of incorporating long-CoT math reasoning data during training. The paper additionally includes comprehensive experiments. It can be found here: arxiv.org/abs/2505.16834
  • [16 Apr 2025] ⚡️⚡️ SimpleDeepSearcher: We propose SimpleDeepSearcher, a framework designed to stimulate autonomous retrieval during complex reasoning via knowledge distillation and self-distillation, with the goal of efficient and effective training using only a small amount of data.
  • [8 Mar 2025] ⚡️⚡️ R1-Searcher: We propose R1-Searcher, which uses a two-stage, outcome-supervised reinforcement learning approach to teach the model to invoke web search during reasoning: the model first learns how to invoke web search, and then learns how to use the search engine effectively. This method requires no instruction fine-tuning for cold start and is compatible with existing Base and Chat LLMs.

💡 Overview

✨ Method

(Figure: overview of the R1-Searcher++ method)

📄 Evaluation

(Figure: evaluation results on benchmarks)

💫 Case Study

(Figure: case study)

🙌 Compare to R1-Searcher

(Figure: comparison with R1-Searcher)

🏃 Quick Start

Coming soon... (The setup is almost the same as R1-Searcher's.)

📄 Citation

Please kindly cite our paper if it is helpful for your research.

@article{song2025r1,
  title={R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning},
  author={Song, Huatong and Jiang, Jinhao and Tian, Wenqing and Chen, Zhipeng and Wu, Yuhuan and Zhao, Jiahao and Min, Yingqian and Zhao, Wayne Xin and Fang, Lei and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2505.17005},
  year={2025}
}

📄 License

This project is released under the MIT License.

📞 Contact

For any questions or feedback, please reach out to us at songhuatong123@ruc.edu.cn.
