ROLL: Reinforcement Learning Optimization for Large-Scale Learning

🚀 An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models 🚀


ROLL is an efficient and user-friendly RL library designed for Large Language Models (LLMs) trained on large-scale GPU resources. It significantly enhances LLM performance in key areas such as human preference alignment, complex reasoning, and multi-turn agentic interaction scenarios.

Leveraging a multi-role distributed architecture with Ray for flexible resource allocation and heterogeneous task scheduling, ROLL integrates cutting-edge technologies like Megatron-Core, SGLang and vLLM to accelerate model training and inference.
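As a toy illustration of this pattern (hypothetical names, not ROLL's actual API), the following sketch shows sample-level asynchronous rollout feeding a trainer, using only the Python standard library in place of Ray actors and vLLM/SGLang generation:

```python
# Toy sketch of sample-level asynchronous rollout feeding a trainer.
# The names rollout() and train_step() are illustrative stand-ins; ROLL
# builds the equivalent roles as Ray actors backed by vLLM/SGLang.
from concurrent.futures import ThreadPoolExecutor, as_completed
import random

def rollout(prompt_id: int) -> dict:
    """Stand-in for generation plus reward scoring of one sample."""
    return {"prompt_id": prompt_id, "reward": random.random()}

def train_step(batch: list[dict]) -> float:
    """Stand-in for one policy update; returns the batch's mean reward."""
    return sum(s["reward"] for s in batch) / len(batch)

batch, batch_size = [], 4
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(rollout, i) for i in range(16)]
    for fut in as_completed(futures):  # consume samples as they complete
        batch.append(fut.result())
        if len(batch) == batch_size:
            train_step(batch)
            batch = []
```

The key point of the sketch is that the trainer consumes samples as they finish rather than waiting for the slowest generation in a batch, which is what makes sample-level asynchrony pay off at scale.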


📢 News

📣 Updates
[08/13/2025] 🎉 ROLL supports AMD GPUs, with an out-of-the-box Docker image, a Dockerfile, and dedicated YAMLs under the examples/ directory. Please refer to Installation.
[08/11/2025] 🎉 Our paper is released; see Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning.
[08/10/2025] 🎉 Agentic RL supports stepwise learning (e.g., GiGPO); Distill supports VLMs. Explore the new capabilities!
[07/31/2025] 🎉 Refactored the agentic RL design; agentic RL now supports async training. Explore the new capabilities!
[07/31/2025] 🎉 Support DistillPipeline/DpoPipeline, LoRA, and GSPO.
[06/25/2025] 🎉 Support thread-based environments for env scaling and the Qwen2.5-VL agentic pipeline.
[06/13/2025] 🎉 Support the Qwen2.5-VL RLVR pipeline and upgrade mcore to version 0.12.
[06/09/2025] 🎉 The ROLL tech report is now available! Access the report here.
[05/30/2025] 🎉 Training RLVR and Agentic RL with ROLL is now available! Explore the new capabilities.

🚀 Get Started

Documents

Quick Start

Quick Start on Alibaba Cloud
Installation
Config guide

Step By Step

RLVR Pipeline
Agentic RL Pipeline


✨ Key Features

  • Multi-task RL Training (RLVR): Covers mathematics, coding, general reasoning, open-ended Q&A, instruction following, etc.
    • Flexible domain_batch_size distribution control.
    • Sample-level asynchronous parallel rollout, asynchronous reward calculation, and dynamic sampling.
    • Asynchronous training is under development.
  • Agentic RL: Multi-turn interaction capabilities for games, multi-turn dialogues, tool use, etc.
    • Environment-level asynchronous parallel rollout.
    • Supports asynchronous training.
    • Multi-turn interaction rollout supports local debugging, improving multi-turn interaction business development efficiency.
    • Supports TrajectoryWise (StarPO) and StepWise (GiGPO) training paradigms.
  • Algorithm-Friendly: Provides flexible and rich RL strategy configurations by default.
    • Over 20 rich reinforcement learning strategy options, such as reward normalization, reward clipping, various advantage estimation methods, etc.
    • Out-of-the-box support for reinforcement learning algorithms, such as PPO, GRPO, Reinforce++, TOPR, RAFT++, GSPO, etc.
  • Rich Training and Inference Engines: Ray-based multi-role distributed architecture; a Strategy abstraction unifies the various backends, enabling easy operation from a single machine to thousands-of-GPU clusters.
    • Inference/generation supports vLLM and SGLang.
    • Training supports DeepSpeed (ZeRO) and Megatron-LM 5D parallelism (mcore-adapter, dp/tp/pp/cp/ep); FSDP support is under development.
    • Extreme offload/reload capabilities.
    • Supports LoRA training.
    • Supports FP8 rollout (FP8 inference for LLM as judge, FP8 rollout with BF16 training under development).
  • AutoDeviceMapping: Supports custom device mapping for different roles, flexibly managing colocated and disaggregated deployments.
  • Observability: Integrated with SwanLab / WandB / TensorBoard, tracking performance for each domain and reward type.
  • Rich Post-training Technical Support:
    • Agentic RL LLM & VLM
    • RLVR LLM & VLM
    • Distill Pipeline LLM & VLM
    • DPO Pipeline
    • SFT Pipeline (under development)
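As a concrete example of one of the strategies listed above, the group-relative advantage at the heart of GRPO-style algorithms can be sketched in a few lines of plain Python (a simplified illustration, not ROLL's implementation, which layers reward normalization and clipping options on top):

```python
# Simplified GRPO-style advantage estimation: each response's reward is
# normalized against the statistics of its prompt's group of samples.
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Return (r - mean) / (std + eps) for each reward in one group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four responses sampled for the same prompt, scored by a verifiable reward:
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Correct responses get positive advantages, incorrect ones negative, and
# the advantages within a group sum to (approximately) zero.
```

Because the baseline comes from sibling samples of the same prompt, no learned value model is needed, which is what makes this family of methods attractive at scale.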

🔮 Upcoming Features

We are continuously working to expand ROLL's capabilities:

  • โฑ๏ธ Async RLVR pipeline: For even more efficient and streamlined asynchronous operations.
  • โš™๏ธ FSDP2: Integrating the latest Fully Sharded Data Parallel techniques.
  • ๐Ÿ” Support DeepseekV3: Adding compatibility for the newest Deepseek models.

๐Ÿ† Notable work based on ROLL

  • RecGPT: a next-generation, LLM-driven framework that places user intent at the core of recommender systems, fostering a more sustainable and mutually beneficial ecosystem.

๐Ÿ™ Citation and Acknowledgement

ROLL is inspired by the design of OpenRLHF, VeRL, Nemo-Aligner, and RAGEN. The project is developed by Alibaba TAOBAO & TMALL Group and Alibaba Group. The code is distributed under the Apache License (Version 2.0). This product contains various third-party components under other open-source licenses. See the NOTICE file for more information.

The following repositories have been used in ROLL, either in their close-to-original form or as an inspiration:

If you use ROLL in your research or project, please consider citing us:

@article{wang2025reinforcement,
  title={Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library},
  author={Wang, Weixun and Xiong, Shaopan and Chen, Gengru and Gao, Wei and Guo, Sheng and He, Yancheng and Huang, Ju and Liu, Jiaheng and Li, Zhendong and Li, Xiaoyang and others},
  journal={arXiv preprint arXiv:2506.06122},
  year={2025}
}

๐Ÿค About [ROLL Team]

ROLL is a project jointly developed by Taotian Future Life Lab and Aicheng Technology, with a strong emphasis on pioneering the future of Reinforcement Learning (RL). Our mission is to explore and shape innovative forms of future living powered by advanced RL technologies. If you are passionate about the future of RL and want to be part of its evolution, we warmly welcome you to join us! Learn more about the ROLL Team through our official channels below 👇

WeChat QR

We are HIRING!

  • Post-Training Infra R&D Engineer: JD link
  • Large Model Training Expert:
    • (experienced hire) JD link
    • (campus hire) JD link
  • Infra Research Intern: JD link

We welcome contributions from the community! 🤝
