Labels: enhancement (New feature or request)
Hello everyone!
Thank you for your interest in ROLL.
ROLL has recently introduced a host of new features. Below is a summary of the recent updates. We will keep iterating on and improving ROLL, and we welcome you to join the ROLL community.
🚀 New Features
Agentic RL
- Refactored and optimized the Agentic RL design to provide a more powerful and flexible framework.
- Introduced multi-turn interactive local development and debugging capabilities for Agentic RL, significantly boosting development and debugging efficiency. Example: `tests/agentic/env_manager/test_traj_env_manager.py`
- Added Agentic RL async training to improve training efficiency. Example: `examples/qwen2.5-0.5B-agentic/agent_val_frozen_lake_async.yaml`
New Training Capabilities
- Supported Group Sequence Policy Optimization (GSPO) via the `importance_sampling: Literal["token", "seq"]` option.
- Introduced the Distill Pipeline, which provides knowledge distillation capabilities. Path: `roll/pipeline/distill/distill_pipeline.py`
- Added the VLM Multi-domain RLVR Pipeline, enabling multi-domain joint training for multi-modal models. Path: `roll/pipeline/rlvr/rlvr_vlm_pipeline.py`
- Added a new DPO Pipeline. Path: `roll/pipeline/dpo/dpo_pipeline.py`
- Supported LoRA training. Example: `examples/qwen2.5-7B-rlvr_megatron/rlvr_lora_zero3.yaml`
- Added the latest math test datasets and the GPQA-Diamond dataset, along with a new `MultipleChoiceBoxedRuleRewardWorker`.
Other Enhancements
- Improved checkpoint restoration from downloaded model paths.
- Added `CLAUD.md` documentation.
- Fixed issues caused by Automap concurrency.
- Fixed directory issues when saving Critic checkpoints.
- Resolved `vllm_strategy` compatibility issues with Qwen3 Dense FP8 models.
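The GSPO `importance_sampling` option listed above chooses how the policy ratio is formed. The snippet below is a minimal, hypothetical sketch in plain Python, not ROLL's implementation. With `"token"`, each token gets its own ratio `exp(logp_new - logp_old)`, as in PPO/GRPO. With `"seq"`, every token in a response shares GSPO's length-normalized sequence ratio `(pi_new(y|x) / pi_old(y|x)) ** (1/|y|)`, which equals the exponential of the mean per-token log-ratio. The names `importance_ratios` and `clipped_objective` and the clip width `eps` are assumptions for illustration; how ROLL wires this option into its loss may differ.

```python
import math
from typing import List, Literal


def importance_ratios(
    logp_new: List[float],
    logp_old: List[float],
    mode: Literal["token", "seq"],
) -> List[float]:
    """Per-token importance weights for a single response (hypothetical helper).

    token: r_t = exp(logp_new_t - logp_old_t)
    seq:   s = exp(mean_t(logp_new_t - logp_old_t)), broadcast to every token
    """
    if not logp_new or len(logp_new) != len(logp_old):
        raise ValueError("log-prob lists must be non-empty and of equal length")
    diffs = [new - old for new, old in zip(logp_new, logp_old)]
    if mode == "token":
        return [math.exp(d) for d in diffs]
    if mode == "seq":
        # Length-normalized sequence ratio shared by all tokens of the response.
        s = math.exp(sum(diffs) / len(diffs))
        return [s] * len(diffs)
    raise ValueError(f"unknown importance_sampling mode: {mode!r}")


def clipped_objective(ratios: List[float], advantage: float, eps: float = 0.2) -> float:
    """PPO-style clipped surrogate, averaged over tokens (eps is an assumed value)."""
    terms = []
    for r in ratios:
        clipped = max(min(r, 1.0 + eps), 1.0 - eps)
        terms.append(min(r * advantage, clipped * advantage))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    new, old = [-1.0, -2.0], [-1.5, -1.5]
    print(importance_ratios(new, old, "token"))  # per-token weights differ
    print(importance_ratios(new, old, "seq"))    # [1.0, 1.0]
```

In this example the two tokens move in opposite directions, so the token-level weights differ while the sequence-level weight stays at 1.0. Because `"seq"` gives the whole response one weight, a single outlier token cannot push its own ratio out of the clip range. In practice the sequence-level ratio is usually clipped with a much narrower range than the token-level default, which this sketch does not model.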
wangjiamang, douph810975, ZhangShuui and Galleons2029