Multi-turn rollout & agentic RL Status & Roadmap

## Current Status Tracker

training setting support:
| feature | status | issue | pr |
| --- | --- | --- | --- |
| fsdp | done | | |
| fsdp2 | done (Verified by Yuzhen Zhou @ SGLang & AMD) | | https://github.com/volcengine/verl/pull/1650 |
| megatron | done (Developed by Xiang Long @ SGLang & ModelBest, Ziyuan Gao @ Bytedance) | | https://github.com/volcengine/verl/pull/1602 |
| fp8 | to be supported | | |


rollout feature support:
| feature | status | issue | pr |
| --- | --- | --- | --- |
| request-level async rollout | done | | |
| tool interaction | done | | |
| VLM | geo3k in review (Nan Jiang @ Amazon & Congkai Xie @ RealLM Labs) | #137 | https://github.com/volcengine/verl/pull/2014 |
| multi-node | done (by Shengui Li & Jin Pan @ SGLang) | | |
| tool rate limit | ETA: mid-May (volcengine team is working on it) | | |
| exact colocated rollout | ETA: May (Junrong Lin @ SGLang & Qwen is working on it) | | |
| server-based rollout | developing (Haiquan Chen @ Bytedance & Xibin Wu are working on it) | | https://github.com/volcengine/verl/pull/1698 https://github.com/volcengine/verl/pull/1769 https://github.com/volcengine/verl/pull/1831 |
| partial rollout | ETA: unknown (Yuzhen Zhou @ SGLang & AMD is working on it) | | |

tool support:
| feature | status | issue | pr |
| --- | --- | --- | --- |
| calc_gsm8k_reward | done | | |
| sandboxfusion | testing (Thanks to Xiaocheng Wang @ Bytedance) | | https://github.com/volcengine/verl/pull/1525 |
| openhands like | pending | | |
| android world like | ETA: unknown (Congkai Xie @ RealLM Labs is working on it) | | |
| browser-use like | ETA: unknown (Bai is working on it) | | |
| search | done (Thanks to Ling Chang @ USTC & Baidu (Author), Bowen Jin @ UIUC (Advisor)) | | https://github.com/volcengine/verl/pull/1682 |

algorithm support:
| feature | status | issue | pr |
| --- | --- | --- | --- |
| GRPO | done |  |  |
| PPO | pending (Amazon AGI Lab is investigating) |  |  | 
| Reinforce++ | pending |  |  |
| other (welcome to mention in this thread) | TBD |  |  |

## Road Map

- [x] Update 1 #113 
  - [x] Add request-level async rollout with tool interaction
  - [x] Support FSDP multi-turn and train correctly, wandb log: https://wandb.ai/swordfaith/gsm8k_async_rl/runs/ta7jhvgq?nw=nwuserswordfaith
- [ ] Update 2 (ETA: early June) #132 
  - [ ] Add real world tools support (at least Search & Code Interpreter)
    - [x] search & search-r1 reproduce https://github.com/volcengine/verl/pull/1682
    - [ ] code sandbox: SandboxFusion supported in https://github.com/volcengine/verl/pull/1525; ReTool reproduction in progress (SFT: https://github.com/volcengine/verl/pull/1889/ )
  - [ ] Add Qwen3 training example
  - [x] Add Megatron support
  - [x] Add multi-node support ([troubleshooting doc](https://occw56ckam.feishu.cn/docx/IauNdDv9UoIiUGxTPzvcP74snsc?from=from_copylink))
  - [x] Add FSDP2 support
  - [ ] Add VLM support https://github.com/volcengine/verl/pull/2014
  
- [ ] Update 3 (In discussion)
  - [ ] Add a Ray agentic trainer. The current async request-level approach avoids batch tool calls and generation barriers; a better approach is an agentic trainer that runs fetch → rollout → reward calculation → micro-batch filtering in a seamless async loop.
  - [ ] Add partial rollout, plus a replay buffer for the agentic trainer when partial rollout is enabled
  - [ ] Add more tools and examples
  - [ ] Add GenRM support
- [ ] Update 4
  - [ ] Introduce user interaction simulation https://github.com/volcengine/verl/pull/1630
  - [ ] Introduce co-training with judge agent
  - [ ] Introduce MCP tools or another wrapper solution for the currently implemented tools
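
The agentic trainer loop described under Update 3 (fetch → rollout → reward calculation → micro-batch filtering) could look roughly like the sketch below. It uses plain `asyncio` rather than Ray to stay self-contained, and every name in it (`rollout`, `compute_reward`, `agentic_loop`, the `keep` predicate) is hypothetical and not part of verl. It only illustrates the idea that samples flow through the pipeline independently, so a slow rollout does not block the others.

```python
import asyncio
import random


async def rollout(prompt: str) -> dict:
    """Hypothetical stand-in for a multi-turn rollout with tool calls."""
    await asyncio.sleep(random.uniform(0.0, 0.01))  # simulate variable latency
    return {"prompt": prompt, "response": prompt[::-1]}


async def compute_reward(sample: dict) -> dict:
    """Hypothetical reward: 1.0 for even-length prompts, 0.0 otherwise."""
    sample["reward"] = 1.0 if len(sample["prompt"]) % 2 == 0 else 0.0
    return sample


async def process(prompt: str) -> dict:
    # Each sample runs rollout and reward back to back, with no batch barrier.
    return await compute_reward(await rollout(prompt))


async def agentic_loop(prompts, micro_batch_size, keep=lambda s: s["reward"] > 0):
    """Collect finished samples as they complete, filter them, and group
    the survivors into micro-batches."""
    tasks = [asyncio.create_task(process(p)) for p in prompts]
    batches, current = [], []
    for finished in asyncio.as_completed(tasks):
        sample = await finished
        if not keep(sample):
            continue  # filtered out before it reaches a micro-batch
        current.append(sample)
        if len(current) == micro_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # final, possibly smaller, micro-batch
    return batches


if __name__ == "__main__":
    result = asyncio.run(agentic_loop(["ab", "abc", "abcd", "a"], micro_batch_size=2))
    print([len(b) for b in result])
```

In a real trainer each micro-batch would be handed to the optimizer as soon as it fills instead of being accumulated in a list; the list here just keeps the sketch easy to inspect.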

## Refactor To-dos

- [ ] Combine the current sglang and sglang_async rollouts seamlessly so that the change is transparent to users. (ETA: 5.19; dev phase done, waiting for PRs to merge) https://github.com/volcengine/verl/pull/1602 https://github.com/volcengine/verl/pull/1717
- [ ] Isolate asynchronous tools from the SPMD resource pool into a global Ray actor, to improve management of the global rate limit.
- [ ] Keep using the chat template in async rollout requests to avoid mismatch risk. Qwen chat template community fixes (pre-refactor): https://github.com/volcengine/verl/pull/1593 https://github.com/volcengine/verl/pull/1668
- [ ] Decouple max_response_length, max_model_len, and max_new_tokens
- [ ] Support a `parse_args` API in tools, so that each tool can customize its parameter-parsing logic.
- [ ] Decouple loss_mask and response_mask, and redesign the related logic.
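
To make the last item concrete, here is a minimal sketch of why the two masks differ in a multi-turn rollout. It assumes, as an illustration only, that `response_mask` marks every non-padding token after the prompt while `loss_mask` marks only the tokens the model generated (tool outputs are excluded from the loss). The function `build_masks` and the segment format are hypothetical and do not reflect verl's actual data layout.

```python
from typing import List, Tuple


def build_masks(
    segments: List[Tuple[str, int]], max_len: int
) -> Tuple[List[int], List[int]]:
    """Build per-token masks for the response part of one multi-turn sample.

    segments: (role, n_tokens) pairs after the prompt, where role is
              "assistant" (model-generated) or "tool" (tool observation).
    max_len:  padded response length.
    """
    response_mask: List[int] = []
    loss_mask: List[int] = []
    for role, n_tokens in segments:
        # Every real token after the prompt belongs to the response.
        response_mask.extend([1] * n_tokens)
        # Only model-generated tokens should contribute to the loss.
        loss_mask.extend([1 if role == "assistant" else 0] * n_tokens)

    pad = max_len - len(response_mask)
    if pad < 0:
        raise ValueError("segments exceed max_len")
    # Padding is excluded from both masks.
    response_mask.extend([0] * pad)
    loss_mask.extend([0] * pad)
    return response_mask, loss_mask


if __name__ == "__main__":
    # assistant turn (3 tokens), tool output (2 tokens), assistant turn (2 tokens)
    resp, loss = build_masks([("assistant", 3), ("tool", 2), ("assistant", 2)], max_len=9)
    print(resp)
    print(loss)
```

With a single assistant turn the two masks coincide, which is presumably why they could share one field before multi-turn tool calls were introduced.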

## Troubleshooting Tracker
| issue | status | reason | pr | owner |
| --- | --- | --- | --- | --- |
| sglang offloading does not work https://github.com/volcengine/verl/issues/1545 | pending verification |  |  |  |
| Recent reproduction fails https://github.com/volcengine/verl/pull/1037#issuecomment-2890493577 | pending verification |  |  |  |
