-
Notifications
You must be signed in to change notification settings - Fork 210
Open
0 / 10 of 1 issue completedLabels
Description
Current Status Tracker
training setting support:
feature | status | issue | pr |
---|---|---|---|
fsdp | done | ||
fsdp2 | done (Verify by Yuzhen Zhou @ SGLang & AMD) | volcengine/verl#1650 | |
megatron | done (Dev by Xiang Long @ SGLang & ModelBest, Ziyuan Gao @ Bytedance) | volcengine/verl#1602 | |
fp8 | need to support |
rollout feature support:
feature | status | issue | pr |
---|---|---|---|
request-level async rollout | done | ||
tool interaction | done | ||
VLM | geo3k in review (Nan Jiang @ Amazon & Congkai Xie @ Reallm Labs) | #137 | volcengine/verl#2014 |
multi-node | done by (Shengui Li & Jin Pan @ SGLang) | ||
tool rate limit | ETA: May-mid (volcengine team is working on it) | ||
exact colocated rollout | ETA: May (Junrong Lin @ SGLang & Qwen is working on it) | ||
server-based rollout | developing (Haiquan Chen @ Bytedance & Xibin Wu is working on it) | volcengine/verl#1698 volcengine/verl#1769 volcengine/verl#1831 | |
partial rollout | ETA: unknown (Yuzhen Zhou @ SGLang & AMD is working on it) |
tool support:
feature | status | issue | pr |
---|---|---|---|
calc_gsm8k_reward | done | ||
sandboxfusion | testing (Thanks to Xiaocheng Wang @ Bytedance) | volcengine/verl#1525 | |
openhands like | pending | ||
android world like | ETA: unknown (Congkai Xie @ RealLM Labs is working on it) | ||
bowser-use like | ETA: unknown (Bai is working on it) | ||
search | done (Thanks to Ling Chang @ USTC & Baidu (Author), Bowen Jin @ UIUC (Advisor)) | volcengine/verl#1682 |
algorithm support
feature | status | issue | pr |
---|---|---|---|
GRPO | done | ||
PPO | pending (Amazon AGI Lab is investigating) | ||
Reinforce++ | pending | ||
other (welcome to mention in this thread) | TBD |
Road Map
-
Update 1 Multi-turn rollout Update #1 #113
- Add request-level async rollout with tool interaction
- Support FSDP multi-turn and train correctly, wandb log: https://wandb.ai/swordfaith/gsm8k_async_rl/runs/ta7jhvgq?nw=nwuserswordfaith
-
Update 2 (ETA June early) Multi-turn rollout Update #2 #132
- Add real world tools support (at least Search & Code Interpreter)
- search & search-r1 reproduce [sglang] Feat: Search Tool Invocation in Multi-Turn RL Training volcengine/verl#1682
- code sandbox, sandbox fusion supported feat: sandbox fusion for multi-turn volcengine/verl#1525, working on retool reproduction (sft: [training_utils] Add qwen3 multi-turn sft support volcengine/verl#1889 )
- Add Qwen3 training example
- Add Megatron support
- Add multi-node support trouble shooting doc
- Add FSDP2 support
- Add VLM support [sglang] feat: add multimodal input to multiturn async rollout volcengine/verl#2014
- Add real world tools support (at least Search & Code Interpreter)
-
Update 3 (In discussion)
- Add Ray Agentic Trainer, The current async request-level approach avoids batch tool calls and generation barriers. A better approach is to use an agentic trainer: fetch → rollout → reward calculation → filter micro-batch in a seamless async loop.
- Add partial rollout and replay buffer in agentic trainer with partial rollout setting
- Add more tools and examples
- Add GenRM support
-
Update 4
- Introduce user interaction simulation [sglang] feat: Support async multi-turn rollout with simulation feedback in sglang volcengine/verl#1630
- Introduce co-training with judge agent
- Introduce MCP tools or other wrap solution for current implemented tools
Refactor To-dos
- Combine the current sglang and sglang_async rollouts seamlessly so that users remain unaware of the changes. (ETA: 5.19,dev phase done, waiting to merge prs) [sglang] Fix megatron support in sglang and add sglang_async support & CI tasks volcengine/verl#1602 [sglang] refactor: Unify async rollout under SGLangRollout, and support sglang==0.4.6.post5 volcengine/verl#1717
- An asynchronous tool from the SPMD resource pool designed to isolate the global Ray actor, enhancing the management of the global rate limit.
- Keep using chat template in async rollout req, to avoid mismatch risk. Qwen chat template community fix before refactor Support multi-turn rollout with Qwen chat template volcengine/verl#1593 [sglang] feat: Efficient and model-agnostic multi-turn messages tokenization and masking volcengine/verl#1668
- Decoupling max_response_length, max_model_len, max_new_tokens
- Support
parse_args
api in tool, to custom tool parse parameters logic. - Decoupling loss_mask and response_mask, and redesign this part of logic.
Trouble Shooting Tacker
issue | status | reason | pr | owner |
---|---|---|---|---|
sglang offloading not work volcengine/verl#1545 | pending to verify | |||
Recent reprod fail volcengine/verl#1037 (comment) | pending to verify |
feifeibear, jhinpan, wapleeeeee, zhaochenyang20, 99hgz and 20 more