Multi-turn rollout Update #2

- [ ] Add real-world tool support (at least Search & Code Interpreter)
  - [ ] Add global rate limit at tool-level (Haiquan Chen @ Bytedance volcengine is working on it)
  - [ ] Code Interpreter (using a ReTool-like workload as the example)
    - [x] ReTool-like cold-start SFT data https://huggingface.co/datasets/swordfaith/ReTool-SFT-multi-turn
    - [ ] Qwen3 no-think SFT model
      - [x] Qwen3-4B https://huggingface.co/swordfaith/ReTool-Qwen3-4B-SFT-cold-started
    - [ ] Sandbox Fusion-based code interpreter impl
      - [x] First version impl (Xiaocheng Wang @ Bytedance volcengine is working on it) https://github.com/volcengine/verl/pull/1525
      - [ ] Support global rate limit & request queue
      - [ ] Support global registry and separate resource pool
    - [ ] ReTool-like RL example & wandb log
  - [ ] Search-R1-like RL (Ling Chang @ Baidu & CAS is working on it)
    - [x] First run with sglang 0.4.5.post3
    - [ ] Check that it is stable and the results are correct
- [ ] Add Server-based Rollout
  - [ ] Add http engine support in async rollout
  - [ ] Replace the existing http verl engine with the new http engine
  - [ ] Port sgl-router as a Ray actor in verl
  - [ ] Add logic to register http_engine with the router
  - [ ] Refactor to a common Ray actor registry impl
  - [ ] Discuss whether this could use Ray-native gRPC service registration and dispatch
- [ ] Add Qwen3 training example @SwordFaith 
- [x] Add multi-node support (Shenggui Li & Jin Pan @ SGLang verified it), [troubleshooting doc](https://occw56ckam.feishu.cn/docx/IauNdDv9UoIiUGxTPzvcP74snsc?from=from_copylink)
- [ ] Add VLM support (geo3k + examples) #137 
- [x] Add Megatron support @SwordFaith https://github.com/volcengine/verl/pull/1602
- [ ] Refactor sglang in verl
  - [x] Align init & generate_sequences impl @ocss884 @SwordFaith https://github.com/SwordFaith/verl/tree/refactor/merge_sgl_rollouts_and_bump_to_0.4.6.post4
  - [x] Verify the merged sharding manager + rollout: switch all sglang unit tests to sglang_async tests; environment: torch 2.6 + sglang 0.4.6.post4 (Yuzhen Zhou & Jin Pan @ SGLang helped test this)
     - [x] e2e_ppo_trainer_sglang
     - [x] e2e_ppo_trainer_sglang_async
     - [x] e2e_ppo_trainer_sglang_async_with_tool
     - [x] e2e_ppo_trainer_sglang_vlm
     - [x] e2e_ppo_trainer_megatron-qwen + megatron-core 0.12.0 @SwordFaith 
     - [x] sgl.yml
     - [x] gsm8k regression training sglang & sglang_async @ocss884 
     - [x] geo3k regression training sglang & sglang_async @GeLee-Q 
     - [x] gsm8k with tool @SwordFaith 
  - [ ] Switch sglang rollout and sharding manager to sglang_async as default
     - [x] Verify megatron with generate_sequences_with_tools @SwordFaith 
  - [x] Update CI & unit-tests
  - [x] Update requirements
  - [x] bump to 0.4.6.post4 https://github.com/volcengine/verl/pull/1577
  - [x] fix megatron support
  - [ ] merge sglang impl
