[WIP] SGLang rollout multiturn support #917

zyzshishui · 2025-04-04T16:30:31Z

Description

Current Status

What Has Been Done

Support multi-turn conversations in RL training.
The feature has been tested and validated in internal, closed-source coder sandbox. (Note: This sandbox is not yet available for open-source use.)
Multiturn support has been implemented for the GSM8K task. However, training has not yet converged, and we hope the community could join to investigate the issue.

What Is Pending

We are going to validate our multiturn feature in open-source sandbox environments.
The current implementation only achieve basic functionalities. Ongoing refactoring aims to improve extensibility and support for user-defined tasks.

Refactoring Update

The refactoring of the multiturn feature is well underway and expected to be completed by the end of this week. Key changes include:

Extensibility: Adopt a modular design to improve flexibility and support user-define tasks.
Tool Support: Standardize and centralizes tool management for multi-turn interactions, streamlining schema generation, session setup, execution, reward calculation, and resource cleanup.
Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues.
Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking.

Further Refactor Plan

The next phases of the refactoring effort will build upon the current foundation to support more advanced rollout scenarios:

Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management.
Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks.

CLAassistant · 2025-04-04T16:30:40Z

All committers have signed the CLA.

zhaochenyang20 · 2025-04-08T20:56:57Z

verl/trainer/config/gsm8k.yaml

Add comments to this yaml. Tell others this is ready to use but the validation score hasn't been improved. Welcome others to contribute and improve the algorithm.

Hi, chenyang, for "hasn't been improved", what's the baseline here? Thx

verl/trainer/main_ppo.py

verl/trainer/ppo/core_algos.py

verl/trainer/ppo/ray_trainer.py

zhaochenyang20 · 2025-04-08T21:06:30Z

verl/trainer/ppo/ray_trainer.py

+                index_start=val.index_start,
+                index_end=val.index_end,
+            )
+        else:


elif "multi-turn":

zhaochenyang20 · 2025-04-08T21:25:02Z

verl/workers/agentic/loops.py

state the usage of this file at the beginning

zhaochenyang20 · 2025-04-08T21:25:07Z

verl/workers/agentic/tasks.py

state the usage of this file at the beginning

verl/workers/fsdp_workers.py

verl/workers/reward_manager/swedev.py

verl/utils/reward_score/hotpotqa.py

Co-authored-by: Hanchen Zhang <zhanghanchen77@gmail.com> Co-authored-by: Rui Lu <learningrate1@gmail.com> Co-authored-by: Haoran Wang <ubecwang@gmail.com> Co-authored-by: Yujiang Li <liyujiang2020@gmail.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>

Co-authored-by: Jiajun Li <guapisolo@gmail.com>

@SwordFaith

…1037) A redesigned version of #917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com) @zyzshishui (Core-dev) - Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com) - Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com) - Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com) - Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com) - Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com) - Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu) - Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn) @zh-zheng --------- Co-authored-by: zyzshishui <492129152@qq.com> Co-authored-by: guanhua <281484683@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: ocss884 <ocss.lin@gmail.com> Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com> Co-authored-by: HL <linhaibin.eric@gmail.com>

@SwordFaith

…olcengine#1037) A redesigned version of volcengine#917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com) @zyzshishui (Core-dev) - Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com) - Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com) - Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com) - Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com) - Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com) - Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu) - Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn) @zh-zheng --------- Co-authored-by: zyzshishui <492129152@qq.com> Co-authored-by: guanhua <281484683@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: ocss884 <ocss.lin@gmail.com> Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com> Co-authored-by: HL <linhaibin.eric@gmail.com>

@SwordFaith

…olcengine#1037) A redesigned version of volcengine#917 ## Current Status [Develop log & Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113) **What Has Been Done** - Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking, support async multi-turn conversations in Agentic RL training (with Tool support). - Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues with chatml style messages. - Extensible Tools: A modular design for adapt tools in OpenAIFunctionTool format which is both support by SGLang and vLLM, with create separate instance, execute when tool call, calc score according to tool env state and release resource. - Multi-turn support has been implemented for the GSM8K task (new version working on). However, training has not yet converged, and we hope the community could join to investigate the issue. **What Is WIP** - [x] Merge loss mask to training process from last version - [x] Add more user friendly tool config and e2e tests for gsm8k with tool training - [ ] We are going to validate our multiturn feature in open-source sandbox environments. ## Key Features will be introduced in future version - Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management. - Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks. **Future Plan** [Discussion Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment)) [RFC doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md) will be updated soon. ## Contributors & Acknowledgement - Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com) @SwordFaith (Design RFC & core-dev of refactor part) - Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com) @zyzshishui (Core-dev) - Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com) @zhaochenyang20 (PM) - Guanhua Wang @WANG-GH - Junrong Lin @ocss884 (verl-sglang support) - Hanchen Zhang [zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com) - Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com) - Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com) - Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com) - Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com) - Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu) - Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn) @zh-zheng --------- Co-authored-by: zyzshishui <492129152@qq.com> Co-authored-by: guanhua <281484683@qq.com> Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com> Co-authored-by: ocss884 <ocss.lin@gmail.com> Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com> Co-authored-by: HL <linhaibin.eric@gmail.com>

zyzshishui mentioned this pull request Apr 5, 2025

veRL-SGLang Roadmap zhaochenyang20/Awesome-ML-SYS-Tutorial#74

Open

13 tasks

zyzshishui force-pushed the wip/multiturn branch 2 times, most recently from dd0017f to 64c458c Compare April 5, 2025 20:25

eric-haibin-lin mentioned this pull request Apr 6, 2025

Support for mutliturn online RL training #385

Closed

zyzshishui force-pushed the wip/multiturn branch 4 times, most recently from 56fa926 to a67c247 Compare April 8, 2025 19:01

zhaochenyang20 requested changes Apr 8, 2025

View reviewed changes

UbeCc reviewed Apr 9, 2025

View reviewed changes

verl/utils/reward_score/hotpotqa.py Outdated Show resolved Hide resolved

zyzshishui and others added 3 commits April 10, 2025 19:15

add gsm8k config file

8365f98

[ready to review] rebase with main

0dd7ce8

zyzshishui force-pushed the wip/multiturn branch from 339102f to 0dd7ce8 Compare April 10, 2025 19:22

zhaochenyang20 closed this Apr 11, 2025

rebase with main [left confilcts]

16d1550

sunjin-k mentioned this pull request Apr 11, 2025

[WIP] Agent Training with Remote Service + Gym like protocal #973

Draft

zhaochenyang20 reopened this Apr 11, 2025

zhaochenyang20 mentioned this pull request Apr 11, 2025

Multi-turn rollout Update #1 zhaochenyang20/Awesome-ML-SYS-Tutorial#113

Open

53 tasks

zyzshishui force-pushed the wip/multiturn branch from f2d0992 to c7d0f20 Compare April 11, 2025 08:14

zyzshishui closed this Apr 11, 2025

zyzshishui reopened this Apr 11, 2025

zyzshishui force-pushed the wip/multiturn branch from c7d0f20 to 16d1550 Compare April 11, 2025 08:43

resolve conflicts

31f39fe

Co-authored-by: Jiajun Li <guapisolo@gmail.com>

zyzshishui force-pushed the wip/multiturn branch from a794fb1 to 31f39fe Compare April 11, 2025 09:16

zyzshishui closed this Apr 11, 2025

SwordFaith mentioned this pull request Apr 11, 2025

[sglang] feat: Add SGLang async multi-turn rollout with tool support #1037

Merged

3 tasks

caoshiyi mentioned this pull request May 15, 2025

Which commit of verl is this repo based on? NovaSky-AI/SkyRL#7

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] SGLang rollout multiturn support #917

[WIP] SGLang rollout multiturn support #917

Uh oh!

zyzshishui commented Apr 4, 2025 •

edited

Loading

Uh oh!

CLAassistant commented Apr 4, 2025 •

edited

Loading

Uh oh!

zhaochenyang20 Apr 8, 2025

Uh oh!

SparkJiao Apr 9, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zhaochenyang20 Apr 8, 2025

Uh oh!

zhaochenyang20 Apr 8, 2025

Uh oh!

zhaochenyang20 Apr 8, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[WIP] SGLang rollout multiturn support #917

[WIP] SGLang rollout multiturn support #917

Uh oh!

Conversation

zyzshishui commented Apr 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Current Status

What Has Been Done

What Is Pending

Refactoring Update

Further Refactor Plan

Uh oh!

CLAassistant commented Apr 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zhaochenyang20 Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

SparkJiao Apr 9, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zhaochenyang20 Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

zhaochenyang20 Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

zhaochenyang20 Apr 8, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zyzshishui commented Apr 4, 2025 •

edited

Loading

CLAassistant commented Apr 4, 2025 •

edited

Loading