Skip to content

Conversation

zyzshishui
Copy link
Collaborator

@zyzshishui zyzshishui commented Apr 4, 2025

Description

Current Status

What Has Been Done

  • Support multi-turn conversations in RL training.
  • The feature has been tested and validated in internal, closed-source coder sandbox. (Note: This sandbox is not yet available for open-source use.)
  • Multiturn support has been implemented for the GSM8K task. However, training has not yet converged, and we hope the community could join to investigate the issue.

What Is Pending

  • We are going to validate our multiturn feature in open-source sandbox environments.
  • The current implementation only achieve basic functionalities. Ongoing refactoring aims to improve extensibility and support for user-defined tasks.

Refactoring Update

The refactoring of the multiturn feature is well underway and expected to be completed by the end of this week. Key changes include:

  • Extensibility: Adopt a modular design to improve flexibility and support user-define tasks.
  • Tool Support: Standardize and centralizes tool management for multi-turn interactions, streamlining schema generation, session setup, execution, reward calculation, and resource cleanup.
  • Async Request Management: Encapsulate rollout requests into a unified structure, enabling efficient tracking and handling of concurrent multi-turn dialogues.
  • Async Rollout Refactoring: Integrate with the tool server to coordinate tool calls during generation, leveraging request IDs for state and progress tracking.

Further Refactor Plan

The next phases of the refactoring effort will build upon the current foundation to support more advanced rollout scenarios:

  • Integrate a Ray-based agent trainer to enable explicit separation of the rollout and training pipeline. Provide support for partial rollout handling and fine-grained request state management.
  • Extend the framework to support simulated user interactions (e.g., roleplay, interactive feedback) and more complex environment-in-the-loop RL tasks.

@CLAassistant
Copy link

CLAassistant commented Apr 4, 2025

CLA assistant check
All committers have signed the CLA.

@zyzshishui zyzshishui force-pushed the wip/multiturn branch 2 times, most recently from dd0017f to 64c458c Compare April 5, 2025 20:25
@zyzshishui zyzshishui force-pushed the wip/multiturn branch 4 times, most recently from 56fa926 to a67c247 Compare April 8, 2025 19:01
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comments to this yaml. Tell others this is ready to use but the validation score hasn't been improved. Welcome others to contribute and improve the algorithm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, chenyang, for "hasn't been improved", what's the baseline here? Thx

index_start=val.index_start,
index_end=val.index_end,
)
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif "multi-turn":

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state the usage of this file at the beginning

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state the usage of this file at the beginning

zyzshishui and others added 3 commits April 10, 2025 19:15
Co-authored-by: Hanchen Zhang <zhanghanchen77@gmail.com>
Co-authored-by: Rui Lu <learningrate1@gmail.com>
Co-authored-by: Haoran Wang <ubecwang@gmail.com>
Co-authored-by: Yujiang Li <liyujiang2020@gmail.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: Jiajun Li <guapisolo@gmail.com>
@zyzshishui zyzshishui closed this Apr 11, 2025
eric-haibin-lin added a commit that referenced this pull request Apr 29, 2025
…1037)

A redesigned version of #917 

## Current Status
[Develop log &
Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113)

**What Has Been Done**
- Async Rollout Refactoring: Integrate with the tool server to
coordinate tool calls during generation, leveraging request IDs for
state and progress tracking, support async multi-turn conversations in
Agentic RL training (with Tool support).
- Async Request Management: Encapsulate rollout requests into a unified
structure, enabling efficient tracking and handling of concurrent
multi-turn dialogues with chatml style messages.
- Extensible Tools: A modular design for adapt tools in
OpenAIFunctionTool format which is both support by SGLang and vLLM, with
create separate instance, execute when tool call, calc score according
to tool env state and release resource.
- Multi-turn support has been implemented for the GSM8K task (new
version working on). However, training has not yet converged, and we
hope the community could join to investigate the issue.

**What Is WIP**
- [x] Merge loss mask to training process from last version
- [x] Add more user friendly tool config and e2e tests for gsm8k with
tool training
- [ ] We are going to validate our multiturn feature in open-source
sandbox environments.

## Key Features will be introduced in future version

- Integrate a Ray-based agent trainer to enable explicit separation of
the rollout and training pipeline. Provide support for partial rollout
handling and fine-grained request state management.
- Extend the framework to support simulated user interactions (e.g.,
roleplay, interactive feedback) and more complex environment-in-the-loop
RL tasks.

**Future Plan**
[Discussion
Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment))
[RFC
doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md)
will be updated soon.

## Contributors & Acknowledgement

- Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com)
@SwordFaith (Design RFC & core-dev of refactor part)
- Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com)
@zyzshishui (Core-dev)
- Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com)
@zhaochenyang20 (PM)
- Guanhua Wang @WANG-GH 
- Junrong Lin @ocss884 (verl-sglang support)
- Hanchen Zhang
[zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com)
- Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com)
- Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com)
- Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com)
- Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com)
- Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu)
- Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn)
@zh-zheng

---------

Co-authored-by: zyzshishui <492129152@qq.com>
Co-authored-by: guanhua <281484683@qq.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
Co-authored-by: HL <linhaibin.eric@gmail.com>
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Apr 30, 2025
…olcengine#1037)

A redesigned version of volcengine#917 

## Current Status
[Develop log &
Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113)

**What Has Been Done**
- Async Rollout Refactoring: Integrate with the tool server to
coordinate tool calls during generation, leveraging request IDs for
state and progress tracking, support async multi-turn conversations in
Agentic RL training (with Tool support).
- Async Request Management: Encapsulate rollout requests into a unified
structure, enabling efficient tracking and handling of concurrent
multi-turn dialogues with chatml style messages.
- Extensible Tools: A modular design for adapt tools in
OpenAIFunctionTool format which is both support by SGLang and vLLM, with
create separate instance, execute when tool call, calc score according
to tool env state and release resource.
- Multi-turn support has been implemented for the GSM8K task (new
version working on). However, training has not yet converged, and we
hope the community could join to investigate the issue.

**What Is WIP**
- [x] Merge loss mask to training process from last version
- [x] Add more user friendly tool config and e2e tests for gsm8k with
tool training
- [ ] We are going to validate our multiturn feature in open-source
sandbox environments.

## Key Features will be introduced in future version

- Integrate a Ray-based agent trainer to enable explicit separation of
the rollout and training pipeline. Provide support for partial rollout
handling and fine-grained request state management.
- Extend the framework to support simulated user interactions (e.g.,
roleplay, interactive feedback) and more complex environment-in-the-loop
RL tasks.

**Future Plan**
[Discussion
Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment))
[RFC
doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md)
will be updated soon.

## Contributors & Acknowledgement

- Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com)
@SwordFaith (Design RFC & core-dev of refactor part)
- Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com)
@zyzshishui (Core-dev)
- Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com)
@zhaochenyang20 (PM)
- Guanhua Wang @WANG-GH 
- Junrong Lin @ocss884 (verl-sglang support)
- Hanchen Zhang
[zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com)
- Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com)
- Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com)
- Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com)
- Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com)
- Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu)
- Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn)
@zh-zheng

---------

Co-authored-by: zyzshishui <492129152@qq.com>
Co-authored-by: guanhua <281484683@qq.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
Co-authored-by: HL <linhaibin.eric@gmail.com>
ScottCTD pushed a commit to ScottCTD/verl that referenced this pull request May 5, 2025
…olcengine#1037)

A redesigned version of volcengine#917 

## Current Status
[Develop log &
Tracker](zhaochenyang20/Awesome-ML-SYS-Tutorial#113)

**What Has Been Done**
- Async Rollout Refactoring: Integrate with the tool server to
coordinate tool calls during generation, leveraging request IDs for
state and progress tracking, support async multi-turn conversations in
Agentic RL training (with Tool support).
- Async Request Management: Encapsulate rollout requests into a unified
structure, enabling efficient tracking and handling of concurrent
multi-turn dialogues with chatml style messages.
- Extensible Tools: A modular design for adapt tools in
OpenAIFunctionTool format which is both support by SGLang and vLLM, with
create separate instance, execute when tool call, calc score according
to tool env state and release resource.
- Multi-turn support has been implemented for the GSM8K task (new
version working on). However, training has not yet converged, and we
hope the community could join to investigate the issue.

**What Is WIP**
- [x] Merge loss mask to training process from last version
- [x] Add more user friendly tool config and e2e tests for gsm8k with
tool training
- [ ] We are going to validate our multiturn feature in open-source
sandbox environments.

## Key Features will be introduced in future version

- Integrate a Ray-based agent trainer to enable explicit separation of
the rollout and training pipeline. Provide support for partial rollout
handling and fine-grained request state management.
- Extend the framework to support simulated user interactions (e.g.,
roleplay, interactive feedback) and more complex environment-in-the-loop
RL tasks.

**Future Plan**
[Discussion
Thread](zhaochenyang20/Awesome-ML-SYS-Tutorial#74 (comment))
[RFC
doc](https://github.com/SwordFaith/verl-sglang-dev-log/blob/main/rlhf/verl/multi-turn/veRL-multiturn-rollout-RFC.md)
will be updated soon.

## Contributors & Acknowledgement

- Xiang Long [mid.of.change@gmail.com](mailto:mid.of.change@gmail.com)
@SwordFaith (Design RFC & core-dev of refactor part)
- Yuzhen Zhou [zyzshishui@gmail.com](mailto:zyzshishui@gmail.com)
@zyzshishui (Core-dev)
- Chenyang Zhao [zhaochen20@outlook.com](mailto:zhaochen20@outlook.com)
@zhaochenyang20 (PM)
- Guanhua Wang @WANG-GH 
- Junrong Lin @ocss884 (verl-sglang support)
- Hanchen Zhang
[zhanghanchen77@gmail.com](mailto:zhanghanchen77@gmail.com)
- Haoran Wang [ubecwang@gmail.com](mailto:ubecwang@gmail.com)
- Rui Lu [learningrate1@gmail.com](mailto:learningrate1@gmail.com)
- Yujiang Li [liyujiang2020@gmail.com](mailto:liyujiang2020@gmail.com)
- Jiajun Li [guapisolo@gmail.com](mailto:guapisolo@gmail.com)
- Jin Pan [jpan236@wisc.edu](mailto:jpan236@wisc.edu)
- Zhi Zheng [zhengzhi@modelbest.cn](mailto:zhengzhi@modelbest.cn)
@zh-zheng

---------

Co-authored-by: zyzshishui <492129152@qq.com>
Co-authored-by: guanhua <281484683@qq.com>
Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
Co-authored-by: ocss884 <ocss.lin@gmail.com>
Co-authored-by: Shawn/Yuxuan Tong <tongyuxuan361@gmail.com>
Co-authored-by: HL <linhaibin.eric@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants