[sglang] feat: add multimodal input to multiturn async rollout #2014

nanjiangwill · 2025-06-13T16:44:51Z

Checklist Before Starting

Searched for similar PR(s).

What does this PR do?

This PR adds image input to the SGLang async rollout, which previously supported only text. It also includes a placeholder for video data, which will be added as an input once the SGLang engine supports it.

High-Level Design

Since the SGLang engine already handles image input, the only change needed is to handle tokenization properly.

Specific Changes

Change self.tokenizer.apply_chat_template() to self.processing_class.apply_chat_template(). Here processing_class can be either a tokenizer or a processor.
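To illustrate the call-site change, here is a minimal sketch, not the code from this PR. It assumes only that both object kinds expose an apply_chat_template() method. StubTokenizer, StubProcessor, RolloutSketch, and build_prompt are hypothetical names invented for the example, and the template strings they produce are made up.

```python
from typing import Any, Protocol


class SupportsChatTemplate(Protocol):
    """Anything with an apply_chat_template method (tokenizer or processor)."""

    def apply_chat_template(self, messages: list[dict[str, Any]], **kwargs: Any) -> str: ...


class StubTokenizer:
    """Hypothetical text-only stand-in for a tokenizer."""

    def apply_chat_template(self, messages, **kwargs):
        return "".join(f"<{m['role']}>{m['content']}</{m['role']}>" for m in messages)


class StubProcessor:
    """Hypothetical stand-in for a multimodal processor; marks image slots."""

    def apply_chat_template(self, messages, **kwargs):
        parts = []
        for m in messages:
            content = m["content"]
            if isinstance(content, list):
                # Multimodal content: a list of {"type": ...} items.
                text = "".join(
                    "<image>" if item["type"] == "image" else item["text"]
                    for item in content
                )
            else:
                text = content
            parts.append(f"<{m['role']}>{text}</{m['role']}>")
        return "".join(parts)


class RolloutSketch:
    """Hypothetical rollout that no longer assumes a tokenizer."""

    def __init__(self, processing_class: SupportsChatTemplate):
        self.processing_class = processing_class

    def build_prompt(self, messages):
        # Before the change this would have been self.tokenizer.apply_chat_template(...).
        return self.processing_class.apply_chat_template(
            messages, add_generation_prompt=True, tokenize=False
        )
```

Because the rollout depends only on the shared method, the same build_prompt call works whether it was constructed with the text-only stub or the multimodal one.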

Usage Example

The rollout automatically uses the processor to handle images when the model's processor supports them, and falls back to the tokenizer when no processor is available.
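The fallback described here could be sketched roughly as follows. This is an assumption-laden illustration, not the selection logic in this PR: select_processing_class is a hypothetical helper, and treating "has a non-None image_processor attribute" as the test for image support is an assumption made for the example.

```python
from types import SimpleNamespace
from typing import Optional


def select_processing_class(tokenizer: object, processor: Optional[object] = None) -> object:
    """Prefer an image-capable processor; otherwise fall back to the tokenizer.

    Assumption: a processor counts as image-capable when it exposes a
    non-None ``image_processor`` attribute.
    """
    if processor is not None and getattr(processor, "image_processor", None) is not None:
        return processor
    return tokenizer


# Hypothetical stand-ins for loaded objects.
tokenizer = SimpleNamespace(name="tokenizer")
vl_processor = SimpleNamespace(name="processor", image_processor=object())
text_only_processor = SimpleNamespace(name="processor", image_processor=None)

chosen_for_vl_model = select_processing_class(tokenizer, vl_processor)
chosen_for_text_model = select_processing_class(tokenizer, None)
```

Keeping the choice in one place means the rest of the rollout can call processing_class without checking which kind of object it received.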

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks.
Add [BREAKING] to the PR title description if it breaks any API.
Update the documentation about your changes in the docs.
New CI unit test(s) are added to cover the code path.
Rely on existing unit tests on CI that cover the code path.

zhaochenyang20 · 2025-06-18T17:15:57Z

@nanjiangwill Is it ready for us to test it in public?

nanjiangwill · 2025-06-18T17:33:39Z

Yes, it is ready for public testing.

SwordFaith

looking forward to your reply and the opportunity to discuss this further.

docs/sglang_multiturn/multiturn.rst

examples/sglang_multiturn/geo3k/run_qwen2.5-3b_geo3k_multiturn.sh

verl/workers/megatron_workers.py

verl/workers/rollout/schemas.py

zhaochenyang20

Amazon/Amazing job 😂

SwordFaith

Great job! Nan

…ngine#2014)

Co-authored-by: xieck13 <xieck13@gmail.com>

nanjiangwill and others added 10 commits June 13, 2025 09:31

feat: add vlm multiturn

bc3aea5

docs: update docs for multimodal input

7800f3d

lint: fix linting issue

8d0a672

feat: vlm async sglang support, adding files

6a15ad1

feat: adding geo3k tool

cc2a582

feat: adding geo3k tool

a2856eb

tests: add tests and github workflow

511b530

Merge branch 'main' into feat/add-multimodal-multiturn-sglang

2c1841e

tests: merge main and add tests

fab02ad

remove context manager in async

b20748f

xieck13 requested review from zhaochenyang20 and eric-haibin-lin as code owners June 15, 2025 11:01

nanjiangwill added 5 commits June 16, 2025 02:40

fix: change model to vl for tests

77f7ad7

fix: fix sglangrollout init arguments

9a34d11

fix: fix CI

0fa4823

docs: add info

e0809fd

docs: add info

ec1dd99

SwordFaith mentioned this pull request Jun 17, 2025

Multi-turn Update #2 VLM Support Tracker zhaochenyang20/Awesome-ML-SYS-Tutorial#137

Open

11 tasks

nanjiangwill added 2 commits June 18, 2025 10:01

feat: merge main

cd11b0a

docs: add copyright

e0d1bf2

nanjiangwill force-pushed the feat/add-multimodal-multiturn-sglang branch from 2b560a9 to e0d1bf2 Compare June 18, 2025 17:02

lint

43cab07

nanjiangwill added 5 commits June 18, 2025 11:32

docs: fix ci

f9ee592

fix: fix ci

af70de1

fix: fix multi image issue in tokenization

284c3be

fix: fix warning issue

51f17e7

fix: fix sglang ci

f1c73f4

nanjiangwill requested a review from chenhaiq as a code owner June 19, 2025 17:28

SwordFaith reviewed Jun 21, 2025

SwordFaith mentioned this pull request Jun 7, 2025

Multi-turn rollout & agentic RL Status & Roadmap zhaochenyang20/Awesome-ML-SYS-Tutorial#131

Open

27 tasks

nanjiangwill changed the title ~~[sglang] feat: [BREAKING] add multimodal input to multiturn async rollout~~ [BREAKING][sglang] feat: add multimodal input to multiturn async rollout Jun 21, 2025

lint

b25eb31

nanjiangwill requested review from zw0610 and wuxibin89 as code owners June 21, 2025 17:13

nanjiangwill added 5 commits June 21, 2025 10:31

add megatron chat template

cd3c6b6

fix: fix prompt

60b6062

Merge branch 'main' into feat/add-multimodal-multiturn-sglang

1b5ff43

fix: fix diff func

f4250e6

docs: update docs

6d95a82

zhaochenyang20 changed the title ~~[BREAKING][sglang] feat: add multimodal input to multiturn async rollout~~ [sglang] feat: add multimodal input to multiturn async rollout Jun 21, 2025

nanjiangwill added 5 commits June 21, 2025 14:53

change chat template logic

9ff5b18

change chat template logic

355d63d

change chat template logic

56e3945

fix template issue

48342f6

lint

98ac3f1

zhaochenyang20 approved these changes Jun 22, 2025

SwordFaith approved these changes Jun 22, 2025

nanjiangwill added 6 commits June 22, 2025 10:02

merge main

8f5d296

feat: update diff context variable name

c602960

fix

cb9c0af

fix: fix CI

aafabe5

fix: fix CI

e2dcc2f

feat: add megatron config

d756089

zhaochenyang20 merged commit 644aaa7 into volcengine:main Jun 22, 2025
43 of 44 checks passed
