-
Notifications
You must be signed in to change notification settings - Fork 2.3k
[sglang] feat: add multimodal input to multiturn async rollout #2014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sglang] feat: add multimodal input to multiturn async rollout #2014
Conversation
2b560a9
to
e0d1bf2
Compare
@nanjiangwill Is it ready for us to test it in public? |
yes it is ready for public testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking forward to your reply and the opportunity to discuss this further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazon/Amazing job 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! Nan
…ngine#2014) ### Checklist Before Starting - [X] Searched for similar PR(s). ### What does this PR do? This PR adds image input to sglang async rollout. Previously sglang async rollout only support text. There is also a placeholder for video data, will be added as an input when SGLang engine supports it. ### High-Level Design Since sglang engine already handle the image input, just need to properly handling the tokenization. ### Specific Changes Change `self.tokenizer.apply_chat_template()` to `self.processing_class.apply_chat_template()`. `processing_class` could be `tokenizer` or `processor`. ### Usage Example It will automatically using processor to process image when the model's processor supports that. It will use tokenizer if there is no processor available ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] New CI unit test(s) are added to cover the code path. - [X] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: xieck13 <xieck13@gmail.com>
…ngine#2014) ### Checklist Before Starting - [X] Searched for similar PR(s). ### What does this PR do? This PR adds image input to sglang async rollout. Previously sglang async rollout only support text. There is also a placeholder for video data, will be added as an input when SGLang engine supports it. ### High-Level Design Since sglang engine already handle the image input, just need to properly handling the tokenization. ### Specific Changes Change `self.tokenizer.apply_chat_template()` to `self.processing_class.apply_chat_template()`. `processing_class` could be `tokenizer` or `processor`. ### Usage Example It will automatically using processor to process image when the model's processor supports that. It will use tokenizer if there is no processor available ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] New CI unit test(s) are added to cover the code path. - [X] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: xieck13 <xieck13@gmail.com>
…ngine#2014) ### Checklist Before Starting - [X] Searched for similar PR(s). ### What does this PR do? This PR adds image input to sglang async rollout. Previously sglang async rollout only support text. There is also a placeholder for video data, will be added as an input when SGLang engine supports it. ### High-Level Design Since sglang engine already handle the image input, just need to properly handling the tokenization. ### Specific Changes Change `self.tokenizer.apply_chat_template()` to `self.processing_class.apply_chat_template()`. `processing_class` could be `tokenizer` or `processor`. ### Usage Example It will automatically using processor to process image when the model's processor supports that. It will use tokenizer if there is no processor available ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] New CI unit test(s) are added to cover the code path. - [X] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: xieck13 <xieck13@gmail.com>
…ngine#2014) ### Checklist Before Starting - [X] Searched for similar PR(s). ### What does this PR do? This PR adds image input to sglang async rollout. Previously sglang async rollout only support text. There is also a placeholder for video data, will be added as an input when SGLang engine supports it. ### High-Level Design Since sglang engine already handle the image input, just need to properly handling the tokenization. ### Specific Changes Change `self.tokenizer.apply_chat_template()` to `self.processing_class.apply_chat_template()`. `processing_class` could be `tokenizer` or `processor`. ### Usage Example It will automatically using processor to process image when the model's processor supports that. It will use tokenizer if there is no processor available ### Checklist Before Submitting - [X] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide). - [X] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting). - [X] Add `[BREAKING]` to the PR title `description` if it breaks any API. - [X] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs). - [X] New CI unit test(s) are added to cover the code path. - [X] Rely on existing unit tests on CI that covers the code path. --------- Co-authored-by: xieck13 <xieck13@gmail.com>
Checklist Before Starting
What does this PR do?
This PR adds image input to sglang async rollout. Previously sglang async rollout only support text. There is also a placeholder for video data, will be added as an input when SGLang engine supports it.
High-Level Design
Since sglang engine already handle the image input, just need to properly handling the tokenization.
Specific Changes
Change
self.tokenizer.apply_chat_template()
toself.processing_class.apply_chat_template()
.processing_class
could betokenizer
orprocessor
.Usage Example
It will automatically using processor to process image when the model's processor supports that. It will use tokenizer if there is no processor available
Checklist Before Submitting
[BREAKING]
to the PR titledescription
if it breaks any API.