
Conversation

ccw1996
Contributor

@ccw1996 ccw1996 commented Jan 8, 2025

Motivation

Add Deepseek-VL2 model to SGLang, as requested in #2653

Modifications

  1. Add new model Deepseek-VL2

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@zhyncs zhyncs added the enhancement New feature or request label Jan 8, 2025
@@ -0,0 +1,127 @@
from typing import List, Optional, Tuple, Union
Collaborator


rename the file to deepseek_vl2?

Contributor Author

@ccw1996 ccw1996 Jan 12, 2025


rename done


self.layers = modules

def forward(self, x):

Contributor Author


I have not yet implemented the forward part of the DeepseekV2ForCausalLM. I will finish all the implementations and add the unit test this weekend.

@yizhang2077
Collaborator

@ccw1996 Do you need our help?

@SashvDave

SashvDave commented Jan 29, 2025

Has support for deepseek vl2 been implemented?

Comment on lines 220 to 229
if config.projector_type == "downsample_mlp_gelu":
    mlp_depth = config.depth
    mlp_ratio = config.mlp_ratio
    modules = [
        nn.Linear(
            config.input_dim * config.downsample_ratio * config.downsample_ratio,
            config.n_embed * mlp_ratio,
        )
    ]
    for _ in range(1, mlp_depth - 1):
        modules.append(nn.GELU())
        modules.append(
            nn.Linear(config.n_embed * mlp_ratio, config.n_embed * mlp_ratio)
        )
    modules.append(nn.GELU())
    modules.append(nn.Linear(config.n_embed * mlp_ratio, config.n_embed))
    modules = nn.Sequential(*modules)
Contributor

@Edenzzzz Edenzzzz Jan 30, 2025


@ccw1996 I'm happy to take the rest of the work to parallelize the remaining functions. Could you give me access to your branch?

@Edenzzzz
Contributor

@ccw1996 Apologies for the delay. Would you like me to help with the rest of it?

@ccw1996
Contributor Author

ccw1996 commented Jan 31, 2025

@ccw1996 Do you need our help?

@ccw1996 Apologies for the delay. Would you like me to help with the rest of it?

Sure, I'm having some trouble adapting the preprocessing. I need help adapting the SigLIP implementation without timm.

I will update my other implementation code later.

@Edenzzzz
Contributor

Edenzzzz commented Jan 31, 2025

@ccw1996 I see, I think you can copy those layers from timm into python/sglang/srt/models/deepseekvl2.py, and then replace layers with sgl classes. I'm interested in helping if you can give me access.

@Edenzzzz
Contributor

Edenzzzz commented Feb 4, 2025

@yizhang2077 @ispobock Looks like we'll have to copy lots of code from timm. Now it's mostly just the linear layers with variable depth left to parallelize; will finish soon.

@ccw1996
Contributor Author

ccw1996 commented Feb 6, 2025

@ccw1996 Apologies for the delay. Would you like me to help with the rest of it?

@Edenzzzz Hello, I have run DeepSeek-VL2 successfully with the timm preprocessing, but I'm confused that the result has some unexpected values. Can you help me find the reason?

@Edenzzzz
Contributor

Edenzzzz commented Feb 6, 2025

@Edenzzzz Hello, I have run DeepSeek-VL2 successfully with the timm preprocessing, but I'm confused that the result has some unexpected values. Can you help me find the reason?

Sure, can you mark the problematic part?

Comment on lines 640 to 659
if config.projector_type == "downsample_mlp_gelu":
    mlp_depth = config.depth
    mlp_ratio = config.mlp_ratio
    modules = [
        nn.Linear(
            config.input_dim
            * config.downsample_ratio
            * config.downsample_ratio,
            config.n_embed * mlp_ratio,
        )
    ]
    for _ in range(1, mlp_depth - 1):
        modules.append(nn.GELU())
        modules.append(
            nn.Linear(config.n_embed * mlp_ratio, config.n_embed * mlp_ratio)
        )
    modules.append(nn.GELU())
    modules.append(nn.Linear(config.n_embed * mlp_ratio, config.n_embed))
    modules = nn.Sequential(*modules)

Contributor


Need to parallelize this part with column- and row-parallel linear layers.

Contributor


@yizhang2077 Actually, with GELU we'll have to gather the output of each TP linear. Should we use a replicated linear instead?
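The gather concern can be illustrated with a small NumPy sketch (dimensions, weights, and the tanh GELU approximation are illustrative, not taken from the PR). A two-layer column-/row-parallel MLP needs only one final all-reduce, because GELU is elementwise and each rank's column shard is independent; but a deeper MLP would need to gather after each row-parallel layer before the next GELU, since GELU is not additive across partial sums:

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU, elementwise
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

d, h = 8, 16                      # toy input and hidden dims
x = rng.normal(size=(4, d))
W1 = rng.normal(size=(d, h))
W2 = rng.normal(size=(h, d))

# Reference: full (unsharded) 2-layer MLP
ref = gelu(x @ W1) @ W2

# TP=2: column-split W1, row-split W2
W1a, W1b = np.split(W1, 2, axis=1)   # each "rank" holds half the columns
W2a, W2b = np.split(W2, 2, axis=0)   # each "rank" holds half the rows

# GELU applies per shard; one final all-reduce (the sum) finishes the matmul.
out = gelu(x @ W1a) @ W2a + gelu(x @ W1b) @ W2b
assert np.allclose(ref, out)

# With a deeper MLP, the next GELU needs the *summed* row-parallel output
# first, because GELU(a + b) != GELU(a) + GELU(b):
a, b = x @ W1a @ W2a, x @ W1b @ W2b
assert not np.allclose(gelu(a + b), gelu(a) + gelu(b))
```

This is why a variable-depth projector either pays a gather per hidden layer or, as suggested above, just uses replicated linears for such a small module.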

@ccw1996
Contributor Author

ccw1996 commented Feb 9, 2025

@Edenzzzz Hello, I have run DeepSeek-VL2 successfully with the timm preprocessing, but I'm confused that the result has some unexpected values. Can you help me find the reason?
Sure, can you mark the problematic part?

Two problems. One is that the radix cache corrupts the input; I will try to fix it. The second is that the output seems not to use the image embeddings. Can you help me debug it?

@Edenzzzz
Contributor

Edenzzzz commented Feb 9, 2025

Let me try tomorrow

Comment on lines 157 to 161
logger.info(
    "Automatically turn off --chunked-prefill-size and disable radix cache for deepseek-vl2."
)
server_args.chunked_prefill_size = -1
server_args.disable_radix_cache = True
Contributor


The language part still supports radix cache.

Contributor Author

@ccw1996 ccw1996 Feb 11, 2025


The language part relies on the input embeddings. If the radix cache is used, the input embeddings are wrong. I will try to debug it.
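The failure mode can be sketched with a toy prefix cache (illustrative only, not SGLang's actual cache code): a radix/prefix cache keyed on token ids alone cannot distinguish two requests whose image placeholder tokens are identical but whose image embeddings differ, so a cache hit hands the second request the first request's state.

```python
# Toy prefix cache keyed only on token ids; the value stands in for
# the cached KV/embedding state produced during prefill.
cache = {}

def cached_prefill(token_ids, image_embedding):
    key = tuple(token_ids)            # cache key ignores the image content
    if key not in cache:
        cache[key] = image_embedding  # first request populates the cache
    return cache[key]                 # later requests reuse it blindly

IMG = 100  # hypothetical placeholder token id for an image

out_a = cached_prefill([1, IMG, 2], image_embedding="cat")
out_b = cached_prefill([1, IMG, 2], image_embedding="dog")

# Same token ids -> cache hit -> request B silently gets request A's image.
assert out_a == "cat" and out_b == "cat"
```

Keying the cache on (or hashing in) the multimodal inputs, or disabling the radix cache for such models as this PR does, avoids the mix-up.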

Contributor

@Edenzzzz Edenzzzz Feb 11, 2025


I see, I think you're right. Llava and qwen_vl also don't use radix attention.

@zhaochenyang20
Collaborator

I think some parts of this file just move code from timm. Can we try importing timm and guide the user to install it when they use this model (vLLM does it this way)? I think this will make the code clearer. Do you think it is OK? @zhaochenyang20 (This way CI will be blocked, so we need to update the CI dependency shell script.)

Has this been done? @ccw1996 @yizhang2077 Basically, do not paste code but rather import it when needed.

@ccw1996
Contributor Author

ccw1996 commented Mar 15, 2025

I think some parts of this file just move code from timm. Can we try importing timm and guide the user to install it when they use this model (vLLM does it this way)? I think this will make the code clearer. Do you think it is OK? @zhaochenyang20 (This way CI will be blocked, so we need to update the CI dependency shell script.)

Has this been done? @ccw1996 @yizhang2077 Basically, do not paste code but rather import it when needed.

Sorry, I missed this comment.
However, if the timm implementation changes, there is a risk to the accuracy of the results. I think it's better as it is now. What do you think? @zhaochenyang20 @yizhang2077
Besides, I will try to import timm as soon as possible.

@Edenzzzz
Contributor

I initially thought it's unnecessary to add a dependency just for one model, since you'll also have to update CI 😂 but it's fine if you guys agree.

@zhaochenyang20
Collaborator

I think some parts of this file just move code from timm. Can we try importing timm and guide the user to install it when they use this model (vLLM does it this way)? I think this will make the code clearer. Do you think it is OK? @zhaochenyang20 (This way CI will be blocked, so we need to update the CI dependency shell script.)

Has this been done? @ccw1996 @yizhang2077 Basically, do not paste code but rather import it when needed.

Sorry, I missed this comment. However, if the timm implementation changes, there is a risk to the accuracy of the results. I think it's better as it is now. What do you think? @zhaochenyang20 @yizhang2077 Besides, I will try to import timm as soon as possible.

I think it's better to import it. timm should keep its utilities; we should not put dead code in SGLang.
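A generic lazy-import guard along these lines can support the "import it and guide the user to install it" approach (an illustrative sketch; `require` is a hypothetical helper, not SGLang's actual API):

```python
import importlib


def require(module_name: str, hint: str):
    """Import a module lazily, raising a helpful error if it is missing.

    Lets an optional dependency like timm be imported only when the model
    that needs it is actually loaded, with an install hint on failure.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name} is required for this model: {hint}"
        ) from e


# e.g. at the top of the DeepSeek-VL2 vision code:
# timm = require("timm", "install it with `pip install timm`")
```

CI then only needs `pip install timm` in the dependency script, as discussed below, while users of other models pay nothing.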

@yizhang2077
Collaborator

Hi @ccw1996, once your PR is ready, I think you can update it here: #4456

@ccw1996
Contributor Author

ccw1996 commented Mar 16, 2025

I think some parts of this file just move code from timm. Can we try importing timm and guide the user to install it when they use this model (vLLM does it this way)? I think this will make the code clearer. Do you think it is OK? @zhaochenyang20 (This way CI will be blocked, so we need to update the CI dependency shell script.)

Has this been done? @ccw1996 @yizhang2077 Basically, do not paste code but rather import it when needed.

Sorry, I missed this comment. However, if the timm implementation changes, there is a risk to the accuracy of the results. I think it's better as it is now. What do you think? @zhaochenyang20 @yizhang2077 Besides, I will try to import timm as soon as possible.

i think better import it. timm should keep their utility, we should not put dead codes in sglang.

@zhaochenyang20 @yizhang2077 Done. Do I need to update the branch?

@ccw1996
Contributor Author

ccw1996 commented Mar 16, 2025

Hi @ccw1996, once your PR is ready, I think you can update it here: #4456

@yizhang2077 I cannot update this sheet.

@yizhang2077
Collaborator

@yizhang2077 I cannot update this sheet.

You can paste the result here, and then I will update it.

@ccw1996
Contributor Author

ccw1996 commented Mar 16, 2025

@yizhang2077 I cannot update this sheet.

You can paste the result here, and then I will update it.

SGLang is 0.442, and HF is not supported.

@zhaochenyang20
Collaborator

@yizhang2077 @ccw1996 what shall we do next? I have the access and I can merge after yi's approval.

@ccw1996 ccw1996 requested a review from xiezhq-hermann as a code owner March 17, 2025 04:39
@ccw1996
Contributor Author

ccw1996 commented Mar 17, 2025

@yizhang2077 @ccw1996 what shall we do next? I have the access and I can merge after yi's approval.

@zhaochenyang20 @yizhang2077 I resolved the merge conflict related to gemma3. Now waiting for approval.

@zhaochenyang20
Collaborator

sure!

@zhaochenyang20
Collaborator

@yizhang2077 @mickqian could you help to give a quick overview?

@zhaochenyang20
Collaborator

@ccw1996 fix lint plz

@mickqian
Collaborator

LGTM

@zhaochenyang20
Collaborator

@ccw1996 Sure. I will run the CI for you; do not rebase any more. Leave it to us.

@yizhang2077
Collaborator

@ccw1996 please add pip install timm in ci_install_dependency.sh, otherwise your test will not pass.

@zhaochenyang20
Collaborator

@ccw1996 please add pip install timm in ci_install_dependency.sh, otherwise your test will not pass.

I've done it

@zhaochenyang20 zhaochenyang20 merged commit d6d2164 into sgl-project:main Mar 17, 2025
21 checks passed