[Feature] Support Deepseek-VL2 #2798
Conversation
@@ -0,0 +1,127 @@
from typing import List, Optional, Tuple, Union
rename the file to deepseek_vl2?
rename done
self.layers = modules

def forward(self, x):
I have not yet implemented the forward part of the DeepseekV2ForCausalLM. I will finish all the implementations and add the unit test this weekend.
@ccw1996 Do you need our help?
Has support for deepseek vl2 been implemented?
if config.projector_type == "downsample_mlp_gelu":
    mlp_depth = config.depth
    mlp_ratio = config.mlp_ratio
    modules = [nn.Linear(config.input_dim * config.downsample_ratio * config.downsample_ratio, config.n_embed * mlp_ratio)]
    for _ in range(1, mlp_depth - 1):
        modules.append(nn.GELU())
        modules.append(nn.Linear(config.n_embed * mlp_ratio, config.n_embed * mlp_ratio))
    modules.append(nn.GELU())
    modules.append(nn.Linear(config.n_embed * mlp_ratio, config.n_embed))
    modules = nn.Sequential(*modules)
@ccw1996 I'm happy to take the rest of the work to parallelize the remaining functions. Could you give me access to your branch?
@ccw1996 Apologies for the delay. Would you like me to help with the rest of it?
@ccw1996 I see, I think you can copy those layers from timm into python/sglang/srt/models/deepseekvl2.py and then replace the layers with sgl classes. I'm interested in helping if you can give me access.
@yizhang2077 @ispobock Looks like we'll have to copy lots of code from timm. Now it's mostly just the linear layers with variable depth left to parallelize; will finish soon.
Sure, can you mark the problematic part?
if config.projector_type == "downsample_mlp_gelu":
    mlp_depth = config.depth
    mlp_ratio = config.mlp_ratio
    modules = [
        nn.Linear(
            config.input_dim
            * config.downsample_ratio
            * config.downsample_ratio,
            config.n_embed * mlp_ratio,
        )
    ]
    for _ in range(1, mlp_depth - 1):
        modules.append(nn.GELU())
        modules.append(
            nn.Linear(config.n_embed * mlp_ratio, config.n_embed * mlp_ratio)
        )
    modules.append(nn.GELU())
    modules.append(nn.Linear(config.n_embed * mlp_ratio, config.n_embed))
    modules = nn.Sequential(*modules)
We need to parallelize this part with Column and Row linear layers.
@yizhang2077 Actually with GELU we'll have to gather output for each TP linear. Should we use replicated linear instead?
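A minimal sketch of the replicated option raised in this thread, kept as plain nn.Linear rather than Column/Row parallel layers; the class and argument names are illustrative, not the PR's final code:

```python
import torch
from torch import nn


class DownsampleMLPGeluProjector(nn.Module):
    """Illustrative replicated version of the "downsample_mlp_gelu" projector."""

    def __init__(self, input_dim, n_embed, depth, mlp_ratio, downsample_ratio):
        super().__init__()
        hidden = n_embed * mlp_ratio
        layers = [nn.Linear(input_dim * downsample_ratio * downsample_ratio, hidden)]
        for _ in range(1, depth - 1):
            layers += [nn.GELU(), nn.Linear(hidden, hidden)]
        layers += [nn.GELU(), nn.Linear(hidden, n_embed)]
        # Plain nn.Linear keeps the projector replicated on every TP rank, so,
        # per the discussion above, no gathers are needed around the GELUs.
        self.layers = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)
```

Since the projector is tiny compared with the language model, replicating it trades a little memory for avoiding collectives; sglang's own replicated linear layer could be swapped in later if quantization support is wanted.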
Two problems: one is that the radix cache makes the input wrong; I will try to fix it. The second is that the output does not seem to use the image embeddings. Can you help me debug it?
Let me try tomorrow
logger.info(
    "Automatically turn off --chunked-prefill-size and disable radix cache for deepseek-vl2."
)
server_args.chunked_prefill_size = -1
server_args.disable_radix_cache = True
The language part still supports radix cache.
The language part relies on the input embeddings. If the radix cache is used, the input embeddings are wrong. I will try to debug it.
I see, I think you're right. Llava and qwen_vl also don't use radix attn
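For reference, a minimal sketch of what the automatic override above amounts to when building the server arguments by hand (the model path is an illustrative assumption; the two fields are the ones set in the diff):

```python
from sglang.srt.server_args import ServerArgs

# Chunked prefill off and radix cache disabled, since cached prefixes
# would bypass the image embeddings for this multimodal model.
args = ServerArgs(
    model_path="deepseek-ai/deepseek-vl2",  # illustrative model path
    chunked_prefill_size=-1,                # -1 turns chunked prefill off
    disable_radix_cache=True,               # avoid reusing wrong input embeds
)
```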
Has this been done? @ccw1996 @yizhang2077 Basically, do not paste the code but rather import it when needed.
Sorry, I missed this comment.
I initially thought it was unnecessary to add a dependency just for one model, since you'll also have to update CI 😂 but it's fine if you guys agree.
I think it's better to import it.
@zhaochenyang20 @yizhang2077 Done. Do I need to update the branch?
@yizhang2077 I cannot update this sheet
You can paste the result here, and then I'll update it
sglang is 0.442 and hf is not supported
@yizhang2077 @ccw1996 What shall we do next? I have the access and I can merge after yi's approval.
@zhaochenyang20 @yizhang2077 I resolved the merge conflict with gemma3. Now waiting for approval.
Sure!
@yizhang2077 @mickqian could you help give a quick overview?
@ccw1996 fix lint plz
LGTM
@ccw1996 Sure. I will run the CI for you; do not rebase any more, leave it to us.
@ccw1996 please add pip install timm in
I've done it
Motivation
Add the Deepseek-VL2 model to SGLang, as requested in #2653
Modifications
Checklist