Conversation

@mickqian (Collaborator) commented Feb 6, 2025

Motivation

Support InternVL2_5, as requested in #3092

Modifications

  1. InternVLChatModel

Checklist

  • Format your code according to Code Formatting with Pre-Commit.
  • Add unit tests as outlined in Running Unit Tests.
  • Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

@mickqian (Collaborator, Author) commented Feb 6, 2025

depends on #3203

@mickqian mickqian force-pushed the internVL branch 2 times, most recently from 6cf5f01 to 506dda1 Compare February 7, 2025 13:48
@mickqian mickqian marked this pull request as ready for review February 9, 2025 08:24
@mickqian mickqian changed the title from "feature: Intern vl 2.5" to "model: Intern vl 2.5" Feb 20, 2025

all_frames = []

def load_image_internvl(image_file, input_size=448, max_num=12):

Contributor:
Do we need to introduce parameters for the max/min number of patches passed via the API, similar to what is done in lmdeploy?

internvl.md

    dict(type='image_url', image_url=dict(max_dynamic_patch=12, url='https://raw.githubusercontent.com/OpenGVLab/InternVL/main/internvl_chat/examples/image1.jpg')),


Collaborator (Author):
Is that parameter frequently used? If not, and if other VL models don't support it, this might be low priority.


Contributor:
Perhaps the user wants to control the limit on the size of image slices per request?
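For reference, a minimal sketch of how such a per-request limit could be read from the request payload. The `max_dynamic_patch` field mirrors lmdeploy's format; the helper name and the clamping policy are hypothetical, not part of this PR:

```python
def resolve_max_patches(image_url: dict, default: int = 12) -> int:
    """Read an optional per-request patch limit from the image_url payload.

    Falls back to the server default when the field is absent, and clamps
    the value so a single request cannot inflate the tile count.
    """
    value = image_url.get("max_dynamic_patch", default)
    return max(1, min(int(value), default))


# Example request fragment, mirroring lmdeploy's message format:
content_item = {
    "type": "image_url",
    "image_url": {"max_dynamic_patch": 6, "url": "https://example.com/image.jpg"},
}
print(resolve_max_patches(content_item["image_url"]))  # 6
```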

self.vision_model = InternVisionModel(
config=config.vision_config, quant_config=quant_config
)
self.language_model = InternLM2ForCausalLM(

Contributor:
Add support for Qwen/Llama/... as the language_model.

@Titan-p (Contributor) commented Feb 21, 2025

I believe I have identified the issue: it's related to the order of the image token and the user input.
According to the InternVL2_5 example,

question = '<image>\nPlease describe the image shortly.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(f'User: {question}\nAssistant: {response}')

the <image> token should come before the user input.

diff --git a/python/sglang/srt/conversation.py b/python/sglang/srt/conversation.py
index 65eefca3..cae45883 100644
--- a/python/sglang/srt/conversation.py
+++ b/python/sglang/srt/conversation.py
@@ -440,7 +440,10 @@ def generate_chat_conv(
                         real_content += content.text
                     elif content.type == "image_url":
                         # NOTE: Only works for llava
-                        real_content += image_token
+                        if conv.name == "internvl2_5":
+                            real_content = image_token + real_content
+                        else:
+                            real_content += image_token
                         conv.append_image(content.image_url.url)
                 conv.append_message(conv.roles[0], real_content)
         elif msg_role == "assistant":

This modification will work.
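The ordering rule in the diff can be illustrated standalone (a sketch; the function name and `conv_name` parameter are illustrative stand-ins, not the actual SGLang API):

```python
IMAGE_TOKEN = "<image>"


def place_image_token(real_content: str, conv_name: str) -> str:
    """Prepend the image token for InternVL2.5; append it for other templates.

    InternVL2.5's reference example expects the image token, then a newline,
    then the question, so for that template the token must come before the
    user text rather than after it.
    """
    if conv_name == "internvl2_5":
        return IMAGE_TOKEN + real_content
    return real_content + IMAGE_TOKEN


print(place_image_token("Please describe the image shortly.", "internvl2_5"))
# <image>Please describe the image shortly.
```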


It appears there are some discrepancies with the inference results from lmdeploy: SGLang is unable to generate Chinese output, particularly in multi-image scenarios.
Request:

{
    "model": "/home/base-data/largeModel/internalAccess/InternVL2_5-2B",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "中文描述两张图片内容"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "/home/panlyu/images/004.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "/home/panlyu/images/005.jpg"
                    }
                }
            ]
        }
    ]
}

Result

  • SGLang

    • "message": { "role": "assistant", "content": "The image on the left is a meme featuring a character from an animated film holding a dog, while the image on the right is an image of a dog lying down. The text in the meme on the left is \"JUST..MONDAY.\" The image on the right is captioned with the text \"MONDAY.\"", "tool_calls": null },
  • lmdeploy 0.7.0.post3

    • "message": { "role": "assistant", "content": "这两张图片是一张恶搞图,将动画角色柴犬(Shiba Inu)与现实生活中的著名科技公司创始人之一埃隆·马斯克的形象结合在一起。\n\n第一张图片上,柴犬被描绘成举着一只小柴犬,仿佛在庆祝某个事件。背景是晴朗的天空,给人一种轻松愉快的感觉。\n\n第二张图片上,一只法国斗牛犬懒散地躺在地板上,似乎在打盹,背景也是一片蓝色的地板。图片上方和下方分别有文字:“MONDAY. JUST..MONDAY.”(星期一,只是...星期一。)\n\n这两张图片通过将柴犬与马斯克的形象结合,形成了一种幽默的对比。柴犬通常形象活泼可爱,而法国斗牛犬则通常被描绘成慵懒、放荡不羁。马斯克的形象则以严肃和认真著称,这使得这张恶搞图显得非常滑稽,也传达了一种对生活的调侃。", "tool_calls": null },
  • Images used: 004.jpg and 005.jpg (referenced in the request above)

@mickqian mickqian force-pushed the internVL branch 2 times, most recently from f4de116 to a7734d7 Compare February 21, 2025 05:12
@zhaochenyang20 (Collaborator):

This LGTM, but it's too large; I think decoupling it into smaller PRs would be good.

@mickqian mickqian marked this pull request as draft March 9, 2025 09:51
@Fxycst1213 Fxycst1213 mentioned this pull request Mar 13, 2025
@hehesangsj hehesangsj mentioned this pull request Mar 14, 2025
@mickqian mickqian closed this Mar 21, 2025
@zhaochenyang20 (Collaborator):

@mickqian Why did we close this? Is there any follow-up?

@xiaomin-D xiaomin-D mentioned this pull request Apr 13, 2025