
Conversation

@panpan0000 panpan0000 (Contributor) commented Feb 23, 2025

Motivation

As discussed in #3635 (comment),

we still encounter an error when loading Qwen-2.5 models with the latest transformers==4.49:

python python/sglang/launch_server.py --model  Qwen2.5-0.5B-Instruct
    AutoImageProcessor.register(Qwen2_5_VLConfig, None, Qwen2_5_VLImageProcessor, None)
     ...
    File "/..../python3.12/site-packages/transformers/models/auto/auto_factory.py", line 833, in register
    raise ValueError(f"'{key}' is already used by a Transformers model.")
ValueError: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.

I suspect this is a compatibility problem between sglang's Qwen-2.5 config and Transformers.
So far there are two ways to address the issue:

  1. either downgrade to transformers 4.48.3 (use transformers 4.48.3 #3650)
  2. or apply the workaround in this PR

This PR may NOT be suitable for merging, but it shows a possible workaround for people stuck on this problem.

Modifications

Just wrap the failing registration in a try-catch.
With my test below, all functions seem to be working well.

Test:

# pip3 list |grep transformers
transformers                      4.49.0
# python python/sglang/launch_server.py --model Qwen2.5-0.5B-Instruct/ .....
Warning: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
INFO 02-23 11:56:07 __init__.py:190] Automatically detected platform cuda.
[2025-02-23 11:56:09] server_args=ServerArgs(model_path=......, enable_flashinfer_mla=False)
Warning: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
INFO 02-23 11:56:13 __init__.py:190] Automatically detected platform cuda.
Warning: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
INFO 02-23 11:56:13 __init__.py:190] Automatically detected platform cuda.
[2025-02-23 11:56:15 TP0] Init torch distributed begin.
[2025-02-23 11:56:16 TP0] Load weight begin. avail mem=23.33 GB
[2025-02-23 11:56:16 TP0] Ignore import error when loading sglang.srt.models.qwen2_5_vl. '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  4.55it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  4.55it/s]

[2025-02-23 11:56:16 TP0] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=22.31 GB
[2025-02-23 11:56:16 TP0] KV Cache is allocated. K size: 9.76 GB, V size: 9.76 GB.
[2025-02-23 11:56:16 TP0] Memory pool end. avail mem=2.22 GB
[2025-02-23 11:56:16 TP0] Capture cuda graph begin. This can take up to several minutes.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.26it/s]
[2025-02-23 11:56:18 TP0] Capture cuda graph end. Time elapsed: 1.77 s
[2025-02-23 11:56:19 TP0] max_total_num_tokens=1705060, chunked_prefill_size=2048, max_prefill_tokens=16384, max_running_requests=4097, context_len=32768
[2025-02-23 11:56:19] INFO:     Started server process [3101123]
[2025-02-23 11:56:19] INFO:     Waiting for application startup.
[2025-02-23 11:56:19] INFO:     Application startup complete.
[2025-02-23 11:56:19] INFO:     Uvicorn running on http://0.0.0.0:17777 (Press CTRL+C to quit)
[2025-02-23 11:56:20] INFO:     127.0.0.1:44720 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-02-23 11:56:20] Receive: obj=GenerateReqInput(text='The capital city of France is', input_ids=None, input_embeds=None, image_data=None, sampling_params={'temperature': 0, 'max_new_tokens': 8}, rid='04c6b7a76bfc467e973c5f0fbc63c9df', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=False, stream=False, log_metrics=True, modalities=None, lora_path=None, session_params=None, custom_logit_processor=None)
[2025-02-23 11:56:20 TP0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-02-23 11:56:20] Finish: obj=GenerateReqInput(text='The capital city of France is', input_ids=None, input_embeds=None, image_data=None, sampling_params={'temperature': 0, 'max_new_tokens': 8}, rid='04c6b7a76bfc467e973c5f0fbc63c9df', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=False, stream=False, log_metrics=True, modalities=None, lora_path=None, session_params=None, custom_logit_processor=None), out={'text': ' Paris. It is the largest city in', 'meta_info': {'id': '04c6b7a76bfc467e973c5f0fbc63c9df', 'finish_reason': {'type': 'length', 'length': 8}, 'prompt_tokens': 6, 'completion_tokens': 8, 'cached_tokens': 0}}
[2025-02-23 11:56:20] INFO:     127.0.0.1:44728 - "POST /generate HTTP/1.1" 200 OK
[2025-02-23 11:56:20] The server is fired up and ready to roll!
[2025-02-23 11:56:21] Receive: obj=GenerateReqInput(text='consider the rhythmical for number sequence 1,3,6,18,21 ', input_ids=[151644, 8948, 198, 2610, 525, 1207, 16948, 11, 3465, 553, 54364, 14817, 13, 1446, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 24712, 279, 36290, 938, 369, 1372, 8500, 220, 16, 11, 18, 11, 21, 11, 16, 23, 11, 17, 16, 220, 151645, 198, 151644, 77091, 198], input_embeds=None, image_data=None, sampling_params={'temperature': 0.7, 'max_new_tokens': 2500, 'min_new_tokens': 0, 'stop': None, 'stop_token_ids': None, 'top_p': 1.0, 'top_k': -1, 'min_p': 0.0, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'regex': None, 'ebnf': None, 'n': 1, 'no_stop_trim': False, 'ignore_eos': False, 'skip_special_tokens': True}, rid='349f8a2bcff4428daee282e221c6be1a', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=True, stream=False, log_metrics=True, modalities=[], lora_path=None, session_params=None, custom_logit_processor=None)
[2025-02-23 11:56:21 TP0] Prefill batch. #new-seq: 1, #new-token: 49, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-02-23 11:56:21 TP0] Decode batch. #running-req: 1, #token: 82, token usage: 0.00, gen throughput (token/s): 17.55, #queue-req: 0
[2025-02-23 11:56:21 TP0] Decode batch. #running-req: 1, #token: 122, token usage: 0.00, gen throughput (token/s): 430.35, #queue-req: 0
[2025-02-23 11:56:21 TP0] Decode batch. #running-req: 1, #token: 162, token usage: 0.00, gen throughput (token/s): 441.01, #queue-req: 0
[2025-02-23 11:56:21] Finish: obj=GenerateReqInput(text='consider the rhythmical for number sequence 1,3,6,18,21 ', input_ids=[151644, 8948, 198, 2610, 525, 1207, 16948, 11, 3465, 553, 54364, 14817, 13, 1446, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 24712, 279, 36290, 938, 369, 1372, 8500, 220, 16, 11, 18, 11, 21, 11, 16, 23, 11, 17, 16, 220, 151645, 198, 151644, 77091, 198], input_embeds=None, image_data=None, sampling_params={'temperature': 0.7, 'max_new_tokens': 2500, 'min_new_tokens': 0, 'stop': None, 'stop_token_ids': None, 'top_p': 1.0, 'top_k': -1, 'min_p': 0.0, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'regex': None, 'ebnf': None, 'n': 1, 'no_stop_trim': False, 'ignore_eos': False, 'skip_special_tokens': True}, rid='349f8a2bcff4428daee282e221c6be1a', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=True, stream=False, log_metrics=True, modalities=[], lora_path=None, session_params=None, custom_logit_processor=None), out={'text': "I'm sorry, but I don't have enough context to provide a rhythmical analysis for the number sequence you've presented. It appears to be a sequence of numbers, but I can't determine if there are any specific patterns or formulas involved without more information. \n\nIf you have any additional details about the sequence or a specific question about it, I'd be happy to attempt a general analysis or provide guidance on how to interpret such a sequence. Alternatively, if you can share the sequence with me, I might be able to suggest a particular mathematical approach or provide more context.", 'meta_info': {'id': '349f8a2bcff4428daee282e221c6be1a', 'finish_reason': {'type': 'stop', 'matched': 151645}, 'prompt_tokens': 49, 'completion_tokens': 116, 'cached_tokens': 0}}
[2025-02-23 11:56:21] INFO:     127.0.0.1:44742 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Checklist

Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
@zhyncs zhyncs (Member) commented Mar 22, 2025

The main branch has been updated to transformers v4.50 by now.

@zhyncs zhyncs closed this Mar 22, 2025