
Conversation

@panpan0000 panpan0000 (Contributor) commented Feb 23, 2025

Motivation

As discussed in #3635 (comment),

we still encounter an error when loading Qwen-2.5 models with the latest transformers==4.49:

python python/sglang/launch_server.py --model  Qwen2.5-0.5B-Instruct
    AutoImageProcessor.register(Qwen2_5_VLConfig, None, Qwen2_5_VLImageProcessor, None)
     ...
    File "/..../python3.12/site-packages/transformers/models/auto/auto_factory.py", line 833, in register
    raise ValueError(f"'{key}' is already used by a Transformers model.")
ValueError: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.

I suspect this is a compatibility problem between sglang's Qwen-2.5 config and Transformers.
So far there are two ways to address the issue:

  1. either downgrade to transformers 4.48.3 (use transformers 4.48.3 #3650)
  2. or apply the workaround in this PR

This PR may NOT be suitable for merging, but it shows a possible workaround for people stuck on this problem.

Modifications

Just wrap the failing registration in a try-catch.
With my test below, all functions seem to be working well.

Test:

# pip3 list |grep transformers
transformers                      4.49.0
# python python/sglang/launch_server.py --model Qwen2.5-0.5B-Instruct/ .....
Warning: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
INFO 02-23 11:56:07 __init__.py:190] Automatically detected platform cuda.
[2025-02-23 11:56:09] server_args=ServerArgs(model_path=......, enable_flashinfer_mla=False)
Warning: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
INFO 02-23 11:56:13 __init__.py:190] Automatically detected platform cuda.
Warning: '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
INFO 02-23 11:56:13 __init__.py:190] Automatically detected platform cuda.
[2025-02-23 11:56:15 TP0] Init torch distributed begin.
[2025-02-23 11:56:16 TP0] Load weight begin. avail mem=23.33 GB
[2025-02-23 11:56:16 TP0] Ignore import error when loading sglang.srt.models.qwen2_5_vl. '<class 'sglang.srt.configs.qwen2_5_vl_config.Qwen2_5_VLConfig'>' is already used by a Transformers model.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  4.55it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  4.55it/s]

[2025-02-23 11:56:16 TP0] Load weight end. type=Qwen2ForCausalLM, dtype=torch.bfloat16, avail mem=22.31 GB
[2025-02-23 11:56:16 TP0] KV Cache is allocated. K size: 9.76 GB, V size: 9.76 GB.
[2025-02-23 11:56:16 TP0] Memory pool end. avail mem=2.22 GB
[2025-02-23 11:56:16 TP0] Capture cuda graph begin. This can take up to several minutes.
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.26it/s]
[2025-02-23 11:56:18 TP0] Capture cuda graph end. Time elapsed: 1.77 s
[2025-02-23 11:56:19 TP0] max_total_num_tokens=1705060, chunked_prefill_size=2048, max_prefill_tokens=16384, max_running_requests=4097, context_len=32768
[2025-02-23 11:56:19] INFO:     Started server process [3101123]
[2025-02-23 11:56:19] INFO:     Waiting for application startup.
[2025-02-23 11:56:19] INFO:     Application startup complete.
[2025-02-23 11:56:19] INFO:     Uvicorn running on http://0.0.0.0:17777 (Press CTRL+C to quit)
[2025-02-23 11:56:20] INFO:     127.0.0.1:44720 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-02-23 11:56:20] Receive: obj=GenerateReqInput(text='The capital city of France is', input_ids=None, input_embeds=None, image_data=None, sampling_params={'temperature': 0, 'max_new_tokens': 8}, rid='04c6b7a76bfc467e973c5f0fbc63c9df', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=False, stream=False, log_metrics=True, modalities=None, lora_path=None, session_params=None, custom_logit_processor=None)
[2025-02-23 11:56:20 TP0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-02-23 11:56:20] Finish: obj=GenerateReqInput(text='The capital city of France is', input_ids=None, input_embeds=None, image_data=None, sampling_params={'temperature': 0, 'max_new_tokens': 8}, rid='04c6b7a76bfc467e973c5f0fbc63c9df', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=False, stream=False, log_metrics=True, modalities=None, lora_path=None, session_params=None, custom_logit_processor=None), out={'text': ' Paris. It is the largest city in', 'meta_info': {'id': '04c6b7a76bfc467e973c5f0fbc63c9df', 'finish_reason': {'type': 'length', 'length': 8}, 'prompt_tokens': 6, 'completion_tokens': 8, 'cached_tokens': 0}}
[2025-02-23 11:56:20] INFO:     127.0.0.1:44728 - "POST /generate HTTP/1.1" 200 OK
[2025-02-23 11:56:20] The server is fired up and ready to roll!
[2025-02-23 11:56:21] Receive: obj=GenerateReqInput(text='consider the rhythmical for number sequence 1,3,6,18,21 ', input_ids=[151644, 8948, 198, 2610, 525, 1207, 16948, 11, 3465, 553, 54364, 14817, 13, 1446, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 24712, 279, 36290, 938, 369, 1372, 8500, 220, 16, 11, 18, 11, 21, 11, 16, 23, 11, 17, 16, 220, 151645, 198, 151644, 77091, 198], input_embeds=None, image_data=None, sampling_params={'temperature': 0.7, 'max_new_tokens': 2500, 'min_new_tokens': 0, 'stop': None, 'stop_token_ids': None, 'top_p': 1.0, 'top_k': -1, 'min_p': 0.0, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'regex': None, 'ebnf': None, 'n': 1, 'no_stop_trim': False, 'ignore_eos': False, 'skip_special_tokens': True}, rid='349f8a2bcff4428daee282e221c6be1a', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=True, stream=False, log_metrics=True, modalities=[], lora_path=None, session_params=None, custom_logit_processor=None)
[2025-02-23 11:56:21 TP0] Prefill batch. #new-seq: 1, #new-token: 49, #cached-token: 0, cache hit rate: 0.00%, token usage: 0.00, #running-req: 0, #queue-req: 0
[2025-02-23 11:56:21 TP0] Decode batch. #running-req: 1, #token: 82, token usage: 0.00, gen throughput (token/s): 17.55, #queue-req: 0
[2025-02-23 11:56:21 TP0] Decode batch. #running-req: 1, #token: 122, token usage: 0.00, gen throughput (token/s): 430.35, #queue-req: 0
[2025-02-23 11:56:21 TP0] Decode batch. #running-req: 1, #token: 162, token usage: 0.00, gen throughput (token/s): 441.01, #queue-req: 0
[2025-02-23 11:56:21] Finish: obj=GenerateReqInput(text='consider the rhythmical for number sequence 1,3,6,18,21 ', input_ids=[151644, 8948, 198, 2610, 525, 1207, 16948, 11, 3465, 553, 54364, 14817, 13, 1446, 525, 264, 10950, 17847, 13, 151645, 198, 151644, 872, 198, 24712, 279, 36290, 938, 369, 1372, 8500, 220, 16, 11, 18, 11, 21, 11, 16, 23, 11, 17, 16, 220, 151645, 198, 151644, 77091, 198], input_embeds=None, image_data=None, sampling_params={'temperature': 0.7, 'max_new_tokens': 2500, 'min_new_tokens': 0, 'stop': None, 'stop_token_ids': None, 'top_p': 1.0, 'top_k': -1, 'min_p': 0.0, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'repetition_penalty': 1.0, 'regex': None, 'ebnf': None, 'n': 1, 'no_stop_trim': False, 'ignore_eos': False, 'skip_special_tokens': True}, rid='349f8a2bcff4428daee282e221c6be1a', return_logprob=False, logprob_start_len=-1, top_logprobs_num=0, return_text_in_logprobs=True, stream=False, log_metrics=True, modalities=[], lora_path=None, session_params=None, custom_logit_processor=None), out={'text': "I'm sorry, but I don't have enough context to provide a rhythmical analysis for the number sequence you've presented. It appears to be a sequence of numbers, but I can't determine if there are any specific patterns or formulas involved without more information. \n\nIf you have any additional details about the sequence or a specific question about it, I'd be happy to attempt a general analysis or provide guidance on how to interpret such a sequence. Alternatively, if you can share the sequence with me, I might be able to suggest a particular mathematical approach or provide more context.", 'meta_info': {'id': '349f8a2bcff4428daee282e221c6be1a', 'finish_reason': {'type': 'stop', 'matched': 151645}, 'prompt_tokens': 49, 'completion_tokens': 116, 'cached_tokens': 0}}
[2025-02-23 11:56:21] INFO:     127.0.0.1:44742 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Checklist

Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
@zhyncs zhyncs (Member) commented Mar 22, 2025

The main branch has been updated to transformers v4.50 by now.

@zhyncs zhyncs closed this Mar 22, 2025