[model] Add Qwen2.5-Omni #7537

Kuangdd01 · 2025-03-30T07:23:51Z

What does this PR do?

This PR aims to support the fine-tuning/post-training of the Thinker part of the latest Qwen2.5-Omni model.

Env Info

# prepare transformers with the following cmd
pip install -U transformers

TODO List

These tests used LoRA on the demo datasets, with the audio tower and vision tower frozen.

Test recipes

### model
model_name_or_path: Qwen/Qwen2.5-Omni-7B
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: identity, mllm_audio_demo, mllm_demo 
template: qwen2_omni
cutoff_len: 3072
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 1
dataloader_num_workers: 1

### output
output_dir: saves/qwen2_omni-7b/lora/sft
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
freeze_vision_tower: true
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
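
As a quick sanity check on the recipe above, the number of samples contributing to each optimizer step is the per-device batch size times the gradient accumulation steps times the number of devices. The sketch below is only illustrative; `effective_batch_size` is a hypothetical helper, and `num_devices` is an assumption because the recipe does not state how many GPUs were used.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int) -> int:
    """Samples consumed per optimizer step under data-parallel training."""
    return per_device_batch * grad_accum_steps * num_devices


# Values taken from the recipe above; num_devices=1 is an assumed single-GPU run.
print(effective_batch_size(per_device_batch=1, grad_accum_steps=4, num_devices=1))  # 4
```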

Checkpoint Merging

The merge script below handles both cases: merging LoRA weights into the Thinker model, and merging fully fine-tuned Thinker weights.
Usage:

# for lora
python3 ./scripts/qwen_omni_merge.py merge_lora \
  --base_model_path="./Qwen2_5Omni" \
  --lora_checkpoint_path="./lora_checkpoint" \
  --save_path="./target_dir"

# for full ft
python3 ./scripts/qwen_omni_merge.py save_full \
  --base_model_path="./Qwen2_5Omni" \
  --saved_thinker_path="./saved_thinker" \
  --save_path="./target_dir"
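
For readers unfamiliar with what a LoRA merge does conceptually: the low-rank update is folded into the base weight as W' = W + (alpha / r) * B A. The sketch below illustrates that standard formula on a toy matrix in plain Python. It is not the code in scripts/qwen_omni_merge.py, and the names `matmul` and `merge_lora` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without modifying the inputs."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter: B is 2x1, A is 1x2.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, -0.5]]
merged = merge_lora(base, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[2.0, -1.0], [2.0, -1.0]]
```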

Then try the official inference pipeline:

import soundfile as sf
from qwen_omni_utils import process_mm_info
# On recent transformers releases (e.g. 4.52.4), Qwen2_5OmniModel was removed;
# import Qwen2_5OmniForConditionalGeneration instead.
from transformers import Qwen2_5OmniProcessor, Qwen2_5OmniModel

model_path = "./merged_model_checkpoint"

# Load the merged checkpoint and its processor.
model = Qwen2_5OmniModel.from_pretrained(model_path, torch_dtype="auto", device_map="auto")
processor = Qwen2_5OmniProcessor.from_pretrained(model_path)

conversation1 = [
        {'role': 'system', 'content': 'You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.'},
        {"role": "user", "content": [
            {"type": "text", "text": "Who are you?"},
        ]},
]
conversations = [conversation1]

text = processor.apply_chat_template(conversations, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversations, use_audio_in_video=False)
# Newer transformers releases renamed the processor's audios argument to audio.
inputs = processor(text=text, audios=audios, images=images, videos=videos, return_tensors="pt", padding=True, use_audio_in_video=False)
inputs = inputs.to(model.device).to(model.dtype)
text_ids, audio = model.generate(**inputs, use_audio_in_video=False)
text = processor.batch_decode(text_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(text)
sf.write(
    "output.wav",
    audio.reshape(-1).detach().cpu().numpy(),
    samplerate=24000,
)

Fixes #7504 (issue)

For Finetuning with data `video + audio`

Refer to #7638 for data preparation, and set use_audio_in_video to true.

For vllm serve

Warning

Before loading the merged model with vllm, manually change the architectures field in the model's config file from Qwen2_5OmniForConditionalGeneration to Qwen2_5OmniModel. After this change, vllm can load the model directly.
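
The manual edit described in the warning can be scripted. The following is a minimal sketch, assuming the merged checkpoint directory contains a standard `config.json`; the helper name `set_architecture` and the example path are hypothetical and not part of this PR.

```python
import json
from pathlib import Path


def set_architecture(model_dir: str, new_arch: str = "Qwen2_5OmniModel") -> list:
    """Rewrite the `architectures` field of <model_dir>/config.json and return the old value."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    old = config.get("architectures", [])
    config["architectures"] = [new_arch]
    config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return old
```

For example, `set_architecture("./target_dir")` would patch the directory written by the merge script above. Keeping a backup of the original `config.json` is a sensible precaution, since the same checkpoint may later be loaded by a library version that expects the original class name.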

Before submitting

Did you read the contributor guideline?
Did you write any new necessary tests?

Harryjun · 2025-03-31T13:20:09Z

Are there some data examples for a quick test run? Could you provide a few test samples (covering each modality)?

hiyouga · 2025-03-31T13:25:31Z

@Harryjun They are provided above.

zzhdbw · 2025-03-31T15:10:37Z

What does this PR do?

This PR aims to support the fine-tuning/post-training of the Thinker part of the latest Qwen2_5Omni model.

Env Info

# build on this commit
pip install git+https://github.com/huggingface/transformers.git@4892b6d61f3e2ea949c581be1e94a1f5292959c3

TODO List

These tests were only done with lora and demo data setups with freezing audio tower & vision tower.

test configs

### model
model_name_or_path: ./Qwen2.5-Omni-7B
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all

### dataset
dataset: identity, mllm_audio_demo, mllm_demo 
template: qwen2_omni
cutoff_len: 3072
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 1
dataloader_num_workers: 1

### output
output_dir: saves/qwen2_omni-7b/lora/sft
logging_steps: 1
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 4
freeze_vision_tower: true
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

For Lora merge, try this script.

python scripts/lora_merge_part.py --base_model_path ./Qwen2.5-Omni-7B --lora_checkpoint_path ./saves/sft/lora/

Audio fine-tuning has a bug; I have filed an issue: #7552

xiexiaoshinick · 2025-04-01T12:35:02Z

Does the model support full-parameter fine-tuning?

Kuangdd01 · 2025-04-01T18:00:49Z

> Does the model support full-parameter fine-tuning?

Yes, it is supported.

xiexiaoshinick · 2025-04-02T03:18:44Z

> Yes, it is supported.

I tried full-parameter training. On a single A100 it fails with an out-of-memory error; on 8 A100s with DeepSpeed it fails with
"""
[rank3]: re.compile("|".join([re.escape(plan) for plan in model._tp_plan]))
[rank3]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]: TypeError: 'NoneType' object is not iterable
"""
Could you share your YAML config for full-parameter training?

MonsterPPPP · 2025-06-06T08:42:27Z

I have successfully run the LoRA training and obtained the checkpoint, but merging fails with:

Traceback (most recent call last):
  File "/home/pring/jiulin/cache/LLaMA-Factory/./scripts/qwen_omni_merge.py", line 32, in <module>
    from transformers import (
ImportError: cannot import name 'Qwen2_5OmniForConditionalGeneration' from 'transformers'

I have already done this:

# prepare transformers with the following cmd
pip install git+https://github.com/Kuangdd01/transformers.git@qwen25omni

Kuangdd01 · 2025-06-06T08:48:33Z

> pip install git+https://github.com/Kuangdd01/transformers.git@qwen25omni

This command (pip install git+https://github.com/Kuangdd01/transformers.git@qwen25omni) is out of date. You can use transformers==4.52.4.

JOY-SWang · 2025-06-06T08:55:14Z

Looking in indexes: https://mirrors.ivolces.com/pypi/simple/
[2025-06-06 16:53:49] Collecting git+https://github.com/Kuangdd01/transformers.git@qwen25omni
[2025-06-06 16:53:49] Cloning https://github.com/Kuangdd01/transformers.git (to revision qwen25omni) to /tmp/pip-req-build-3_evxe8u
[2025-06-06 16:53:49] Running command git clone --filter=blob:none --quiet https://github.com/Kuangdd01/transformers.git /tmp/pip-req-build-3_evxe8u

It hangs here with no response.

Kuangdd01 · 2025-06-06T09:13:21Z

> Running command git clone --filter=blob:none --quiet https://github.com/Kuangdd01/transformers.git /tmp/pip-req-build-3_evxe8u
>
> It hangs here with no response.

Running pip install -U transformers is enough.

JOY-SWang · 2025-06-06T09:37:37Z

> Running pip install -U transformers is enough.

But that installs transformers==4.52.4, which has no Qwen2_5OmniModel, only Qwen2_5OmniForConditionalGeneration. Also, for the Qwen2_5OmniProcessor in huggingface/transformers, processor.apply_chat_template(conversations, ...) takes conversations, and there is no audios key.

Kuangdd01 · 2025-06-06T10:03:44Z

> But that installs transformers==4.52.4, which has no Qwen2_5OmniModel, only Qwen2_5OmniForConditionalGeneration. Also, for the Qwen2_5OmniProcessor in huggingface/transformers, processor.apply_chat_template(conversations, ...) takes conversations, and there is no audios key.

The audio key does exist; the argument name was changed:
https://github.com/huggingface/transformers/blob/02f946a0386c67540538030da2ff87bbac5eca24/src/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py#L113-L120

MonsterPPPP · 2025-06-09T01:54:29Z

I have updated to the latest transformers==4.52.4 and merged the weights successfully, but I cannot find the Qwen2_5OmniModel used in your example above:
ImportError: cannot import name 'Qwen2_5OmniModel' from 'transformers'
Should it be changed to Qwen2_5OmniForConditionalGeneration?

Kuangdd01 · 2025-06-09T02:21:33Z

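> I have updated to the latest transformers==4.52.4 and merged the weights successfully, but I cannot find the Qwen2_5OmniModel used in your example above: ImportError: cannot import name 'Qwen2_5OmniModel' from 'transformers'. Should it be changed to Qwen2_5OmniForConditionalGeneration?

Yes.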
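
Because the class name differs between the early fork and later releases, an inference script can pick whichever name the installed package exposes. This is only a sketch of that idea: `resolve_omni_class` is a hypothetical helper, and the two class names are the ones mentioned in this thread.

```python
import importlib


def resolve_omni_class(module_name: str = "transformers"):
    """Return the first Qwen2.5-Omni model class found in the module, or None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Prefer the newer name; fall back to the one used by the early fork.
    for name in ("Qwen2_5OmniForConditionalGeneration", "Qwen2_5OmniModel"):
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    return None
```

A caller would then use `resolve_omni_class().from_pretrained(...)` in place of a hard-coded import, after checking that the result is not None.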

wwfnb · 2025-06-12T07:12:08Z

> This command (pip install git+https://github.com/Kuangdd01/transformers.git@qwen25omni) is out of date. You can use transformers==4.52.4.

Hello, I have a question. I set up the training environment two days ago to train qwen-omni, but I still trained with the transformers version from pip install git+https://github.com/Kuangdd01/transformers.git@qwen25omni. Because training takes a long time, I only merged today and found that I cannot run inference with the official transformers and vllm. Can I simply upgrade transformers and then merge, so that inference with the official transformers and vllm works?

Kuangdd01 · 2025-06-12T08:32:52Z

> Hello, I have a question. I set up the training environment two days ago to train qwen-omni, but I still trained with the transformers version from pip install git+https://github.com/Kuangdd01/transformers.git@qwen25omni. Because training takes a long time, I only merged today and found that I cannot run inference with the official transformers and vllm. Can I simply upgrade transformers and then merge, so that inference with the official transformers and vllm works?

What exactly is the error? Is it a vllm error?

wwfnb · 2025-06-12T08:51:46Z

> What exactly is the error? Is it a vllm error?

Sorry to bother you. Here is our exact situation. We downloaded the LLaMA-Factory code on June 11, but used the transformers version from git+https://github.com/Kuangdd01/transformers.git@qwen25omni, because we had not followed the latest issues. After full SFT training on Omni, we merged the model and then tried to run inference on it with vllm. We did not use the latest vllm; we used 0.8.5.post1, together with the transformers version https://github.com/huggingface/transformers@v4.51.3-Qwen2.5-Omni-preview (we chose these versions because they let us run the whole Omni training and inference pipeline). We are not sure whether the code changed during this period.

At inference time, vllm loads the original Qwen-Omni model without problems, but loading the merged model ( a=LLM(model="/home/export/base/ycsc_lijt1/lijt1/online1/lxb/LLaMA-Factory/saves/0611_Omni_text/10_criteria_66k/final") ) fails with "ValueError: Qwen2_5OmniForConditionalGeneration has no vLLM implementation and the Transformers implementation is not compatible with vLLM. Try setting VLLM_USE_V1=0." We are puzzled why the original Omni loads fine while the SFT-trained and merged Omni raises this error.

wwfnb · 2025-06-12T12:08:54Z

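> 4.52.4

We solved this problem. When loading the merged model with vllm, the architectures field in the merged model's config must be manually changed from "Qwen2_5OmniForConditionalGeneration" to "Qwen2_5OmniModel"; vllm can then load it directly.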

Kuangdd01 · 2025-06-12T12:24:25Z

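> We solved this problem. When loading the merged model with vllm, the architectures field in the merged model's config must be manually changed from "Qwen2_5OmniForConditionalGeneration" to "Qwen2_5OmniModel"; vllm can then load it directly.

OK, I will make a note of it.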

zhangyuygss · 2025-06-18T06:18:44Z

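> We solved this problem. When loading the merged model with vllm, the architectures field in the merged model's config must be manually changed from "Qwen2_5OmniForConditionalGeneration" to "Qwen2_5OmniModel"; vllm can then load it directly.

Is this architecture change introduced during SFT? Could the merge script modify the architecture so the manual step is not needed?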

Kuangdd01 · 2025-06-18T08:07:58Z

> Is this architecture change introduced during SFT? Could the merge script modify the architecture so the manual step is not needed?

The latest transformers code removed the Qwen2_5OmniModel class, so the more appropriate fix is to upgrade vllm. vllm now recognizes Qwen2_5OmniForConditionalGeneration; vllm<=0.8.5 lacks this registration line:
https://github.com/vllm-project/vllm/blob/19a53b27833d767632c9eaff6ddf5ef08ba3af2f/vllm/model_executor/models/registry.py#L215
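
To decide whether the manual config edit is still needed, the installed vllm version can be compared against the cutoff mentioned above. This is a rough sketch: `needs_architecture_patch` is a hypothetical helper, the (0, 8, 5) cutoff comes only from this comment, and treating post-releases such as 0.8.5.post1 like 0.8.5 is an assumption based on the report earlier in this thread.

```python
import re


def needs_architecture_patch(vllm_version: str) -> bool:
    """True if the version's release tuple is <= (0, 8, 5), ignoring suffixes like .post1."""
    parts = re.findall(r"\d+", vllm_version)[:3]
    release = tuple(int(p) for p in parts)
    return release <= (0, 8, 5)


print(needs_architecture_patch("0.8.5.post1"))  # True
print(needs_architecture_patch("0.9.1"))        # False
```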

crystalww · 2025-07-02T09:52:28Z

> The audio key does exist; the argument name was changed: https://github.com/huggingface/transformers/blob/02f946a0386c67540538030da2ff87bbac5eca24/src/transformers/models/qwen2_5_omni/processing_qwen2_5_omni.py#L113-L120

With the latest code, SFT still fails: LLaMA-Factory/src/llamafactory/data/mm_plugin.py", line 179, in _validate_input
raise ValueError("Processor was not found, please check and update your model file.")

llamafactory 0.9.4.dev0 c5a0829
transformers 4.52.4

hiyouga · 2025-07-02T09:59:18Z

@crystalww If you hit an error like ValueError("Processor was not found, please check and update your model file."), please run the code below and paste the full error message:

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-Omni-7B")

xiaoyi1734 · 2025-07-03T03:36:26Z

Does Qwen2.5-Omni fine-tuning support audio+text to audio+text? If so, what is the data format?

Kuangdd01 · 2025-07-03T03:40:55Z

> Does Qwen2.5-Omni fine-tuning support audio+text to audio+text? If so, what is the data format?

Fine-tuning the audio decoder is not supported for now.

yuanhx24 · 2025-07-10T07:58:52Z

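> Fine-tuning the audio decoder is not supported for now.

Hello, does this mean Qwen2.5-Omni fine-tuning is unsupported whenever the output is audio? But isn't this model end-to-end for speech? Wouldn't fine-tuning only the text output lose part of the end-to-end advantage, since a TTS step is then needed to turn the output text into audio?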

Kuangdd01 · 2025-07-10T08:07:34Z

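> Hello, does this mean Qwen2.5-Omni fine-tuning is unsupported whenever the output is audio? But isn't this model end-to-end for speech? Wouldn't fine-tuning only the text output lose part of the end-to-end advantage, since a TTS step is then needed to turn the output text into audio?

There is no way to construct the mapping from raw audio to codec_ids, so end-to-end training is not possible. See the talker generation code for details: the audio output is in fact produced only after the thinker has generated its text. For now we only provide the ability to fine-tune the thinker on downstream tasks. The talker involves many losses, including the CE loss on codec ids, the DiT loss, and the MSE loss on the waveform, which are hard to construct.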

yuanhx24 · 2025-07-10T08:41:06Z

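> There is no way to construct the mapping from raw audio to codec_ids, so end-to-end training is not possible. See the talker generation code for details: the audio output is in fact produced only after the thinker has generated its text. For now we only provide the ability to fine-tune the thinker on downstream tasks. The talker involves many losses, including the CE loss on codec ids, the DiT loss, and the MSE loss on the waveform, which are hard to construct.

Hello, thanks for the reply. May I understand it this way: Qwen2.5-Omni is open-sourced to a degree that allows fine-tuning the audio output, but even without LlamaFactory, fine-tuning the audio output with other tools is fairly complex, so nobody has done it yet?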

Kuangdd01 · 2025-07-10T08:46:25Z

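> Hello, thanks for the reply. May I understand it this way: Qwen2.5-Omni is open-sourced to a degree that allows fine-tuning the audio output, but even without LlamaFactory, fine-tuning the audio output with other tools is fairly complex, so nobody has done it yet?

See QwenLM/Qwen2.5-Omni#219 (comment)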

HPUhushicheng · 2025-08-09T14:32:40Z

# for lora
python3 ./scripts/qwen_omni_merge.py merge_lora \
  --base_model_path="./Qwen2_5Omni" \
  --lora_checkpoint_path="./lora_checkpoint" \
  --save_path="./target_dir"

Could quantization after model merging be added?

qq31415926 · 2025-08-10T09:42:19Z

Does Qwen2.5-Omni support multi-turn dialogue fine-tuning?

squirrelfish · 2025-08-20T07:45:21Z

Can the model run inference by loading the LoRA adapter directly, without merging?

wulaoshi · 2025-08-26T08:04:58Z

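> Can the model run inference by loading the LoRA adapter directly, without merging?

vllm currently does not support that kind of inference for this model.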

Kuangdd01 and others added 9 commits March 27, 2025 06:21

preserve image_sizes

fcc8d26

preserve image_sizes

f2030a8

init plugin

896b408

support audio-text2text lora

11a4469

nit

637bf77

support image/video-text2text, audio-text2text

782c224

Merge branch 'hiyouga:main' into add_qwen_omni

ef07e2f

remove args

f297b64

remove lines

fb32b4f

Kuangdd01 changed the title ~~[WIP] Add Qwen2_5_Omni.thinker~~ [model][WIP] Add Qwen2_5_Omni.thinker Mar 30, 2025

Kuangdd01 added 2 commits March 31, 2025 10:04

add docs && nit

05aa519

remove some comments

dbf6cd0

Kuangdd01 changed the title ~~[model][WIP] Add Qwen2_5_Omni.thinker~~ [model] Add Qwen2_5_Omni.thinker Mar 31, 2025

hiyouga self-requested a review March 31, 2025 10:11

hiyouga requested changes Mar 31, 2025

src/llamafactory/data/collator.py (resolved)

src/llamafactory/data/collator.py (outdated, resolved)

fix && add merge part script

e16d54d

Kuangdd01 requested a review from hiyouga March 31, 2025 12:23

hiyouga approved these changes Mar 31, 2025

add license

35c32cd

hiyouga approved these changes Mar 31, 2025

hiyouga merged commit 185c76f into hiyouga:main Mar 31, 2025
12 checks passed

hiyouga added the solved This problem has been already solved label Mar 31, 2025

zzhdbw mentioned this pull request Mar 31, 2025

Qwen2.5-Omni Audio finetune bug ！ #7552

Closed

1 task

hiyouga mentioned this pull request Apr 1, 2025

[infer] vllm video/audio inference #7566

Merged

2 tasks

hiyouga mentioned this pull request Jun 9, 2025

Files saved when training Qwen2.5-Omni with zero3 #8336

Closed

1 task

gitzlc mentioned this pull request Aug 27, 2025

llamafactory lora error：Target module GELUActivation() is not supported #9033

Open

1 task
