Fix apply_chat_template bugs for script/dpo #1930
Conversation
Thank you for reporting this case. It's actually quite annoying. It comes from the fact that the chat template from Qwen2 always adds a system prompt, even if the user doesn't want it, and there's no way to disable it. Do you have any other examples of problem cases? To be honest, I'm not a fan of adding 25 intricate lines for a single case where the chat template is poorly constructed. Perhaps documenting this case and suggesting that users fix the template as follows might be simpler? `tokenizer.chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"`
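For concreteness, a minimal sketch of applying this override (the Qwen2 checkpoint name is illustrative):

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; any Qwen2 instruct model with the stock template behaves similarly.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

# Override the stock template so that no default system prompt is injected.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

prompt = [{"role": "user", "content": "What color is the sky?"}]
print(tokenizer.apply_chat_template(prompt, tokenize=False))
# With the override, only the user turn is rendered, without an injected system block.
```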
There are indeed many other examples of problem cases. I've grouped these issues into three main categories.

1. The first is the same default-system-prompt problem. For example, in Llama 3.1:
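A minimal sketch of the separate application (the checkpoint name and messages are illustrative, and the injected default system header is an assumption about the stock Llama 3.1 template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

prompt = [{"role": "user", "content": "What color is the sky?"}]
chosen = [{"role": "assistant", "content": "It is blue."}]

# Template prompt and chosen separately, as examples/scripts/dpo.py does.
prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)

print(prompt_text)
print(chosen_text)
# The stock template is expected to inject its default system header into each
# rendered string, so prompt_text + chosen_text ends up with two system blocks.
```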
complete conversation:
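Continuing the sketch above, the same messages templated in a single call:

```python
# One call over the full conversation (same tokenizer, prompt and chosen as above).
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)

print(full_text)
# Only one system header is rendered here, so full_text generally differs from
# prompt_text + chosen_text, and naive concatenation duplicates the header.
```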
Similar to the issue with Qwen mentioned earlier, the model always appends a system prompt.

2. The second is that extra special tokens are added on every call. In Phi-3, the differences between the two approaches are as follows:
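A comparable sketch for Phi-3 (checkpoint name and messages illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

prompt = [{"role": "user", "content": "What color is the sky?"}]
chosen = [{"role": "assistant", "content": "It is blue."}]

# Separate application, as in examples/scripts/dpo.py.
prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)

print(prompt_text)
print(chosen_text)
# Each rendered string is expected to end with <|endoftext|>, so concatenating
# them leaves a stray end-of-text token in the middle of the sequence.
```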
complete conversation:
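Continuing the sketch above, the complete conversation in one call:

```python
# One call over the full conversation: a single trailing <|endoftext|> is expected.
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
print(full_text)
```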
The issue here is that after each application of apply_chat_template, <|endoftext|> is appended at the end. In Llama 3:
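Likewise for Llama 3 (checkpoint name and messages illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = [{"role": "user", "content": "What color is the sky?"}]
chosen = [{"role": "assistant", "content": "It is blue."}]

prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)

print(prompt_text)
print(chosen_text)
# Each rendered string is expected to start with <|begin_of_text|>, so the
# concatenation prompt_text + chosen_text carries a BOS token in the middle.
```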
complete conversation:
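Continuing the sketch above, the complete conversation in one call:

```python
# One call over the full conversation: a single leading <|begin_of_text|> is expected.
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
print(full_text)
```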
The issue here is that after each application of apply_chat_template, <|begin_of_text|> is prepended at the beginning.

3. The third is that the apply_chat_template of some models does not support a conversation consisting of a single assistant message. For example, when I run the following code, an error occurs:
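A hedged sketch of the failing pattern (the checkpoint name is illustrative; any chat template that enforces strict user/assistant alternation behaves similarly):

```python
from jinja2.exceptions import TemplateError
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# dpo.py templates `chosen`/`rejected` on their own, i.e. an assistant-only
# conversation, which alternation-enforcing templates reject.
chosen = [{"role": "assistant", "content": "It is blue."}]

try:
    tokenizer.apply_chat_template(chosen, tokenize=False)
except TemplateError as err:
    print(f"apply_chat_template failed: {err}")
```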
In general, the above are just some examples I tried, and there are likely more models with these three problems. I haven't yet included the fix for the third issue; I will add it in an upcoming update to the code.
Add the third issue fix
Thanks for this detailed answer! We definitely have to do something then. I'll look into it
@mst272 thanks for the report... do you think that this issue is properly handled on the tokenizer side?
Hi @kashif, I think this issue is unrelated to the tokenizer, because each model ships its own chat_template and the tokenizer simply renders according to that template. In the dpo.py example, if we use apply_chat_template, we may need to handle certain situations explicitly.
I see @mst272, so perhaps we can make this a utility helper in TRL so that other trainers can also use this pattern?
Hi @kashif, I've made this a utility and provided an example for DPO. Please review to see if it's appropriate, and then we'll apply it to the other trainers :)
To be honest, I think both the examples in …
Hi @qgallouedec, that's indeed a heavy refactor, very nice work!
Closing in favour of #2020
tokenizer.apply_chat_template is used in this example (examples/scripts/dpo.py).
tokenizer.apply_chat_template is applied separately to prompt, chosen, and rejected. This is correct in most cases, but errors occur if the model's chat template injects a default system prompt.
For example, with the Qwen model, applying tokenizer.apply_chat_template to prompt and chosen separately, we get the following:
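A minimal sketch of this comparison (checkpoint name and messages illustrative; the exact default system text is an assumption about the stock Qwen2 template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

prompt = [{"role": "user", "content": "What color is the sky?"}]
chosen = [{"role": "assistant", "content": "It is blue."}]

# Separate application, as in examples/scripts/dpo.py: each rendered string is
# expected to start with the template's default system block.
prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)
print(prompt_text)
print(chosen_text)
```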
If you instead use tokenizer.apply_chat_template on the complete conversation, you get the following result:
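Continuing the sketch above, the complete conversation templated in a single call:

```python
# The default system block appears only once here, so full_text generally
# differs from prompt_text + chosen_text.
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
print(full_text)
```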
This means that the incoming prompt and chosen are not correctly concatenated in DPOTrainer.
So in this case, we need to determine the boundary between prompt and chosen more accurately, which is the functionality this PR adds.
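This PR's implementation isn't reproduced here; as a hedged illustration of the pattern, one option is to template the full conversation once and then split it at the templated prompt boundary (`split_prompt_and_completion` is a hypothetical helper name, not TRL API):

```python
def split_prompt_and_completion(tokenizer, prompt_messages, completion_messages):
    """Template the whole conversation once, then split at the prompt boundary.

    Hypothetical sketch: this avoids duplicated system prompts and stray
    BOS/EOS tokens that appear when prompt and completion are templated
    independently.
    """
    full_text = tokenizer.apply_chat_template(
        prompt_messages + completion_messages, tokenize=False
    )
    # Render the prompt with a generation prompt so the boundary lines up with
    # the start of the assistant turn in the full conversation.
    prompt_text = tokenizer.apply_chat_template(
        prompt_messages, tokenize=False, add_generation_prompt=True
    )
    if not full_text.startswith(prompt_text):
        # Template-specific quirks can still break the prefix assumption and
        # need to be handled explicitly.
        raise ValueError("Templated prompt is not a prefix of the full conversation.")
    return prompt_text, full_text[len(prompt_text):]
```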