Fix apply_chat_template bugs for script/dpo #1930
Conversation
Thank you for reporting this case. It's actually quite annoying. It comes from the fact that the chat template from Qwen2 always adds a system prompt, even if the user doesn't want it, and there's no way to disable it. Do you have any other examples of problem cases? To be honest, I'm not a fan of adding 25 intricate lines for a single case where the chat template is poorly constructed. Perhaps documenting this case and suggesting that users fix the template as follows might be simpler? `tokenizer.chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"`
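For concreteness, a minimal sketch of applying this override (the Qwen2 checkpoint name is illustrative):

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; any Qwen2 instruct model with the stock template behaves similarly.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

# Override the stock template so that no default system prompt is injected.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

prompt = [{"role": "user", "content": "What color is the sky?"}]
print(tokenizer.apply_chat_template(prompt, tokenize=False))
# With the override, only the user turn is rendered, without an injected system block.
```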
There are indeed many other examples of problem cases. I've grouped these issues into three main categories.

1. The first is the same default-system-prompt problem. For example, in Llama 3.1:
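A minimal sketch of the separate application (the checkpoint name and messages are illustrative, and the injected default system header is an assumption about the stock Llama 3.1 template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

prompt = [{"role": "user", "content": "What color is the sky?"}]
chosen = [{"role": "assistant", "content": "It is blue."}]

# Template prompt and chosen separately, as examples/scripts/dpo.py does.
prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)

print(prompt_text)
print(chosen_text)
# The stock template is expected to inject its default system header into each
# rendered string, so prompt_text + chosen_text ends up with two system blocks.
```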
complete conversation:
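Continuing the sketch above, the same messages templated in a single call:

```python
# One call over the full conversation (same tokenizer, prompt and chosen as above).
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)

print(full_text)
# Only one system header is rendered here, so full_text generally differs from
# prompt_text + chosen_text, and naive concatenation duplicates the header.
```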
Similar to the issue with Qwen mentioned earlier, the model always appends a system prompt.

2. The second is that extra special tokens are added on every call. In Phi-3, the differences between the two approaches are as follows:
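A comparable sketch for Phi-3 (checkpoint name and messages illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

prompt = [{"role": "user", "content": "What color is the sky?"}]
chosen = [{"role": "assistant", "content": "It is blue."}]

# Separate application, as in examples/scripts/dpo.py.
prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)

print(prompt_text)
print(chosen_text)
# Each rendered string is expected to end with <|endoftext|>, so concatenating
# them leaves a stray end-of-text token in the middle of the sequence.
```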
complete conversation:
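Continuing the sketch above, the complete conversation in one call:

```python
# One call over the full conversation: a single trailing <|endoftext|> is expected.
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
print(full_text)
```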
The issue here is that after each application of apply_chat_template, <|endoftext|> is appended at the end. In Llama 3:
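Likewise for Llama 3 (checkpoint name and messages illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = [{"role": "user", "content": "What color is the sky?"}]
chosen = [{"role": "assistant", "content": "It is blue."}]

prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)

print(prompt_text)
print(chosen_text)
# Each rendered string is expected to start with <|begin_of_text|>, so the
# concatenation prompt_text + chosen_text carries a BOS token in the middle.
```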
complete conversation:
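Continuing the sketch above, the complete conversation in one call:

```python
# One call over the full conversation: a single leading <|begin_of_text|> is expected.
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
print(full_text)
```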
The issue here is that after each application of apply_chat_template, <|begin_of_text|> is prepended at the beginning.

3. The third is that the apply_chat_template of some models does not support a conversation consisting of a single assistant message. For example, when I run the following code, an error occurs:
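A hedged sketch of the failing pattern (the checkpoint name is illustrative; any chat template that enforces strict user/assistant alternation behaves similarly):

```python
from jinja2.exceptions import TemplateError
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# dpo.py templates `chosen`/`rejected` on their own, i.e. an assistant-only
# conversation, which alternation-enforcing templates reject.
chosen = [{"role": "assistant", "content": "It is blue."}]

try:
    tokenizer.apply_chat_template(chosen, tokenize=False)
except TemplateError as err:
    print(f"apply_chat_template failed: {err}")
```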
In general, the above are just some examples I tried, and there are likely more models with these three problems. I haven't yet included the fix for the third issue; I will add it in an upcoming update to the code.
Add the third issue fix
Thanks for this detailed answer! We definitely have to do something then. I'll look into it
@mst272 thanks for the report... do you think that this issue is properly handled on the tokenizer side?
Hi @kashif, I think this issue is unrelated to the tokenizer, because each model ships its own chat_template and the tokenizer simply renders according to that template. In the dpo.py example, if we use apply_chat_template, we may need to handle certain situations explicitly.
I see @mst272, so perhaps we can make this a utility helper in TRL so that other trainers can also use this pattern?
Hi @kashif, I've made this a utility and provided an example for DPO. Please review to see if it's appropriate, and then we'll apply it to the other trainers :)
To be honest, I think both the examples in …
Hi @qgallouedec, that's indeed a heavy refactor, very nice work!
Closing in favour of #2020
tokenizer.apply_chat_template is used in this example (examples/scripts/dpo.py).
tokenizer.apply_chat_template is applied separately to prompt, chosen, and rejected. This is correct in most cases, but errors occur if the model's chat template injects a default system prompt.
For example, with the Qwen model, applying tokenizer.apply_chat_template to prompt and chosen separately, we get the following:
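A minimal sketch of this comparison (checkpoint name and messages illustrative; the exact default system text is an assumption about the stock Qwen2 template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

prompt = [{"role": "user", "content": "What color is the sky?"}]
chosen = [{"role": "assistant", "content": "It is blue."}]

# Separate application, as in examples/scripts/dpo.py: each rendered string is
# expected to start with the template's default system block.
prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)
print(prompt_text)
print(chosen_text)
```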
If you instead use tokenizer.apply_chat_template on the complete conversation, you get the following result:
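Continuing the sketch above, the complete conversation templated in a single call:

```python
# The default system block appears only once here, so full_text generally
# differs from prompt_text + chosen_text.
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
print(full_text)
```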
This means that the incoming prompt and chosen are not correctly concatenated in DPOTrainer.
So in this case, we need to determine the boundary between prompt and chosen more accurately, which is the functionality this PR adds.
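This PR's implementation isn't reproduced here; as a hedged illustration of the pattern, one option is to template the full conversation once and then split it at the templated prompt boundary (`split_prompt_and_completion` is a hypothetical helper name, not TRL API):

```python
def split_prompt_and_completion(tokenizer, prompt_messages, completion_messages):
    """Template the whole conversation once, then split at the prompt boundary.

    Hypothetical sketch: this avoids duplicated system prompts and stray
    BOS/EOS tokens that appear when prompt and completion are templated
    independently.
    """
    full_text = tokenizer.apply_chat_template(
        prompt_messages + completion_messages, tokenize=False
    )
    # Render the prompt with a generation prompt so the boundary lines up with
    # the start of the assistant turn in the full conversation.
    prompt_text = tokenizer.apply_chat_template(
        prompt_messages, tokenize=False, add_generation_prompt=True
    )
    if not full_text.startswith(prompt_text):
        # Template-specific quirks can still break the prefix assumption and
        # need to be handled explicitly.
        raise ValueError("Templated prompt is not a prefix of the full conversation.")
    return prompt_text, full_text[len(prompt_text):]
```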