
Conversation

@mst272 (Contributor) commented Aug 15, 2024

The tokenizer.apply_chat_template is used in this example (examples/scripts/dpo.py).

The tokenizer.apply_chat_template is applied separately to prompt, chosen, and rejected. This is correct in most cases, but errors will occur if the model's chat template adds a default system prompt.

For example, in the Qwen model, applying tokenizer.apply_chat_template to prompt and chosen separately, we get the following:

prompt:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Do you know the singer Adele?<|im_end|>

chosen:
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>assistant
Sure! Here are some ways to eat bananas and dragonfruits together<|im_end|>

If you instead apply tokenizer.apply_chat_template to the complete conversation, you get the following result:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Do you know the singer Adele?<|im_end|>
<|im_start|>assistant
Sure! Here are some ways to eat bananas and dragonfruits together<|im_end|>

This means that the incoming prompt and chosen are not correctly concatenated in DPOTrainer.

So in this case, we need to determine the boundary between prompt and chosen more accurately, which is the functionality added by my PR. A minimal reproduction is sketched below.
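
For reference, a minimal sketch reproducing the mismatch (the model name "Qwen/Qwen2-7B-Instruct" is just one example; any Qwen chat model with a default system prompt in its template behaves the same way):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

prompt = [{"role": "user", "content": "Do you know the singer Adele?"}]
chosen = [{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together"}]

prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)

# The naive concatenation contains the default system prompt twice, so it
# differs from the correctly templated conversation.
assert prompt_text + chosen_text != full_text
```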

@qgallouedec (Member)

Thank you for reporting this case. It's actually quite annoying. It comes from the fact that the chat template from Qwen2 always adds a system prompt, even if the user doesn't want it, and there's no way to disable it.

Do you have any other examples of problem cases?

To be honest, I'm not a fan of adding 25 intricate lines for a single case where the chat template is poorly constructed. Perhaps documenting this case and suggesting the following template fix might be simpler?

```python
tokenizer.chat_template = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
```
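
With that override in place, the separately rendered pieces compose cleanly again. A quick sanity check (a sketch, reusing the Qwen example; the model name is an assumption):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

prompt = [{"role": "user", "content": "Do you know the singer Adele?"}]
chosen = [{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together"}]

# With no hard-coded system prompt, piecewise application now matches
# applying the template to the whole conversation.
assert (
    tokenizer.apply_chat_template(prompt, tokenize=False)
    + tokenizer.apply_chat_template(chosen, tokenize=False)
    == tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
)
```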

@mst272 (Contributor, Author) commented Aug 16, 2024

There are indeed many other examples of problem cases. I've grouped them into three main categories.

1. The first is the default system prompt issue I mentioned earlier, with representative models such as Qwen, Llama 3.1, and DeepSeek-Coder.

For example, in Llama 3.1:

  • prompt:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Do you know the singer Adele?<|eot_id|>
  • chosen:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Sure! Here are some ways to eat bananas and dragonfruits together<|eot_id|>

complete conversation:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

<|eot_id|><|start_header_id|>user<|end_header_id|>

Do you know the singer Adele?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Sure! Here are some ways to eat bananas and dragonfruits together<|eot_id|>

Similar to the Qwen issue mentioned earlier, the template always prepends a system prompt; one way to sidestep it is sketched below.
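
A possible way to handle this case (a sketch, not this PR's actual code; the model name is just an example) is to render the full conversation once and recover the completion by slicing off the rendered prompt, which remains a valid prefix when the template only prepends a default system block:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")

prompt = [{"role": "user", "content": "Do you know the singer Adele?"}]
chosen = [{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together"}]

full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)

# The default system block is rendered identically in both calls, so the
# prompt rendering is a prefix of the full rendering and slicing is safe.
assert full_text.startswith(prompt_text)
chosen_text = full_text[len(prompt_text):]
```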

2. The second is bos/eos token issues, with representative models such as Phi-3-mini-128k-instruct and Llama 3.

In Phi-3, the differences between the two approaches are as follows:

  • prompt:
<|user|>
Do you know the singer Adele?<|end|>
<|endoftext|>
  • chosen:
<|assistant|>
Sure! Here are some ways to eat bananas and dragonfruits together<|end|>
<|endoftext|>

complete conversation:

<|user|>
Do you know the singer Adele?<|end|>
<|assistant|>
Sure! Here are some ways to eat bananas and dragonfruits together<|end|>
<|endoftext|>

The issue here is that each application of apply_chat_template appends <|endoftext|> at the end; a possible repair is sketched below.
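
A sketch of a repair for the eos case (it assumes the template appends exactly one eos_token per call, as the renderings above show):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

prompt = [{"role": "user", "content": "Do you know the singer Adele?"}]
chosen = [{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together"}]

full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)

# <|endoftext|> is appended after every call, so the rendered prompt is no
# longer a prefix of the full rendering; dropping the trailing eos fixes it.
if not full_text.startswith(prompt_text):
    stripped = prompt_text.rstrip()
    if stripped.endswith(tokenizer.eos_token):
        prompt_text = stripped[: -len(tokenizer.eos_token)]
assert full_text.startswith(prompt_text)
chosen_text = full_text[len(prompt_text):]
```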

In Llama 3:

  • prompt:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Do you know the singer Adele?<|eot_id|>
  • chosen:
<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>

Sure! Here are some ways to eat bananas and dragonfruits together<|eot_id|>

complete conversation:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Do you know the singer Adele?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Sure! Here are some ways to eat bananas and dragonfruits together<|eot_id|>

The issue here is that each application of apply_chat_template prepends <|begin_of_text|> at the beginning; a possible repair is sketched below.
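
If you do want to template the pieces separately, a sketch of a repair for the bos case (it assumes the template prepends exactly one bos_token per call):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

chosen = [{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together"}]
chosen_text = tokenizer.apply_chat_template(chosen, tokenize=False)

# <|begin_of_text|> is prepended on every call; drop it from the completion
# so that prompt_text + chosen_text contains bos exactly once.
if tokenizer.bos_token and chosen_text.startswith(tokenizer.bos_token):
    chosen_text = chosen_text[len(tokenizer.bos_token):]
```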

3. The third is errors raised by the template itself, with representative models such as gemma-2b-it and Mistral-7B-Instruct-v0.2.

The chat templates of these models do not accept a lone assistant message. For example, when I run the following code, an error occurs:

```python
messages = [{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together"}]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=False)
```

```
TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
```
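
A sketch of a workaround for this third case: never template the completion alone, render the whole conversation and slice. It assumes the prompt rendering is a prefix of the full rendering, which holds for gemma-2b-it; otherwise combine with the eos fix above.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

prompt = [{"role": "user", "content": "Do you know the singer Adele?"}]
chosen = [{"role": "assistant", "content": "Sure! Here are some ways to eat bananas and dragonfruits together"}]

# Templating `prompt + chosen` together never triggers the role-alternation
# check, so no TemplateError is raised.
full_text = tokenizer.apply_chat_template(prompt + chosen, tokenize=False)
prompt_text = tokenizer.apply_chat_template(prompt, tokenize=False)
chosen_text = full_text[len(prompt_text):]
```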

In general, these are just the examples I tried; there are likely more models with the above three problems. My PR doesn't yet cover the third case, I will add it in an upcoming commit.
To be honest, I also feel that the code is a bit complicated, but I haven't thought of a better approach for now.

Commit: "Add the third issue fixs"
@qgallouedec (Member)

Thanks for this detailed answer! We definitely have to do something then. I'll look into it.

@kashif (Collaborator) commented Aug 18, 2024

@mst272 thanks for the report... do you think this issue would be more properly handled in the tokenizer repository?

@mst272 (Contributor, Author) commented Aug 18, 2024

Hi @kashif, I think this issue is unrelated to the tokenizer itself: each model defines its own chat_template, and the tokenizer simply applies it. In the dpo.py example, if we use apply_chat_template, we may need to handle certain situations specially.

@kashif (Collaborator) commented Aug 18, 2024

I see @mst272, so perhaps we can make this a utility helper in TRL so that other trainers can also use this pattern?
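
For what it's worth, a rough sketch of what such a helper could look like, combining the three failure modes from this thread (hypothetical code, not TRL's actual API nor this PR's implementation; `render_preference_pair` is an invented name):

```python
def render_preference_pair(tokenizer, prompt_messages, completion_messages):
    """Return (prompt_text, completion_text) whose concatenation equals the
    chat template applied to the whole conversation."""
    # Render the full conversation once: this sidesteps duplicated default
    # system prompts / bos tokens (cases 1 and 2) and templates that reject
    # a lone assistant turn (case 3).
    full_text = tokenizer.apply_chat_template(
        prompt_messages + completion_messages, tokenize=False
    )
    prompt_text = tokenizer.apply_chat_template(prompt_messages, tokenize=False)

    # Some templates append eos after every call (e.g. Phi-3); drop it so the
    # rendered prompt becomes a prefix of the full rendering again.
    if not full_text.startswith(prompt_text) and tokenizer.eos_token:
        stripped = prompt_text.rstrip()
        if stripped.endswith(tokenizer.eos_token):
            prompt_text = stripped[: -len(tokenizer.eos_token)]

    if not full_text.startswith(prompt_text):
        raise ValueError("prompt rendering is not a prefix of the full rendering")

    return prompt_text, full_text[len(prompt_text):]
```

This is only a sketch; a real helper would also need to handle templates whose prompt rendering diverges from the full rendering in other ways.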

@mst272 (Contributor, Author) commented Aug 18, 2024

Hi @kashif, I've made this a utility and provided an example for DPO. Please review to see if it's appropriate, and then we can apply it to the other trainers :)

@AIR-hl (Contributor) commented Aug 25, 2024

> The tokenizer.apply_chat_template is used separately for prompt, chosen, and rejected. This is correct in most cases, but errors will occur if the model used has a default system prompt. [...] So in this case, we need to judge the position of prompt and chosen more accurately, which is the function added by my PR.

To be honest, I think both the example in dpo.py and the Qwen example you provided are special cases; in fact, we cannot cover all templates. dpo.py is just an example based on a particular model and dataset, and users need to adapt the code to their actual situation. For example, if we use Gemma, we will need another modification to the map function due to the Jinja-style template it uses.

@qgallouedec (Member) commented Sep 10, 2024

@mst272 we've decided to refactor the way we handle datasets in TRL more heavily, see #2020. Can you take a look and verify that it solves the issue you initially spotted?
By the way, your precise explanation helped me a lot, thank you for that.

@mst272 (Contributor, Author) commented Sep 11, 2024

Hi @qgallouedec, that's indeed a heavy refactor, very nice work!

@qgallouedec (Member)

Closing in favour of #2020
