[Question] Why don't the TR-DPO default alpha and tau match the values suggested in the paper? #1991

(cc @syrn1k, author of #1593) In the paper, the authors seem to recommend α = 0.6 and τ = 512

<img width="902" alt="Screenshot 2024-08-28 at 17 58 11" src="https://github.com/user-attachments/assets/16016b14-ed80-4090-a243-bedfd0446c70">
<img width="902" alt="Screenshot 2024-08-28 at 17 58 29" src="https://github.com/user-attachments/assets/a39aff74-9d61-4c50-bd3d-ede61482043e">

while in TRL, the defaults are α = 0.9 and τ = 64:

https://github.com/huggingface/trl/blob/10f70fa3337826ffb8c2e0eb0de00051ea53563b/trl/trainer/dpo_config.py#L143-L144

```python
ref_model_mixup_alpha: float = 0.9
ref_model_sync_steps: int = 64
```
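To make the gap concrete, here is a minimal sketch in plain Python (no TRL dependency) of the soft reference update that, as we read the paper, α and τ control: every τ optimizer steps, π_ref ← α·π_θ + (1 − α)·π_ref. The helper names `soft_sync` and `surviving_initial_weight` are hypothetical and not part of TRL, and the 2048-step run length is an arbitrary assumption chosen only for illustration.

```python
def soft_sync(ref, policy, alpha):
    """One soft update, applied element-wise to flat parameter lists:
    ref <- alpha * policy + (1 - alpha) * ref.
    (Hypothetical helper for illustration; not a TRL API.)"""
    return [alpha * p + (1.0 - alpha) * r for p, r in zip(policy, ref)]


def surviving_initial_weight(alpha, sync_steps, total_steps):
    """Coefficient of the original (step-0) reference parameters that remains in
    the reference model after `total_steps` steps, with one soft sync every
    `sync_steps` steps.

    Each sync multiplies the old reference's share by (1 - alpha), so after
    k syncs that share is (1 - alpha) ** k.
    (Hypothetical helper for illustration; not a TRL API.)"""
    num_syncs = total_steps // sync_steps
    return (1.0 - alpha) ** num_syncs


if __name__ == "__main__":
    total = 2048  # assumed training length, for illustration only
    for name, alpha, tau in [("paper", 0.6, 512), ("TRL default", 0.9, 64)]:
        share = surviving_initial_weight(alpha, tau, total)
        print(f"{name}: alpha={alpha}, tau={tau}, syncs={total // tau}, "
              f"initial-ref share={share:.3g}")
```

Under this assumed 2048-step run, the paper's setting performs 4 syncs and leaves 0.4^4 ≈ 0.026 of the original reference in the mix, while the TRL default performs 32 syncs at α = 0.9 and leaves about 1e-32, so the reference effectively follows recent policy snapshots.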