[Question] Why don't the TR-DPO default alpha and tau match the values suggested in the paper? #1991

@qgallouedec

(cc @syrn1k, author of #1593) The paper seems to recommend α = 0.6 and τ = 512,

[Screenshot 2024-08-28 at 17 58 11] [Screenshot 2024-08-28 at 17 58 29]

while trl defaults to α = 0.9 and τ = 64:

ref_model_mixup_alpha: float = 0.9
ref_model_sync_steps: int = 64
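
For context, here is a rough sketch of what these two settings control, assuming the soft-update rule is π_ref ← α·π_θ + (1 − α)·π_ref_prev applied every τ steps (my reading of the TR-DPO description, not copied from the trl source). The helper names `soft_update` and `should_sync` are hypothetical and plain lists stand in for model weights.

```python
def soft_update(ref_params, policy_params, alpha):
    """Blend policy weights into the reference weights.

    Assumed rule: new_ref = alpha * policy + (1 - alpha) * old_ref.
    """
    return [alpha * p + (1.0 - alpha) * r for r, p in zip(ref_params, policy_params)]


def should_sync(step, sync_steps):
    """Return True on every sync_steps-th training step (tau in the paper)."""
    return step > 0 and step % sync_steps == 0


# Toy weights standing in for the reference and policy models.
ref = [0.0, 0.0]
policy = [1.0, 2.0]

# One sync with the trl defaults vs. the values quoted from the paper.
trl_ref = soft_update(ref, policy, alpha=0.9)
paper_ref = soft_update(ref, policy, alpha=0.6)

# Number of syncs over 1024 training steps for each tau.
trl_syncs = sum(should_sync(s, 64) for s in range(1, 1025))
paper_syncs = sum(should_sync(s, 512) for s in range(1, 1025))
```

Under that assumption, the trl defaults move the reference model closer to the policy at each sync and sync it more often than the values quoted from the paper.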
