
Conversation


@1485840691 commented Feb 18, 2024

Related issue: #1259

  1. reverse-kl (current default)
    command: examples/scripts/dpo.py --output_dir=dpo_anthropic_hh --model_name_or_path=gpt2 --per_device_train_batch_size 4 --max_steps 1000 --learning_rate 1e-5 --gradient_accumulation_steps 1 --logging_steps 10 --eval_steps 500 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns

  2. alpha-divergence w/ alpha=0.5
    command: examples/scripts/dpo.py --output_dir=dpo_anthropic_hh --model_name_or_path=gpt2 --per_device_train_batch_size 4 --max_steps 1000 --learning_rate 1e-5 --gradient_accumulation_steps 1 --logging_steps 10 --eval_steps 500 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns --f_divergence_type alpha_divergence --f_alpha_divergence_coef 0.5

https://wandb.ai/open_source/huggingface/runs/b943bky2?workspace=user-1485840691

  3. js_divergence
    command: examples/scripts/dpo.py --output_dir=dpo_anthropic_hh --model_name_or_path=gpt2 --per_device_train_batch_size 4 --max_steps 1000 --learning_rate 1e-5 --gradient_accumulation_steps 1 --logging_steps 10 --eval_steps 500 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns --f_divergence_type js_divergence

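The three runs above exercise the new `--f_divergence_type` / `--f_alpha_divergence_coef` options. As a rough illustration of what those choices mean, here is a minimal sketch following the f-DPO formulation of how the term inside the DPO sigmoid changes with the divergence; the function name `f_dpo_logits` and its signature are made up for illustration and this is not necessarily the PR's exact implementation:

```python
import torch
import torch.nn.functional as F

def f_dpo_logits(chosen_logratios, rejected_logratios, divergence="reverse_kl", alpha=0.5):
    """Return the term that goes inside the DPO sigmoid for a given f-divergence.

    chosen_logratios / rejected_logratios are log pi_theta(y|x) - log pi_ref(y|x)
    summed over the chosen / rejected completions; the loss would then be
    -F.logsigmoid(beta * logits).
    """
    if divergence == "reverse_kl":
        # f(u) = u log u -> f'(u) = log u + 1; the constant cancels in the pair difference
        logits = chosen_logratios - rejected_logratios
    elif divergence == "alpha_divergence":
        # f'(u) proportional to (1 - u**(-alpha)) / alpha
        logits = (torch.exp(-alpha * rejected_logratios)
                  - torch.exp(-alpha * chosen_logratios)) / alpha
    elif divergence == "js_divergence":
        # f'(u) = log(2u / (1 + u)); in log-ratio form this is r - softplus(r) + log 2,
        # and the log 2 cancels between chosen and rejected
        logits = (chosen_logratios - F.softplus(chosen_logratios)) - (
            rejected_logratios - F.softplus(rejected_logratios))
    elif divergence == "forward_kl":
        # f(u) = -log u -> f'(u) = -1/u
        logits = torch.exp(-rejected_logratios) - torch.exp(-chosen_logratios)
    else:
        raise ValueError(f"unknown divergence: {divergence}")
    return logits
```

With reverse KL this reduces to the standard DPO log-ratio difference, so the default behaviour is unchanged.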

@1485840691 marked this pull request as draft February 18, 2024 11:03
@kmn1024 commented Feb 29, 2024

Does it make sense to explore a similar change to KTO loss, to allow trading off alignment for diversity there?

@1485840691 (author) commented

@kmn1024 I am not sure whether the divergence function works for KTO, since the KTO loss is, in my understanding, closer to a point-wise loss, while this divergence function is applied to DPO's pair-wise loss. Pinging @kashif @younesbelkada for their comments on this. As for this PR, I will speed up and get the remaining tests complete by the end of this week or early next week.
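For context (this is the standard objective from the DPO paper, not something specific to this PR), the reverse-KL DPO loss compares the chosen and rejected completions jointly inside the sigmoid, which is what makes it pair-wise:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

KTO, as noted above, scores each completion on its own, so it is not obvious that the same pair-wise substitution of the divergence term carries over directly.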


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@suanflower commented

Thank you very much for your work, but when using forward KL, the loss becomes 0.

