Conversational dataset support for Online DPO #2075

qgallouedec · 2024-09-16T17:06:23Z

What does this PR do?

Item of #2071

Changes:

Major

Added conversational dataset support to Online DPO (and consequently to Nash-MD and XPO)
Removed applying the chat template in the example script (XPO, Nash-MD and Online DPO)
Added tests for conversational dataset support (XPO, Nash-MD and Online DPO)
Added a better default learning rate for OnlineDPOConfig (consequently changes the default for XPOConfig and NashMDConfig) (similar to [KTO] learning rate recomentations for kto #2070)
Changed the structure of the doc of XPO, Nash-MD and Online DPO to have
1. Overview
2. Quick start
3. Usage tips
4. Example script
5. Logged metrics
6. Autofocus of config and trainer

Minor

Changed Nash MD to Nash-MD
Added autodoc of LogCompletionsCallback
Added the generation script for https://huggingface.co/datasets/trl-lib/ultrafeedback-prompt
Other minor fixes

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2024-09-17T15:07:36Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

* first modifications in the documentation * Add script for processing ultrafeedback prompt dataset * Remove unused variable in ultrafeedback.py * style * apply chat template within the init * extend test * new default lr * nash md and xpo conv test * Update prompt length check to 512 characters * remove `maybe_apply_chat_template` in XPO and Nash examples * polish online dpo doc * better section name * LogCompletionsCallback doc * optional generation config * reorder stats (consistency with online dpo) * update online dpo doc * format online dpo config * format nash_md config * update nash md * Nash MD -> Nash-MD * xpo doc * doc

qgallouedec added 7 commits September 16, 2024 17:03

first modifications in the documentation

70f4019

Add script for processing ultrafeedback prompt dataset

73a185d

Remove unused variable in ultrafeedback.py

98e0bde

style

1d2c868

apply chat template within the init

faffc11

extend test

121ec6b

new default lr

997c9c3

qgallouedec mentioned this pull request Sep 16, 2024

[Tracking issue] General dataset support #2071

Open

29 tasks

qgallouedec and others added 7 commits September 16, 2024 18:58

nash md and xpo conv test

a2aeec6

Update prompt length check to 512 characters

30f36c0

Merge branch 'main' into online-dpo-dataset-support

d9dbb98

Merge branch 'main' into online-dpo-dataset-support

7e482c9

remove maybe_apply_chat_template in XPO and Nash examples

075faa9

polish online dpo doc

e2cac3f

better section name

f7fa597

qgallouedec added 10 commits September 17, 2024 15:10

LogCompletionsCallback doc

a54b1a1

optional generation config

d40cd75

reorder stats (consistency with online dpo)

1ba0ca8

update online dpo doc

a2ea4d1

format online dpo config

32d85d3

format nash_md config

8281af1

update nash md

78a0294

Nash MD -> Nash-MD

4474c7c

xpo doc

a9e601b

doc

131798b

qgallouedec marked this pull request as ready for review September 18, 2024 08:22

qgallouedec requested review from kashif, edbeeching and lewtun September 18, 2024 08:22

kashif approved these changes Sep 18, 2024

View reviewed changes

qgallouedec merged commit 6920c2d into main Sep 18, 2024
10 checks passed

qgallouedec deleted the online-dpo-dataset-support branch September 18, 2024 12:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Conversational dataset support for Online DPO #2075

Conversational dataset support for Online DPO #2075

Uh oh!

qgallouedec commented Sep 16, 2024 •

edited

Loading

Uh oh!

HuggingFaceDocBuilderDev commented Sep 17, 2024

Uh oh!

Uh oh!

Uh oh!

Conversational dataset support for Online DPO #2075

Conversational dataset support for Online DPO #2075

Uh oh!

Conversation

qgallouedec commented Sep 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Changes:

Major

Minor

Before submitting

Who can review?

Uh oh!

HuggingFaceDocBuilderDev commented Sep 17, 2024

Uh oh!

Uh oh!

Uh oh!

qgallouedec commented Sep 16, 2024 •

edited

Loading