Description
Hello!
I'm afraid I don't have a quick snippet for you to reproduce this, but I've noticed that various models I've finetuned using SFT+RM+PPO & SFT+DPO endlessly generate text until max_new_tokens is reached. This is quite frustrating, as it always causes the text to be cut off and generally makes the generated text much longer than expected.
I'm wondering if you're familiar with this issue, and if you happen to know where the problem might lie, i.e. whether the model fails to learn the stopping pattern well enough during training, or whether the generation configuration or the tokenizer's pad/eos tokens are set up incorrectly during inference.
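One common cause of this behavior (not confirmed to be the cause here) is that the EOS token is never appended to the SFT training targets, so the model never learns to emit it and generation only stops at max_new_tokens. The sketch below is a hypothetical sanity check on raw training texts; the `EOS_TOKEN` value and the example data are assumptions, and in practice you would use your tokenizer's own `eos_token`:

```python
# Hypothetical sanity check: find SFT training examples whose target text
# does not end with the EOS token. If many examples are missing it, the
# finetuned model is unlikely to learn when to stop generating.

EOS_TOKEN = "</s>"  # assumption: Llama-style EOS; use your tokenizer's eos_token


def examples_missing_eos(examples):
    """Return the examples whose text does not end with the EOS token."""
    return [ex for ex in examples if not ex.endswith(EOS_TOKEN)]


# Toy data, purely illustrative
dataset = [
    "Q: What is 2+2? A: 4</s>",
    "Q: Capital of France? A: Paris",  # missing EOS -> model never learns to stop
]
print(examples_missing_eos(dataset))  # prints ['Q: Capital of France? A: Paris']
```

If the training data does include the EOS token, the next thing to check is the inference side: that `eos_token_id` in the model's generation config matches the tokenizer's, since a mismatch there would also prevent generation from terminating early.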
To give an example, I have an instruction dataset where answers very consistently range between ~10 and ~15 tokens. Finetuning on this dataset with SFT and/or SFT+RM+PPO has resulted in models that generate 50 tokens if you set max_new_tokens=50.
I'm open to any advice.
- Tom Aarsen