Description
Hello!
I'm afraid I don't have a quick snippet for you to reproduce this, but I've noticed that various models I've finetuned using SFT+RM+PPO & SFT+DPO endlessly generate text until max_new_tokens is reached. This is quite frustrating, as it always causes the text to be cut off and generally makes the generated text much longer than expected.
I'm wondering if you're familiar with this issue, and if you happen to know where the problem might lie, i.e. whether the model fails to learn the stopping pattern well enough during training, or whether the generation configuration or the tokenizer's pad/eos tokens are set up incorrectly during inference.
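One common cause of this behavior (not confirmed to be the cause here) is that the EOS token is never appended to the SFT training targets, so the model never learns to emit it and generation only stops at max_new_tokens. The sketch below is a hypothetical sanity check on raw training texts; the `EOS_TOKEN` value and the example data are assumptions, and in practice you would use your tokenizer's own `eos_token`:

```python
# Hypothetical sanity check: find SFT training examples whose target text
# does not end with the EOS token. If many examples are missing it, the
# finetuned model is unlikely to learn when to stop generating.

EOS_TOKEN = "</s>"  # assumption: Llama-style EOS; use your tokenizer's eos_token


def examples_missing_eos(examples):
    """Return the examples whose text does not end with the EOS token."""
    return [ex for ex in examples if not ex.endswith(EOS_TOKEN)]


# Toy data, purely illustrative
dataset = [
    "Q: What is 2+2? A: 4</s>",
    "Q: Capital of France? A: Paris",  # missing EOS -> model never learns to stop
]
print(examples_missing_eos(dataset))  # prints ['Q: Capital of France? A: Paris']
```

If the training data does include the EOS token, the next thing to check is the inference side: that `eos_token_id` in the model's generation config matches the tokenizer's, since a mismatch there would also prevent generation from terminating early.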
To give an example, I have an instruction dataset where answers very consistently range between ~10 and ~15 tokens. Finetuning on this dataset with SFT and/or SFT+RM+PPO has resulted in models that generate 50 tokens if you set max_new_tokens=50.
I'm open to any advice.
- Tom Aarsen