I've found what I think might be a bug in the implementation of the fine-tuning baseline. If this is indeed the case, this bug would yield incorrect results when the unlearning target is longer than one token.
Using the VSCode debugger, I found that the code in `ft_main.py` doesn't carry out backpropagation properly. The current version of the code passes the prompts without the targets to the model by calling `model(**inputs)`. It then gathers the logits of every token in the target from the last token's logits. This maximises the probability of each target token appearing immediately after the prompt. That is not the correct behaviour: the loss should maximise the probability of the first target token as a continuation of the prompt, then the probability of the second target token as a continuation of the prompt plus the first target token, and so on.
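To make the difference concrete, here is a minimal sketch of the two loss computations as I understand them. This is not the repository's actual code; the model, tokenizer, strings, and variable names are placeholders I chose for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup; "gpt2" and the strings below are placeholders.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
target = " the city of Paris"

prompt_ids = tok(prompt, return_tensors="pt").input_ids   # [1, P]
target_ids = tok(target, return_tensors="pt").input_ids   # [1, T]

# --- Pattern I believe ft_main.py currently follows ---
# Only the prompt is fed to the model, and the logits at the *last prompt
# position* are used to score every target token. This maximises the
# probability of each target token appearing immediately after the prompt.
out = model(input_ids=prompt_ids)
last_logits = out.logits[:, -1, :]                         # [1, V]
log_probs = torch.log_softmax(last_logits, dim=-1)
buggy_loss = -log_probs[0, target_ids[0]].mean()

# --- Teacher-forced pattern I would expect instead ---
# Feed prompt + target, and score each target token against the logits of
# the position just before it, so token t+1 is conditioned on tokens 0..t.
full_ids = torch.cat([prompt_ids, target_ids], dim=1)      # [1, P+T]
out = model(input_ids=full_ids)
shift_logits = out.logits[:, :-1, :]                       # predictions for the next token
shift_labels = full_ids[:, 1:].clone()
# Mask the prompt positions so only the target tokens contribute to the loss.
shift_labels[:, : prompt_ids.shape[1] - 1] = -100
correct_loss = torch.nn.functional.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
    ignore_index=-100,
)
```

The two losses coincide when the target is a single token, which would explain why the problem only shows up for multi-token targets.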
I think this issue might also be present in the ROME repository, where the original code came from; I've opened an issue there but haven't received a response. Thanks for any assistance you may offer.