I've found what I think might be a bug in the implementation of the fine-tuning baseline. If this is indeed the case, this bug would yield incorrect results when the unlearning target is longer than one token.
Using the VSCode debugger, I found that the code in `ft_main.py` doesn't carry out backpropagation properly. The current version of the code passes the prompts without the targets to the model by calling `model(**inputs)`. It then gathers the logits of every token in the target from the last token's logits. This maximises the probability of each target token appearing immediately after the prompt. That is not the correct behaviour: the loss should maximise the probability of the first target token as a continuation of the prompt, then the probability of the second target token as a continuation of the prompt plus the first target token, and so on.
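To make the difference concrete, here is a minimal sketch of the two loss computations as I understand them. This is not the repository's actual code; the model, tokenizer, strings, and variable names are placeholders I chose for illustration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup; "gpt2" and the strings below are placeholders.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
target = " the city of Paris"

prompt_ids = tok(prompt, return_tensors="pt").input_ids   # [1, P]
target_ids = tok(target, return_tensors="pt").input_ids   # [1, T]

# --- Pattern I believe ft_main.py currently follows ---
# Only the prompt is fed to the model, and the logits at the *last prompt
# position* are used to score every target token. This maximises the
# probability of each target token appearing immediately after the prompt.
out = model(input_ids=prompt_ids)
last_logits = out.logits[:, -1, :]                         # [1, V]
log_probs = torch.log_softmax(last_logits, dim=-1)
buggy_loss = -log_probs[0, target_ids[0]].mean()

# --- Teacher-forced pattern I would expect instead ---
# Feed prompt + target, and score each target token against the logits of
# the position just before it, so token t+1 is conditioned on tokens 0..t.
full_ids = torch.cat([prompt_ids, target_ids], dim=1)      # [1, P+T]
out = model(input_ids=full_ids)
shift_logits = out.logits[:, :-1, :]                       # predictions for the next token
shift_labels = full_ids[:, 1:].clone()
# Mask the prompt positions so only the target tokens contribute to the loss.
shift_labels[:, : prompt_ids.shape[1] - 1] = -100
correct_loss = torch.nn.functional.cross_entropy(
    shift_logits.reshape(-1, shift_logits.size(-1)),
    shift_labels.reshape(-1),
    ignore_index=-100,
)
```

The two losses coincide when the target is a single token, which would explain why the problem only shows up for multi-token targets.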
I think this issue might also be present in the ROME repository, where the original code came from; I've opened an issue there but haven't received a response. Thanks for any assistance you may offer.