Could there be a bug in the FT implementation #173

@drd13

Description

I've found what I think might be a bug in the implementation of the fine-tuning baseline. If this is indeed the case, this bug would yield incorrect results when the unlearning target is longer than one token.

Using the VSCode debugger, I found that the code in ft_main.py doesn't carry out backpropagation properly. The current version of the code passes the prompts without the targets to the model by calling model(**inputs). It then gathers the logits of all tokens in the target from the last token's logits. This maximises the probability of every target token immediately succeeding the prompt. This is not the correct behaviour, which should maximise the probability of the first target token being a continuation of the prompt, then maximise the probability of the second target token being a continuation of the first target token, and so on (see the sketch below).
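
For concreteness, here is a minimal sketch of the two behaviours, assuming a Hugging Face-style causal LM. The names `model`, `tok`, `prompt`, and `target` are hypothetical placeholders, not the actual variables in ft_main.py:

```python
import torch
import torch.nn.functional as F

def buggy_loss(model, tok, prompt, target):
    # Buggy pattern: the model only sees the prompt, and every target
    # token's log-probability is read off the *same* final-position logits.
    inputs = tok(prompt, return_tensors="pt")
    target_ids = tok(target, add_special_tokens=False)["input_ids"]
    logits = model(**inputs).logits                  # (1, prompt_len, vocab)
    last = F.log_softmax(logits[0, -1], dim=-1)
    # Each target token is scored as if it directly follows the prompt.
    return -last[target_ids].mean()

def teacher_forced_loss(model, tok, prompt, target):
    # Correct pattern: feed prompt + target and score each target token
    # conditioned on everything before it (teacher forcing).
    prompt_ids = tok(prompt, return_tensors="pt")["input_ids"]
    target_ids = tok(target, add_special_tokens=False,
                     return_tensors="pt")["input_ids"]
    input_ids = torch.cat([prompt_ids, target_ids], dim=1)
    logits = model(input_ids).logits                 # (1, total_len, vocab)
    # Logits at position i predict token i + 1, so shift by one.
    shift_logits = logits[0, prompt_ids.shape[1] - 1 : -1]
    log_probs = F.log_softmax(shift_logits, dim=-1)  # (target_len, vocab)
    return -log_probs.gather(1, target_ids[0].unsqueeze(-1)).mean()
```

For a one-token target the two losses coincide, which would explain why the bug only affects longer targets.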

I think this issue may originate in the ROME repository, where the original code came from; I've opened an issue there, but they haven't responded. Thanks for any assistance you may offer.
