Reproduction
In #3572 @qgallouedec simplified the processing of conversational data.
However, it also alters the interaction with the tokenizer: it switches from item access (`processed["input_ids"]`) to attribute access (`processed.input_ids`), where `processed` is the output of the tokenizer. The tokenizer is not necessarily under the control of the library, since it is user-provided and may be custom.
Is this an intentional breaking change? If so, why? It forces users to write their tokenizers so that they return a `BatchEncoding` rather than a plain dict.
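To illustrate, here is a minimal sketch (the `MyTokenizer` class and its toy ids below are hypothetical, purely for illustration): a custom processing class that returns a plain dict works with item access but fails on attribute access, while wrapping the same data in a `transformers.BatchEncoding` supports both.

```python
from transformers import BatchEncoding

# Hypothetical custom tokenizer that returns a plain dict (ids are illustrative only)
class MyTokenizer:
    def __call__(self, text):
        return {"input_ids": [ord(c) for c in text]}

processed = MyTokenizer()(text="hi")
print(processed["input_ids"])   # item access: works
# print(processed.input_ids)    # AttributeError: 'dict' object has no attribute 'input_ids'

# Wrapping the same data in BatchEncoding makes attribute access work as well
wrapped = BatchEncoding({"input_ids": [ord(c) for c in "hi"]})
print(wrapped["input_ids"])     # works
print(wrapped.input_ids)        # works
```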
This PR was merged between 0.18.2 and 0.19.0
I am referring to this line in `trl/trl/trainer/sft_trainer.py` (line 731 at `ab331bf`):

```python
prompt_ids = processing_class(text=example["prompt"]).input_ids
```
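For comparison, the standard tokenizers shipped with transformers return a `BatchEncoding`, which supports both access styles, so the change only surfaces with custom processing classes. A quick check (using `gpt2` purely as an example model):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
processed = tok(text="hello world")

print(type(processed).__name__)   # BatchEncoding
print(processed["input_ids"])     # item access works
print(processed.input_ids)        # attribute access works too
```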
System Info
trl v0.19.0
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
- Any traceback provided is complete