
Double-bos needs more general implementation #855

@terrykong

I think we may need to generalize that BOS check. I found another place where it fails:

message = tokenizer.apply_chat_template(
    [user_message],
    tokenize=False,
    add_generation_prompt=True,
    add_special_tokens=False,
)
user_message["token_ids"] = tokenizer(message, return_tensors="pt")["input_ids"][0]

In the deepscaler base, when you try to eval, this adds a double BOS, but the chat template doesn't have any obvious indicators because it's a more complicated Jinja template:

https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/blob/main/tokenizer_config.json#L34

I think we may need some kind of apply_safe_chat_template() (or similar) that we reuse throughout the repo and that remembers the decision about how to handle this BOS token. Do you think you can look into a general fix?
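For discussion, here is a rough sketch of what such a helper could look like. It is only an illustration, not an existing function in the repo: the signature of apply_safe_chat_template() and the helper collapse_leading_bos() are hypothetical. The idea is to avoid inspecting the Jinja template at all and instead look at the tokenized result, collapsing a run of leading BOS tokens down to one. It assumes a Hugging Face style tokenizer that exposes apply_chat_template(), bos_token_id, and is callable on a string.

```python
from typing import Any, Dict, List, Optional, Sequence


def collapse_leading_bos(
    token_ids: Sequence[int], bos_token_id: Optional[int]
) -> List[int]:
    """Collapse a run of leading BOS tokens into a single BOS.

    Hypothetical helper: returns the ids unchanged if the tokenizer has no
    BOS token or the sequence does not start with repeated BOS tokens.
    """
    ids = list(token_ids)
    if bos_token_id is None:
        return ids
    # Count how many BOS tokens sit at the very start of the sequence.
    n_bos = 0
    while n_bos < len(ids) and ids[n_bos] == bos_token_id:
        n_bos += 1
    # Keep at most one of them.
    return ids[max(n_bos - 1, 0):]


def apply_safe_chat_template(
    tokenizer: Any, messages: List[Dict[str, Any]], **template_kwargs: Any
) -> List[int]:
    """Render the chat template, tokenize it, and drop a duplicated BOS.

    Hypothetical signature. Assumes `tokenizer` behaves like a Hugging Face
    tokenizer; extra keyword arguments are forwarded to the template call.
    """
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, **template_kwargs
    )
    # The template may already emit BOS, and tokenizing the rendered text
    # may prepend another one, so fix it up after the fact.
    ids = tokenizer(text)["input_ids"]
    return collapse_leading_bos(ids, getattr(tokenizer, "bos_token_id", None))
```

A call site like the one above could then become something like `user_message["token_ids"] = torch.tensor(apply_safe_chat_template(tokenizer, [user_message], add_generation_prompt=True))`. Checking the token ids rather than the template source sidesteps the "no obvious indicators" problem, at the cost of also collapsing a doubled BOS that someone added on purpose.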
