izhx/q3e-st-files

q3e-st-files

Files for SentenceTransformer support, using the 0.6B model as the example. These will be pushed to the Hugging Face model repos.

Convert tokenizer:

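import tokenizers
from transformers import AutoTokenizer

name_or_path = "TODO"

tok = AutoTokenizer.from_pretrained(name_or_path)
# Show the tokens and the encoding before the change
print(tok.tokenize('test 1, test 2'), tok('test 1, test 2'))
# Template that appends <|endoftext|> (id 151643) to single inputs and to pairs
template_processor = tokenizers.processors.TemplateProcessing(
    single="$A <|endoftext|>", pair="$A $B <|endoftext|>", special_tokens=[("<|endoftext|>", 151643)]
)
# Chain the template after the tokenizer's existing post-processor
tok.backend_tokenizer.post_processor = tokenizers.processors.Sequence([
    tok.backend_tokenizer.post_processor, template_processor
])
# Show the same input again; the encoding should now end with <|endoftext|>
print(tok.tokenize('test 1, test 2'), tok('test 1, test 2'))

# Save the converted tokenizer alongside the original, with an '-eos' suffix
tok.save_pretrained(name_or_path + '-eos')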
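
After converting, it may be worth checking that encoded inputs end with exactly one `<|endoftext|>` id. The sketch below is not part of the repository. The helper name `ends_with_single_eos` is hypothetical, and the id 151643 is taken from the snippet above. It works on plain lists of ids, so it runs without downloading a model.

```python
# Hypothetical helper (not from the repo): checks the result of the conversion.
EOS_ID = 151643  # id of <|endoftext|>, as used in the conversion snippet


def ends_with_single_eos(input_ids, eos_id=EOS_ID):
    """Return True if input_ids ends with exactly one trailing eos_id."""
    if not input_ids or input_ids[-1] != eos_id:
        return False
    # A second trailing EOS suggests the template was applied more than once.
    return len(input_ids) == 1 or input_ids[-2] != eos_id


# Assumed usage with the tokenizer saved above (requires transformers):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained(name_or_path + '-eos')
#   assert ends_with_single_eos(tok('test 1, test 2')['input_ids'])
```

The check looks for a single trailing EOS because running the conversion again on an already converted tokenizer would chain the template a second time and could append the token twice.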
