Status: Closed
Labels: model (Model specific)
Description
There is a working bert.cpp implementation. We should try to implement BERT support in llama.cpp and update the embedding example to use it. The implementation should mostly follow what we did to integrate Falcon.
Here are the main steps:
- Update `gguf.py` with BERT arch KV pairs and tensors
- Python convert script using `gguf.py` to generate an F16 model
- Add tokenizer implementation in `llama.cpp`
- Add function to build the BERT graph
- Add any new ops in `ggml` if needed
- Add CUDA offloading
- Add tokenizer tests
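For the tokenizer step: BERT uses WordPiece, a greedy longest-match-first algorithm over a subword vocabulary. A hedged Python sketch of the core loop follows; the real llama.cpp implementation would be in C++ and load its vocabulary from the GGUF file, and the tiny vocab here is made up purely for illustration:

```python
# Illustrative WordPiece tokenization (greedy longest-match-first).
# The vocab below is invented for the example; a real one has ~30k entries.
VOCAB = {"hello", "he", "##llo", "##world", "[UNK]"}

def wordpiece(word, vocab=VOCAB, max_len=100):
    """Split a single word into sub-tokens; continuation pieces get a '##' prefix."""
    if len(word) > max_len:
        return ["[UNK]"]
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        cur = None
        # Try the longest remaining substring first, shrinking until a match.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                cur = piece
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]  # no sub-token matched this suffix
        tokens.append(cur)
        start = end
    return tokens

print(wordpiece("hello"))       # -> ['hello']
print(wordpiece("helloworld"))  # -> ['hello', '##world']
print(wordpiece("xyz"))         # -> ['[UNK]']
```

The tokenizer tests in the last step would compare such output against reference tokenizations produced by the original BERT tokenizer.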