Status: Closed
Labels: model (Model specific)
Description
There is a working bert.cpp implementation. We should try to implement BERT support in llama.cpp and update the embedding example to use it. The implementation should mostly follow what we did to integrate Falcon.
Here are the main steps:
- Update `gguf.py` with BERT arch KV pairs and tensors
- Python convert script using `gguf.py` to generate an F16 model
- Add tokenizer implementation in `llama.cpp`
- Add function to build the BERT graph
- Add any new ops in `ggml` if needed
- Add CUDA offloading
- Add tokenizer tests
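For the tokenizer step: BERT uses WordPiece, a greedy longest-match-first algorithm over a subword vocabulary. A hedged Python sketch of the core loop follows; the real llama.cpp implementation would be in C++ and load its vocabulary from the GGUF file, and the tiny vocab here is made up purely for illustration:

```python
# Illustrative WordPiece tokenization (greedy longest-match-first).
# The vocab below is invented for the example; a real one has ~30k entries.
VOCAB = {"hello", "he", "##llo", "##world", "[UNK]"}

def wordpiece(word, vocab=VOCAB, max_len=100):
    """Split a single word into sub-tokens; continuation pieces get a '##' prefix."""
    if len(word) > max_len:
        return ["[UNK]"]
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        cur = None
        # Try the longest remaining substring first, shrinking until a match.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                cur = piece
                break
            end -= 1
        if cur is None:
            return ["[UNK]"]  # no sub-token matched this suffix
        tokens.append(cur)
        start = end
    return tokens

print(wordpiece("hello"))       # -> ['hello']
print(wordpiece("helloworld"))  # -> ['hello', '##world']
print(wordpiece("xyz"))         # -> ['[UNK]']
```

The tokenizer tests in the last step would compare such output against reference tokenizations produced by the original BERT tokenizer.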