
Conversation

@juliendenize (Contributor) commented Jul 1, 2025

This PR adds a FastAPI app server to mistral-common.

The features include:

  • tokenize a prompt, a list of messages, or chat completion requests (both ours and OpenAI's)
  • detokenize a list of tokens
  • apply a "chat template": mistral-common doesn't actually have chat templates, but a dedicated route would return the same string a chat template would produce. As discussed on Slack, we decided against this because it could be error-prone (we should always tokenize to ints).

This should improve integrations with some LLM inference backends such as llama.cpp.

Edit:
The detokenize route now also supports tool call parsing when `as_message=True` is passed in the request.
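As a sketch of how a client might talk to these routes: the snippet below builds request payloads for the tokenize and detokenize endpoints. The field names (`messages`, `tokens`, `as_message`) follow the PR description, but the exact route paths and request schema are assumptions here, not the confirmed API; check the merged code for the actual models.

```python
import json

# Hypothetical payload builders for the mistral-common FastAPI server.
# Field names follow the PR description; the real Pydantic models may differ.

def build_tokenize_request(messages):
    """Payload for tokenizing a list of chat messages."""
    return {"messages": messages}

def build_detokenize_request(tokens, as_message=False):
    """Payload for detokenizing tokens. With as_message=True, the server
    also parses tool calls back into a structured assistant message."""
    return {"tokens": tokens, "as_message": as_message}

tokenize_payload = build_tokenize_request(
    [{"role": "user", "content": "Hello!"}]
)
detokenize_payload = build_detokenize_request([1, 2, 3], as_message=True)

# These dicts would be sent as JSON bodies, e.g. with httpx:
#   httpx.post("http://localhost:8000/v1/tokenize", json=tokenize_payload)
print(json.dumps(tokenize_payload))
print(json.dumps(detokenize_payload))
```

The `/v1/tokenize` URL in the comment is illustrative only; the server's actual route paths are defined in the PR.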

@juliendenize juliendenize self-assigned this Jul 1, 2025
@juliendenize force-pushed the improve_llama_cpp_integration branch from e85ebee to ec07ed7 on July 23, 2025
@patrickvonplaten (Contributor) left a comment

Some nits for docs

@juliendenize merged commit 10b44c0 into main on Jul 25, 2025
8 checks passed