
Conversation

@juliendenize (Contributor) commented Jul 1, 2025

This PR adds a FastAPI app server to mistral-common.

The features include:

  • tokenize a prompt, a list of messages, or chat completion requests (both ours and OpenAI's)
  • detokenize a list of tokens
  • apply a "chat template": mistral-common doesn't actually have chat templates, but a dedicated route would return the same string a chat template would produce. As discussed on Slack, we decided against this because it could be error-prone (we should always tokenize to ints).

This should improve integrations with some LLM inference backends such as llama.cpp.

Edit:
The detokenize route now also supports tool call parsing when `as_message=True` is passed in the request.
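As a sketch of how a client might talk to these routes: the snippet below builds request payloads for the tokenize and detokenize endpoints. The field names (`messages`, `tokens`, `as_message`) follow the PR description, but the exact route paths and request schema are assumptions here, not the confirmed API; check the merged code for the actual models.

```python
import json

# Hypothetical payload builders for the mistral-common FastAPI server.
# Field names follow the PR description; the real Pydantic models may differ.

def build_tokenize_request(messages):
    """Payload for tokenizing a list of chat messages."""
    return {"messages": messages}

def build_detokenize_request(tokens, as_message=False):
    """Payload for detokenizing tokens. With as_message=True, the server
    also parses tool calls back into a structured assistant message."""
    return {"tokens": tokens, "as_message": as_message}

tokenize_payload = build_tokenize_request(
    [{"role": "user", "content": "Hello!"}]
)
detokenize_payload = build_detokenize_request([1, 2, 3], as_message=True)

# These dicts would be sent as JSON bodies, e.g. with httpx:
#   httpx.post("http://localhost:8000/v1/tokenize", json=tokenize_payload)
print(json.dumps(tokenize_payload))
print(json.dumps(detokenize_payload))
```

The `/v1/tokenize` URL in the comment is illustrative only; the server's actual route paths are defined in the PR.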

@juliendenize juliendenize self-assigned this Jul 1, 2025
@juliendenize force-pushed the improve_llama_cpp_integration branch from e85ebee to ec07ed7 on July 23, 2025
@patrickvonplaten (Contributor) left a comment

Some nits for docs

@juliendenize merged commit 10b44c0 into main on Jul 25, 2025
8 checks passed