feat: add llama.cpp provider w/ native tool calls #1946
Conversation
🦋 Changeset detected
Latest commit: 6760db3
The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR.
❌ Changes requested. Reviewed everything up to 6760db3 in 3 minutes and 15 seconds
More details
- Looked at 1648 lines of code in 26 files
- Skipped 0 files when reviewing
- Skipped posting 10 drafted comments based on config settings
1. src/shared/api.ts:1446
   - Draft comment: In normalizeApiConfiguration, the 'llama.cpp' case returns an empty model ID and uses the default model info. Confirm if this behavior is intended and well documented, as it may lead to UI inconsistencies.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. webview-ui/src/components/settings/ApiOptions.tsx:1127
   - Draft comment: The UI for the "llama.cpp" provider appears consistent. Verify that its configuration instructions and placeholder text properly inform the user on how to configure the provider.
   - Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 0% <= threshold 50%. This comment is asking the PR author to verify that the configuration instructions and placeholder text are correct. This falls under the rule of not asking the author to double-check or ensure something is correct, which is not allowed.
3. webview-ui/src/components/chat/ChatTextArea.tsx:757
   - Draft comment: For the 'llama.cpp' provider case, the code returns the literal string "undefined" as the model display name. This is inconsistent with the other cases (which use a formatted provider:model id) and may confuse users. Consider displaying a more descriptive label (for example, simply 'llama.cpp' or another appropriate placeholder) instead of "undefined".
   - Reason this comment was not posted: Marked as duplicate.
4. webview-ui/src/components/settings/ApiOptions.tsx:1127
   - Draft comment: Ensure that the labels and placeholders for the 'llama.cpp' provider (Base URL and API Key) align with user expectations. Although the setup is consistent with other providers, consider if any additional guidance or fallback defaults are needed given that the corresponding selected model id remains empty.
   - Reason this comment was not posted: Comment did not seem useful. Confidence is useful = 20% <= threshold 50%. The comment is asking the author to ensure alignment with user expectations and consider additional guidance or defaults. It doesn't provide a specific suggestion or point out a clear issue with the code. It seems to be more of a general suggestion rather than a specific actionable comment.
5. src/api/providers/bedrock.ts:93
   - Draft comment: Typo detected: In the comment on line 93, the phrase 'executing a an AWS provider chain' should be corrected to 'executing an AWS provider chain' (remove the extra 'a').
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. src/api/providers/bedrock.ts:110
   - Draft comment: Typo detected: In the comment on line 110, the phrase 'will already be already provided' contains a duplicated 'already'. Please remove one for clarity.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. src/api/providers/openai-native.ts:33
   - Draft comment: Typo: In the comment on line 33, please change 'o1 doesnt support streaming, non-1 temp, or system prompt' to 'o1 doesn't support streaming, non-1 temp, or system prompt'.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. src/api/providers/openrouter.ts:72
   - Draft comment: Typo in comment: 'a image_url type message' should be 'an image_url type message'.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. src/core/webview/ClineProvider.ts:330
   - Draft comment: Typo: In the comment explaining the codicon font usage, 'css fileinto' should be updated to 'css file into' for clarity.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
10. src/core/webview/ClineProvider.ts:1720
   - Draft comment: Potential Typo: The global state key 'qwenApiLine' is used in getState() while the secret key uses 'qwenApiKey'. Consider verifying if 'qwenApiLine' is intended or if it should be 'qwenApiKey' to maintain consistency.
   - Reason this comment was not posted: Comment was not on a location in the diff, so it can't be submitted as a review comment.
Workflow ID: wflow_PPNIES0QeEzoZdBI
Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
@@ -755,6 +755,8 @@ const ChatTextArea = forwardRef<HTMLTextAreaElement, ChatTextAreaProps>(
 				return `${selectedProvider}:${apiConfiguration.lmStudioModelId}`
 			case "ollama":
 				return `${selectedProvider}:${apiConfiguration.ollamaModelId}`
+			case "llama.cpp":
+				return "undefined"
In the switch for provider display names, the case for "llama.cpp" returns the literal string "undefined". Consider providing a meaningful default label instead of the literal "undefined".
return "undefined" | |
return "llama.cpp:default" |
It seems like we're not passing in any tools to the requests; these are already defined in the system prompt, which allows Cline to work with models/APIs that don't support tool calling. Not sure how much of a demand there is for llama.cpp as a provider, but perhaps we could limit the change to that? Feel free to re-open
@saoudrizwan It does (great job btw!), but the prompt is very optimised for Claude (XML-heavy) and confuses smaller local models, especially when they've been fine-tuned to call tools with JSON-based syntaxes (and with tool descriptions in a format chosen by their Jinja template, typically just verbatim JSON schema, but sometimes the template converts the tool signatures to TypeScript, e.g. for functionary templates). Besides, it's not just XML vs. JSON: most models now use special tokens to signal tool call boundaries, so it's best to show them examples in the format they know. In a nutshell, this PR:
I only got disappointing results with the existing OpenAI-Compatible provider talking to llama.cpp. But if you talk to Qwen 2.5 Coder 32B or even Phi 4 with the right prompt, they work wonders :-D cc/ @ngxson @ggerganov
Note that llama-server now also powers HuggingFace Inference Endpoints, so it's not just for local AI enthusiasts.
Any news? Is there a way to run Cline with llama.cpp?
Description
This adds a provider for llama.cpp's llama-server, leveraging its recently introduced grammar-constrained, universal tool call support.

Structural changes:
- system.ts: tool definitions are now passed to ApiHandler.createMessage (if ModelInfo.supportsTools) instead of being spelled out only in the system prompt.

Limitations:
- Run llama-server with the --verbose flag to see what's happening (especially when the KV cache isn't hot yet on the first call).
- llama-server doesn't support tool calls in streaming mode yet.
- Uses llama-server's /apply-template endpoint w/ the fresh new add_generation_prompt param to compute tool call deltas. This is used in the system prompt to express the few-shot tool use examples with the call syntax native to the model, which helps it tremendously (esp. when small). A rough sketch of the idea follows below.
- tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars ggml-org/llama.cpp#12034 (under review / not merged yet)

TODOs / follow ups:
- … (fetch failed)
- tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars ggml-org/llama.cpp#12034 (non-blocker)

Note that switching to native tool use for other providers should be possible (even for Claude, assuming it doesn't mess with its caching), but one would need to find a nice syntax to format tool call examples.
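To make the /apply-template bullet above concrete, here is a rough sketch of the delta idea. The endpoint name and the add_generation_prompt param come from the description above; the request/response shapes, helper names, and the read_file example tool are assumptions for illustration, not the PR's actual code:

```typescript
// Sketch: render a conversation through the model's own chat template twice and
// take the difference, yielding a tool call written in the model's native syntax
// (special tokens included) that can be pasted into the system prompt's examples.
type TemplateMessage = {
	role: "user" | "assistant"
	content: string | null
	tool_calls?: Array<{ type: "function"; function: { name: string; arguments: string } }>
}

async function applyTemplate(baseUrl: string, messages: TemplateMessage[], addGenerationPrompt: boolean): Promise<string> {
	// Request/response shapes are assumptions for illustration.
	const res = await fetch(`${baseUrl}/apply-template`, {
		method: "POST",
		headers: { "Content-Type": "application/json" },
		body: JSON.stringify({ messages, add_generation_prompt: addGenerationPrompt }),
	})
	return (await res.json()).prompt
}

async function toolCallExample(baseUrl: string): Promise<string> {
	const user: TemplateMessage = { role: "user", content: "Read the README" }
	const assistant: TemplateMessage = {
		role: "assistant",
		content: null,
		tool_calls: [{ type: "function", function: { name: "read_file", arguments: JSON.stringify({ path: "README.md" }) } }],
	}
	// Prompt up to the point where the assistant starts answering...
	const before = await applyTemplate(baseUrl, [user], true)
	// ...and the full prompt once the assistant's tool call is included.
	const after = await applyTemplate(baseUrl, [user, assistant], false)
	// Naive delta; assumes `after` starts with `before`, which holds for typical templates.
	return after.startsWith(before) ? after.slice(before.length) : after
}
```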
Test Procedure
I haven't tested this thoroughly yet, but did check that the system prompt didn't budge (the only diff is in Example 6, where JSON arguments are now fully indented, incl. labels & assignees).

Mostly had to disable MCP to save prompt space; tested successfully w/ long context window models (128k tokens):

Some notes:
- tool-call: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars ggml-org/llama.cpp#12034
- unsloth/phi-4-GGUF supports a 128k context window but it's not set in its properties (the other models support -c 0 to mean "use the model's maximum")

Random prompts tested:
Type of Change
Pre-flight Checklist
- Changes are tested (npm test) and code is formatted and linted (npm run format && npm run lint)
- A changeset has been created with npm run changeset (required for user-facing changes)

Screenshots
Additional Notes
Still some TODOs, but they could be done as follow-ups.
Important
Adds llama.cpp provider with tool call support, updating API handling, configuration, and UI components to integrate llama-server.
- Adds LlamaCppHandler in llama.cpp.ts to support llama.cpp's llama-server with tool call functionality.
- Updates the createMessage function across multiple providers to include a tools parameter.
- Changes in system.ts.
- Adds llama.cpp as a new ApiProvider in api.ts.
- Updates the ApiOptions component to include llama.cpp configuration fields.
- Adds llamaCppBaseUrl and llamaCppApiKey to ApiHandlerOptions.
- Updates ChatTextArea.tsx and ApiOptions.tsx to support llama.cpp in the UI.
- Adds llamaCppModelInfoSaneDefaults in api.ts for default model info.
- Updates ClineProvider.ts to handle llama.cpp specific configurations.

This description was created by Ellipsis for 6760db3. It will automatically update as commits are pushed.
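As a rough illustration of the configuration surface listed above; the identifiers come from this PR's summary, but the exact types, union members, and default values shown here are assumptions rather than the repository's verbatim definitions:

```typescript
// Sketch of the settings added for the llama.cpp provider; shapes are illustrative.
export type ApiProvider = "anthropic" | "openrouter" | "lmstudio" | "ollama" | "llama.cpp" // other providers elided

export interface ApiHandlerOptions {
	llamaCppBaseUrl?: string // e.g. "http://localhost:8080", where llama-server listens
	llamaCppApiKey?: string // optional; llama-server only checks it when started with --api-key
	// ...existing options for the other providers elided
}

export interface ModelInfo {
	supportsTools?: boolean // gates whether tools are passed to ApiHandler.createMessage
	contextWindow?: number
	// ...other capability flags elided
}

// Default model info used when nothing better is known about the local model.
export const llamaCppModelInfoSaneDefaults: ModelInfo = {
	supportsTools: true,
	contextWindow: 128_000, // assumption for illustration; real defaults may differ
}
```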