Mac
Show the llama-vscode menu (Ctrl+Shift+M) and select "Install/upgrade llama.cpp" (if not already done). Then add and select the models you want to use.
The instructions below are kept for reference; it is now easier to simply add a model from the menu and select it.
Prerequisites - Homebrew
Used for
- code completion
LLM type
- FIM (fill in the middle)
Instructions
- Install llama.cpp with the command
`brew install llama.cpp`
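To verify the installation, `llama-server` should now be on your PATH (the `--version` flag is assumed here from recent llama.cpp builds; it prints build info and exits):

```shell
# Check that llama-server was installed by Homebrew and is on PATH.
command -v llama-server

# Print the llama.cpp build version.
llama-server --version
```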
- Download the LLM model and run the llama.cpp server (combined in one command):
- If you have more than 16GB VRAM:
`llama-server -hf ggml-org/Qwen2.5-Coder-7B-Q8_0-GGUF:Q8_0 --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256`
- If you have less than 16GB VRAM:
`llama-server -hf ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF:Q8_0 --port 8012 -ngl 99 -fa -ub 1024 -b 1024 -dt 0.1 --ctx-size 0 --cache-reuse 256`
If the model file is not yet available locally (first run), it will be downloaded first (this can take some time); after that the llama.cpp server starts.
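To confirm the completion server is running, you can query its built-in `/health` endpoint (the port below matches the `--port 8012` used in the commands above):

```shell
# Ask the llama.cpp server whether it is up and the model is loaded.
# A running server answers with a small JSON status object.
curl -s http://127.0.0.1:8012/health
```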
Used for
- Chat with AI
- Chat with AI with project context
- Edit with AI
- Generate commit message
LLM type
- Chat Models
Instructions
The setup is the same as for the code completion server, but use a chat model and slightly different parameters.
CPU-only:
`llama-server -hf ggml-org/Qwen2.5-Coder-1.5B-Instruct-Q8_0-GGUF --port 8011 -np 2`
With an NVIDIA GPU and CUDA drivers installed:
- more than 16GB VRAM
`llama-server -hf ggml-org/Qwen2.5-Coder-7B-Instruct-Q8_0-GGUF --port 8011 -np 2`
- less than 16GB VRAM
`llama-server -hf ggml-org/Qwen2.5-Coder-3B-Instruct-Q8_0-GGUF --port 8011 -np 2`
- less than 8GB VRAM
`llama-server -hf ggml-org/Qwen2.5-Coder-1.5B-Instruct-Q8_0-GGUF --port 8011 -np 2`
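Once the chat server is up, it exposes an OpenAI-compatible API, so a quick smoke test with `curl` looks like this (the prompt text is just an example):

```shell
# Send a minimal chat request to the llama-server OpenAI-compatible endpoint (port 8011).
curl -s http://127.0.0.1:8011/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one word."}]}'
```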
Used for
- Chat with AI with project context
LLM type
- Embedding
Instructions
The setup is the same as for the code completion server, but use an embeddings model and slightly different parameters.
`llama-server -hf ggml-org/Nomic-Embed-Text-V2-GGUF --port 8010 -ub 2048 -b 2048 --ctx-size 2048 --embeddings`
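With `--embeddings` enabled, the server also accepts OpenAI-style embedding requests, so you can sanity-check it like this (the input string is just an example):

```shell
# Request an embedding vector for a short test string from the server on port 8010.
curl -s http://127.0.0.1:8010/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "hello world"}'
```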