Vulkan support (replacing pull/5059) #9650
Conversation
# Conflicts: gpu/gpu.go
# Conflicts: gpu/gpu_linux.go
Making AMD GPU work on ARM architecture with Vulkan
Fix variable name
- Backport Vulkan backend support from ollama/ollama#9650. - Track patches at AOSC-Tracking/ollama @ aosc/v0.6.5 (HEAD: d75a173b8618a2ce35287663ffc6f75779e7b265).
- Backport Vulkan backend support from ollama/ollama#9650. - Track patches at AOSC-Tracking/ollama @ aosc/v0.6.5 (HEAD: 31a866457d350d17de839986c105312bcf8eb0e6).
I can't seem to add an issue to the repo itself, so I will report my findings here. I'm running an RX 580, and the problem starts when I increase the context window to around 20k: I get complete garbage responses. With mistral-nemo I get, uh, whatever this is (mistral-nemo nonsense), and when running llama3.2:3b and llama3.2:3b-instruct-q4_K_S I just get the letter "G" repeated a bunch. I'm running ollama with the launch instructions provided on your Docker image for this PR:

    docker run -d --restart=always \
      -v ollama:/root/.ollama \
      -p 11434:11434 --name ollama_vulkan-grinco \
      -v /opt/amdgpu/:/opt/amdgpu/ \
      --device /dev/kfd --device /dev/dri \
      --cap-add CAP_PERFMON \
      -e OLLAMA_FLASH_ATTENTION=1 \
      -e OLLAMA_KV_CACHE_TYPE="q8_0" \
      docker.io/grinco/ollama-amd-apu:vulkan

With a context below 20k it seems fine. I'm also able to use ROCm just fine by following #2453 (comment) (albeit slower than with Vulkan, but at least it's stable lmao).
As far as I've understood, it is currently not recommended to use flash attention and context quantization while running Vulkan. There are multiple issues with them in the underlying llama.cpp.
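For reference, a minimal sketch of the docker run command quoted above with those two options dropped (same image, volumes, and device flags; whether this actually resolves the long-context garbage output is an assumption worth verifying, not a confirmed fix):

    # Same launch as above, minus OLLAMA_FLASH_ATTENTION and OLLAMA_KV_CACHE_TYPE,
    # which reportedly have issues on the Vulkan backend in llama.cpp.
    docker run -d --restart=always \
      -v ollama:/root/.ollama \
      -p 11434:11434 --name ollama_vulkan-grinco \
      -v /opt/amdgpu/:/opt/amdgpu/ \
      --device /dev/kfd --device /dev/dri \
      --cap-add CAP_PERFMON \
      docker.io/grinco/ollama-amd-apu:vulkan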
This Vulkan PR is pretty nice: loading models with a big context window seems faster, and the time to first token is much shorter. There is also a nice performance uplift compared to my previous solution (#2453 (comment)). I hope this gets more attention eventually.
Hi @jmorganca, I do understand that Ollama's goal is simplicity for the user, and that maintaining multiple backends is a burden, but it would still be nice to know:
Thanks in advance. P.S. It might make sense to pin the answer/statement about Vulkan somewhere, so those questions won't get asked over and over again.
I'd guess a major release like 0.7.0 would be a good point to integrate the community work on such a feature (and for the community to push for this when necessary).
I installed this as a Docker app on my Unraid server. This implementation is better than the ROCm version for my AMD 780M APU: I tried the ROCm version and it would work for a few minutes and then the GPU would crash. This version using Vulkan doesn't run into those "gpu hang" issues. However, I'm getting errors when installing the latest DeepSeek or Qwen3 models, where it says I need to update Ollama to use these models...
I believe it is because the PR is based on an old version of Ollama. I believe there is no way for you to fix this unless you update the patch to work with the latest version of Ollama, which I think would be quite time-consuming.
Yes, that's the reason. This PR and its predecessor were submitted quite a while ago (4 months and 1 year ago, respectively), and keeping up with the main branch to fix breaking changes, not knowing if it will ever be merged, is not something that I'll be investing time into.
Any chance of syncing with the upstream code for both ollama and llama.cpp? Thanks.
There are multiple conflicts that need to be resolved, some of them in the Go code, which will require someone familiar with the matter to have a look. The codebase has drifted too far from the Vulkan fork for me to address it without a considerable amount of effort spent understanding the codebase. Maybe someone more knowledgeable can contribute. I see some conversation going on in whyvl#7 (comment); however, it also seems to be stuck at an older version (v0.9.3). I personally don't see any value in putting more work into this until it merges, or until someone creates a Vulkan fork that they will maintain.
I've added Vulkan backend support in https://github.com/MooreThreads/ollama-musa (as I'm maintaining an Ollama fork supporting MooreThreads GPUs), which is based on Ollama. The latest multi-arch (amd64 and arm64) Docker image for the Vulkan backend is available from that repository. I'd like to test this on a virtual machine with an AMD GPU (via AMD Developer Cloud), but I couldn't find the Vulkan ICD on that machine. I noticed you've tested it on AMD GPUs, so I'm wondering if you could share any instructions or tips.
Oh for sure, let me check it out! |
I have installed the AMDGPU Vulkan driver package:

    vulkan-amdgpu/noble,now 25.10-2165406.24.04 amd64 [installed]
      AMDGPU Vulkan driver

But running vulkaninfo fails:

    root@0-4-35-gpu-mi300x1-192gb-devcloud-atl1:~# vulkaninfo
    WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /opt/amdgpu/lib/x86_64-linux-gnu/amdvlk64.so. Skipping this driver.
    'DISPLAY' environment variable not set... skipping surface info
    radv/amdgpu: Failed to allocate a buffer:
    radv/amdgpu: size      : 0 bytes
    radv/amdgpu: alignment : 0 bytes
    radv/amdgpu: domains   : 2
    Segmentation fault (core dumped)
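Not something verified on that exact machine, but since the loader log shows both the AMDVLK ICD (rejected with return code -3) and RADV being probed, one way to narrow it down is to point the Vulkan loader at a single ICD via VK_ICD_FILENAMES and see which driver is actually crashing. A rough sketch, assuming Ubuntu-style paths and that Mesa's RADV ICD is (or can be) installed:

    # Assumptions: Ubuntu paths; mesa-vulkan-drivers provides the RADV ICD manifest.
    sudo apt install -y mesa-vulkan-drivers vulkan-tools

    # List the ICD manifests the loader can see:
    ls /usr/share/vulkan/icd.d/ /etc/vulkan/icd.d/ 2>/dev/null

    # Run vulkaninfo against RADV only, bypassing the AMDVLK ICD that is being skipped:
    VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json vulkaninfo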
I just pushed the latest tag. This has been tested on MooreThreads MTT S80, Intel Arc A770, and AMD 780M.
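For anyone who wants to try one of these Vulkan-enabled builds, here is a minimal launch sketch modeled on the docker run command quoted earlier in this thread; the image reference is a placeholder rather than an actual published tag, and passing /dev/dri is assumed to be sufficient for Vulkan-only (non-ROCm) GPU access:

    # Placeholder image name -- substitute the tag actually published by the fork you are testing.
    docker run -d --restart=always \
      -v ollama:/root/.ollama \
      -p 11434:11434 --name ollama-vulkan-test \
      --device /dev/dri \
      <vulkan-ollama-image>:<tag>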
This pull request is based on #5059 and whyvl#7.
Tested on v0.5.13 on Linux. The image was built using the supplied Dockerfile, with the caveat that the release image was bumped to 24.04 (from 20.04).
Build command:
Tested on AMD Ryzen 7 8845HS w/ Radeon 780M Graphics with ROCm disabled
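The exact build command was not captured above; as a rough, illustrative sketch of building from the supplied Dockerfile (the tag name is arbitrary, and the 20.04 to 24.04 bump mentioned above was an edit to the Dockerfile itself, not a build flag):

    # Illustrative only -- not the author's exact invocation.
    docker build -t ollama-vulkan:v0.5.13 .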