Vulkan support (replacing pull/5059) #9650


Open
wants to merge 50 commits into base: main

Conversation

grinco

@grinco grinco commented Mar 11, 2025

This pull request is based on #5059 and whyvl#7.

Tested on v0.5.13 on Linux. The image was built using the supplied Dockerfile, with the caveat that the release image base was bumped to 24.04 (from 20.04).

Build command:

docker buildx build --platform linux/amd64 ${OLLAMA_COMMON_BUILD_ARGS} -t grinco/ollama-amd-apu:vulkan .

Tested on an AMD Ryzen 7 8845HS with Radeon 780M Graphics, with ROCm disabled:

[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-03-11T13:00:40.793Z level=INFO source=gpu.go:199 msg="vulkan: load libvulkan and libcap ok"
time=2025-03-11T13:00:40.877Z level=INFO source=gpu.go:421 msg="error looking up vulkan GPU memory" error="device is a CPU"
time=2025-03-11T13:00:40.878Z level=WARN source=amd_linux.go:443 msg="amdgpu detected, but no compatible rocm library found.  Either install rocm v6, or follow manual install instructions at https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install"
time=2025-03-11T13:00:40.878Z level=WARN source=amd_linux.go:348 msg="unable to verify rocm library: no suitable rocm found, falling back to CPU"
time=2025-03-11T13:00:40.879Z level=INFO source=types.go:137 msg="inference compute" id=0 library=vulkan variant="" compute=1.3 driver=1.3 name="AMD Radeon Graphics (RADV GFX1103_R1)" total="15.6 GiB" available="15.6 GiB"
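
As a side note, a quick way to confirm that the Vulkan loader actually sees the GPU before starting the server is vulkaninfo; a minimal check, assuming the vulkan-tools package is available inside the container:

# The 780M should be listed with the same name as in the "inference compute"
# log line above (AMD Radeon Graphics (RADV GFX1103_R1)).
vulkaninfo --summary | grep -i devicename
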
 # ollama run phi4:14b
>>> /set verbose
Set 'verbose' mode.
>>> how's it going?
Hello! I'm here to help you with any questions or tasks you have. How can I assist you today? 😊

total duration:       3.341959745s
load duration:        18.165612ms
prompt eval count:    15 token(s)
prompt eval duration: 475ms
prompt eval rate:     31.58 tokens/s
eval count:           26 token(s)
eval duration:        2.846s
eval rate:            9.14 tokens/s
>>>

MingcongBai added commits to AOSC-Dev/aosc-os-abbs that referenced this pull request Apr 16, 2025
- Backport Vulkan backend support from ollama/ollama#9650.
- Track patches at AOSC-Tracking/ollama @ aosc/v0.6.5
  (HEAD: d75a173b8618a2ce35287663ffc6f75779e7b265, later 31a866457d350d17de839986c105312bcf8eb0e6).
@juls0730

This comment was marked as outdated.

@juls0730

juls0730 commented Apr 29, 2025

I can't seem to add an issue to the repo itself, so I will report my findings here.

I'm running an RX 580. The trouble starts when I increase the context window to around 20k: I get complete garbage responses. With mistral-nemo, I get, uh, whatever this is:

mistral-nemo nonsense
<SPECIAL_24>[AVAILABLE_TOOLS][MIDDLE]<SPECIAL_26><SPECIAL_15>[SUFFIX]<SPECIAL_32>[A<SPECIAL_24>[AVAILABLE_TOOLS][MIDDLE]<SPECIAL_26><SPECIAL_15>[SUFFIX]<SPECIAL_32>[AVAILABLE_TOOLS][TOOL_RESULTS]<SPECIAL_20><SPECIAL_16><SPECIAL_20><SPECIAL_25><SPECIAAILABLE_TOOLS][TOOL_RESULTS]<SPECIAL_20><SPECIAL_16><SPECIAL_20><SPECIAL_25><SPECIAL_27><SPECIAL_22><SPECIAL_38><SPECIAL_17><SPECIAL_25><unk><SPECIAL_25><SPECIAL_33>[S_27><SPECIAL_22><SPECIAL_38><SPECIAL_17><SPECIAL_25><unk><SPECIAL_25><SPECIAL_33>[SUFFIX]<unk><SPECIAL_39><SPECIAL_19>[SUFFIX]

And when running llama3.2:3b and llama3.2:3b-instruct-q4_K_S, I just get the letter "G" repeated a bunch. I'm running Ollama with the launch instructions provided for your Docker image for this PR:

docker run -d --restart=always \
        -v ollama:/root/.ollama \
        -p 11434:11434 --name ollama_vulkan-grinco \
        -v /opt/amdgpu/:/opt/amdgpu/ \
        --device /dev/kfd --device /dev/dri \
        --cap-add CAP_PERFMON \
        -e OLLAMA_FLASH_ATTENTION=1 \
        -e OLLAMA_KV_CACHE_TYPE="q8_0" \
        docker.io/grinco/ollama-amd-apu:vulkan

With a context below 20k it seems fine. I'm also able to use ROCm just fine by following #2453 (comment) (albeit slower than with Vulkan, but at least it's stable lmao).

@SergeyFilippov

[quoted from @juls0730's report above: garbage output at ~20k context on an RX 580 with OLLAMA_FLASH_ATTENTION=1 and OLLAMA_KV_CACHE_TYPE=q8_0]

As far as I understand, it is currently not recommended to use flash attention and KV-cache quantization while running on Vulkan. There are multiple issues with this in the underlying llama.cpp.
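
Concretely, that means the same docker run as in the report above with the two -e flags dropped, so flash attention and KV-cache quantization stay at their defaults; a sketch, assuming nothing else about the setup changes:

docker run -d --restart=always \
        -v ollama:/root/.ollama \
        -p 11434:11434 --name ollama_vulkan-grinco \
        -v /opt/amdgpu/:/opt/amdgpu/ \
        --device /dev/kfd --device /dev/dri \
        --cap-add CAP_PERFMON \
        docker.io/grinco/ollama-amd-apu:vulkan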

@juls0730

juls0730 commented May 6, 2025

[quotes the report above and @SergeyFilippov's note that flash attention and KV-cache quantization are not recommended with Vulkan]

IIRC I tried running the model without KV quantization and flash attention and it still had issues, but I will try again and make sure. Update: that seemingly fixes the issue, thanks @SergeyFilippov!

@juls0730

juls0730 commented May 8, 2025

This Vulkan PR is pretty nice: loading models with a big context window seems faster, and the time to first token is much shorter. There is also a nice performance uplift compared to my previous solution (#2453 (comment)). I hope this gets more attention eventually.

@SergeyFilippov

Hi, @jmorganca,
I know you have a lot of work to do and defined priorities, but speaking on behalf of the part of the community that doesn't have CUDA or ROCm available: Vulkan is a groundbreaking improvement to LLM performance (as the owner of an RX 9070 series card, which has no ROCm support but 16 GB of VRAM and lots of FLOPS).

I do understand that Ollama's goal is simplicity for the user, and that maintaining multiple backends is a burden, but it would still be nice to know:

  1. Is there a chance of getting official Vulkan support in the future?
  2. If so, what can we do or improve to make it happen?
  3. What would it take to just hide this backend behind some experimental flag and provide the llama.cpp Vulkan engine "as-is"?

Thanks in advance.

P.S. It might make sense to pin the answer/statement about Vulkan somewhere, so these questions won't get asked over and over again.

@machiav3lli

machiav3lli commented May 15, 2025

I'd guess a major release like 0.7.0 would be a good point to integrate the community work on such a feature (and for the community to push for this where necessary).

@virajwad

virajwad commented Jul 1, 2025

Is this PR buildable on Windows in its current state?

If I try cmake -B build -DGGML_VULKAN=OFF, I can build, but if I try cmake -B build -DGGML_VULKAN=ON, I get the following error:

[screenshot of the build error]

Or could I please get some feedback if I'm building it wrong?
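
For what it's worth, the GGML_VULKAN=ON path in llama.cpp expects the Vulkan SDK (including the glslc shader compiler) to be discoverable by CMake, so a missing or unset VULKAN_SDK is a common cause of failures like this on Windows. A rough sketch of a build attempt from a PowerShell prompt, where the SDK path and version are purely illustrative:

# Assumes the LunarG Vulkan SDK is installed; the path/version below are examples only
$env:VULKAN_SDK = "C:\VulkanSDK\1.3.290.0"
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release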

@chilman408

I installed this as a Docker app on my Unraid server. This implementation is better than the ROCm version for my AMD 780M APU: I tried the ROCm version and it would work for a few minutes and then the GPU would crash. This version, using Vulkan, doesn't run into those "gpu hang" issues.

However, I'm getting errors when installing the latest DeepSeek or Qwen3 models where it says I need to update Ollama to use these models...

@juls0730

However, I'm getting errors when installing the latest DeepSeek or Qwen3 models where it says I need to update Ollama to use these models...

I believe this is because the PR is based on an old version of Ollama; there is no way to fix this short of updating the patch to work with the latest version of Ollama, which I think would be quite time consuming.

@grinco
Author

grinco commented Jul 23, 2025

However, I'm getting errors when installing the latest DeepSeek or Qwen3 models where it says I need to update Ollama to use these models...

I believe this is because the PR is based on an old version of Ollama; there is no way to fix this short of updating the patch to work with the latest version of Ollama, which I think would be quite time consuming.

Yes, that's the reason. This PR and its predecessor were submitted quite a while ago (4 months and 1 year ago, respectively), and keeping up with the main branch to fix breaking changes, without knowing if it will ever be merged, is not something I'll be investing time into.

@yeahdongcn
Contributor

Any chance of syncing with the upstream code for both Ollama and llama.cpp? Thanks.

@grinco
Author

grinco commented Aug 7, 2025

There are multiple conflicts that need to be resolved, some of them in the Go code, which is going to require someone familiar with the matter to have a look. The codebase has drifted too far from the Vulkan fork for me to address it without a considerable amount of effort spent understanding the codebase. Maybe someone more knowledgeable can contribute. I see some conversation going on in whyvl#7 (comment); however, it also seems to be stuck at an older version (v0.9.3). I personally don't see any value in putting more work into this until it merges, or until someone creates a Vulkan fork that they will maintain.

@yeahdongcn
Contributor

I've added Vulkan backend support in https://github.com/MooreThreads/ollama-musa (I'm maintaining an Ollama fork that supports MooreThreads GPUs), which is based on Ollama v0.11.4.

The latest multi-arch (amd64 and arm64) Docker image for the Vulkan backend is docker.io/mthreads/ollama:0.11.4-vulkan. I’ve tested it on MTGPU, and it works well.

I’d like to test this on a virtual machine with AMDGPU (via AMD Developer Cloud), but I couldn’t find the Vulkan ICD on that machine. I noticed you’ve tested it on AMDGPU, so I’m wondering if you could share any instructions or tips.

@chilman408

[quotes @yeahdongcn's comment above about the MooreThreads Vulkan image and the missing Vulkan ICD on AMD Developer Cloud]

Oh for sure, let me check it out!

@yeahdongcn
Contributor

yeahdongcn commented Aug 15, 2025

I’d like to test this on a virtual machine with AMDGPU (via AMD Developer Cloud), but I couldn’t find the Vulkan ICD on that machine. I noticed you’ve tested it on AMDGPU, so I’m wondering if you could share any instructions or tips.

I have installed vulkan-amdgpu, and it turns out the ICD file is now available:

vulkan-amdgpu/noble,now 25.10-2165406.24.04 amd64 [installed]
  AMDGPU Vulkan driver

But running vulkaninfo results in the following error:

root@0-4-35-gpu-mi300x1-192gb-devcloud-atl1:~# vulkaninfo
WARNING: [Loader Message] Code 0 : terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /opt/amdgpu/lib/x86_64-linux-gnu/amdvlk64.so. Skipping this driver.
'DISPLAY' environment variable not set... skipping surface info
radv/amdgpu: Failed to allocate a buffer:
radv/amdgpu:    size      : 0 bytes
radv/amdgpu:    alignment : 0 bytes
radv/amdgpu:    domains   : 2
Segmentation fault (core dumped)
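
Not specific to this PR, but one generic way to narrow that down is to point the Vulkan loader at a single ICD at a time via the VK_ICD_FILENAMES environment variable (VK_DRIVER_FILES on newer loaders), so RADV and AMDVLK can be tested in isolation. A sketch, where the RADV JSON path is the usual Mesa location and may need adjusting:

# Mesa RADV only (usual Mesa install path; adjust to your system)
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/radeon_icd.x86_64.json vulkaninfo --summary

# ...and likewise with the ICD JSON that references the amdvlk64.so from the
# warning above, to test the AMDVLK driver on its own.

The 'DISPLAY' warning by itself should be harmless on a headless machine; vulkaninfo just skips the surface queries.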

@yeahdongcn
Contributor

yeahdongcn commented Aug 20, 2025

I just pushed the latest tag mthreads/ollama:0.11.5-vulkan (repo: https://github.com/MooreThreads/ollama-musa), which is based on the Ollama v0.11.5 code.

This has been tested on MooreThreads MTT S80, Intel Arc A770, and AMD 780M.
MooreThreads#22
whyvl#26
