Mtmd: add a way to select device for vision encoder #14236

stduhpf · 2025-06-17T10:20:58Z

Adds a way to choose which backend device to use for the CLIP/vision encoder by setting the MTMD_BACKEND_DEVICE env variable to the device name.
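As a rough illustration of the selection logic described above, here is a minimal, self-contained sketch. It does not use the ggml API: `device_choice`, `select_device`, and `select_device_from_env` are hypothetical names, and the list of available devices stands in for whatever the backend registry reports. The fallback-with-warning behavior is an assumption based on the commit title "Warn user if manual device selection failed".

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical result type; this is NOT part of ggml or mtmd.
struct device_choice {
    std::string name;      // empty means no accelerator device was available
    bool        fell_back; // true if a requested device could not be found
};

// Honor the requested device name if it is available; otherwise warn and
// fall back to the first available device (assumed to be the default).
inline device_choice select_device(const char * requested,
                                   const std::vector<std::string> & available) {
    const std::string fallback = available.empty() ? std::string() : available.front();
    if (requested != nullptr && requested[0] != '\0') {
        for (const auto & name : available) {
            if (name == requested) {
                return { name, false };
            }
        }
        std::fprintf(stderr,
                     "warning: device \"%s\" not found, falling back to default\n",
                     requested);
        return { fallback, true };
    }
    return { fallback, false };
}

// Convenience wrapper that reads the env variable named in the PR description.
inline device_choice select_device_from_env(const std::vector<std::string> & available) {
    return select_device(std::getenv("MTMD_BACKEND_DEVICE"), available);
}
```

In this sketch an unset or empty variable keeps the default behavior, so existing setups are unaffected. The device names passed in (for example "CUDA0") are placeholders; the real names depend on the backends compiled in.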

stduhpf · 2025-06-26T09:59:24Z

@ngxson Can you take a look at this? I think it's very easy to review and can be useful in some situations.

hjc4869 · 2025-06-26T10:05:21Z

It's indeed a useful feature. In my configuration I have two GPUs running different layers of the model with -ot, and the current implementation uses both of them to run the vision encoder. In my use case it would be nice to be able to pin it to a specific device.

stduhpf · 2025-07-22T09:32:17Z

Can any of the maintainers take a look at this?

ggerganov

It would be better to pass the device as a parameter instead of using an env variable. Any reason to not introduce a new parameter?

ngxson · 2025-07-22T09:42:21Z

tools/mtmd/clip.cpp

-        backend = ctx_params.use_gpu
-                    ? ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_GPU, nullptr)
-                    : nullptr;
+        backend = nullptr;


I think this should be defined earlier:

ggml_backend_t backend     = nullptr;
ggml_backend_t backend_cpu = nullptr;

ngxson · 2025-07-22T09:47:38Z

> It would be better to pass the device as a parameter instead of using an env variable. Any reason to not introduce a new parameter?

Adding an arg can be a breaking change, because mtmd_context_params will be modified. But this is not a very big problem, so I have no strong preference on whether or not we add an arg.

For the arg, maybe we can add --mmproj-device, since all other multimodal-related args are prefixed with --mmproj-*.
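To make the trade-off concrete, here is a small sketch of how a parameter could coexist with the env variable. Everything in it is hypothetical: `example_mtmd_params` is not the real mtmd_context_params layout, the `device` field is only an assumed backing store for a flag like --mmproj-device, and the precedence (parameter first, then env variable) is an assumption rather than something decided in this thread.

```cpp
#include <string>

// Hypothetical parameter struct; NOT the real mtmd_context_params.
struct example_mtmd_params {
    bool         use_gpu = true;
    const char * device  = nullptr; // hypothetical field set from a --mmproj-device flag
};

// Resolve which device name to request.
// Assumed precedence: explicit parameter, then env variable value, then
// an empty string meaning "use the default device".
inline std::string resolve_device_name(const example_mtmd_params & params,
                                       const char * env_value) {
    if (!params.use_gpu) {
        return ""; // GPU disabled: nothing to select
    }
    if (params.device != nullptr && params.device[0] != '\0') {
        return params.device;
    }
    if (env_value != nullptr && env_value[0] != '\0') {
        return env_value;
    }
    return "";
}
```

Adding a field like `device` changes the struct itself, which is the compatibility concern raised above; the env variable avoids that at the cost of being less discoverable.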

stduhpf · 2025-07-22T09:52:54Z

> Any reason to not introduce a new parameter?

No particular reason, other than that using an env variable is more straightforward to implement.

* origin/master: (49 commits)
  ci : correct label refactor->refactoring (ggml-org#14832)
  CUDA: fix quantized KV cache + multiple sequences (ggml-org#14822)
  tests : add non-cont K,V FA tests
  memory : handle saving/loading null layers in recurrent memory (ggml-org#14675)
  ggml: fix loongarch quantize_row_q8_1 error (ggml-org#14827)
  CANN: weight format to NZ for Ascend310P3 (ggml-org#14407)
  CUDA: add fused rms norm (ggml-org#14800)
  ggml : model card yaml tab->2xspace (ggml-org#14819)
  vulkan: fix rms_norm_mul to handle broadcasting dim0 (ggml-org#14817)
  llama : add model type detection for rwkv7 7B&14B (ggml-org#14816)
  imatrix: add option to display importance score statistics for a given imatrix file (ggml-org#12718)
  Mtmd: add a way to select device for vision encoder (ggml-org#14236)
  cuda : implement bf16 cpy ops and enable bf16 cont (ggml-org#14763)
  opencl: remove unreachable `return` (ggml-org#14806)
  server : allow setting `--reverse-prompt` arg (ggml-org#14799)
  cuda: remove linking to cublasLt (ggml-org#14790)
  opencl: fix `im2col` when `KW!=KH` (ggml-org#14803)
  opencl: add conv2d kernel (ggml-org#14403)
  sycl: Fix im2col (ggml-org#14797)
  kleidiai: add support for get_rows (ggml-org#14676)
  ...

* Mtmd: add a way to select device for vision encoder
* simplify
* format
* Warn user if manual device selection failed
* initialize backend to nullptr

github-actions bot added the examples label Jun 17, 2025

stduhpf force-pushed the clip-gpu-select branch from a3023a0 to 954600a Compare June 26, 2025 10:03

stduhpf force-pushed the clip-gpu-select branch from 954600a to 205e4cb Compare June 26, 2025 12:13

hjc4869 added a commit to hjc4869/llama.cpp that referenced this pull request Jun 28, 2025

Cherry pick PR ggml-org#14236

2c6ed76

hjc4869 added a commit to hjc4869/llama.cpp that referenced this pull request Jul 5, 2025

Cherry pick PR ggml-org#14236

c5d7552

stduhpf force-pushed the clip-gpu-select branch from 205e4cb to 47e9237 Compare July 7, 2025 12:49

hjc4869 added a commit to hjc4869/llama.cpp that referenced this pull request Jul 8, 2025

Cherry pick PR ggml-org#14236

c4a2b65

hjc4869 added a commit to hjc4869/llama.cpp that referenced this pull request Jul 11, 2025

Cherry pick PR ggml-org#14236

37f3819

hjc4869 added a commit to hjc4869/llama.cpp that referenced this pull request Jul 16, 2025

Cherry pick PR ggml-org#14236

29b2ac1

hjc4869 added a commit to hjc4869/llama.cpp that referenced this pull request Jul 22, 2025

Cherry pick PR ggml-org#14236

aaeb8a9

ggerganov reviewed Jul 22, 2025

ngxson reviewed Jul 22, 2025

stduhpf added 3 commits July 22, 2025 11:50

Mtmd: add a way to select device for vision encoder

6617f3c

simplify

f9fb321

format

ffd0bd8

stduhpf added 2 commits July 22, 2025 12:08

Warn user if manual device selection failed

baf7e3e

initialize backend to nullptr

01e448c

stduhpf force-pushed the clip-gpu-select branch from 47e9237 to 01e448c Compare July 22, 2025 10:09

ngxson approved these changes Jul 22, 2025

ngxson merged commit c8ade30 into ggml-org:master Jul 22, 2025
47 checks passed
