Mtmd: add a way to select device for vision encoder #14236
@ngxson Can you take a look at this? I think it's very easy to review and can be useful in some situations.
It's indeed a useful feature. In my configuration I have two GPUs running different layers of the model with [...]
Can any of the maintainers take a look at this?
It would be better to pass the device as a parameter instead of using an env variable. Any reason to not introduce a new parameter?
tools/mtmd/clip.cpp
backend = ctx_params.use_gpu
    ? ggml_backend_init_by_type(GGML_BACKEND_DEVICE_TYPE_GPU, nullptr)
    : nullptr;
backend = nullptr;
I think this should be defined earlier:
ggml_backend_t backend = nullptr;
ggml_backend_t backend_cpu = nullptr;
Adding an arg can be a breaking change, because [...] For arg, maybe we can add [...]
No particular reason other than that using an env variable is more straightforward to implement.
* origin/master: (49 commits)
  ci : correct label refactor->refactoring (ggml-org#14832)
  CUDA: fix quantized KV cache + multiple sequences (ggml-org#14822)
  tests : add non-cont K,V FA tests
  memory : handle saving/loading null layers in recurrent memory (ggml-org#14675)
  ggml: fix loongarch quantize_row_q8_1 error (ggml-org#14827)
  CANN: weight format to NZ for Ascend310P3 (ggml-org#14407)
  CUDA: add fused rms norm (ggml-org#14800)
  ggml : model card yaml tab->2xspace (ggml-org#14819)
  vulkan: fix rms_norm_mul to handle broadcasting dim0 (ggml-org#14817)
  llama : add model type detection for rwkv7 7B&14B (ggml-org#14816)
  imatrix: add option to display importance score statistics for a given imatrix file (ggml-org#12718)
  Mtmd: add a way to select device for vision encoder (ggml-org#14236)
  cuda : implement bf16 cpy ops and enable bf16 cont (ggml-org#14763)
  opencl: remove unreachable `return` (ggml-org#14806)
  server : allow setting `--reverse-prompt` arg (ggml-org#14799)
  cuda: remove linking to cublasLt (ggml-org#14790)
  opencl: fix `im2col` when `KW!=KH` (ggml-org#14803)
  opencl: add conv2d kernel (ggml-org#14403)
  sycl: Fix im2col (ggml-org#14797)
  kleidiai: add support for get_rows (ggml-org#14676)
  ...
* Mtmd: add a way to select device for vision encoder
* simplify
* format
* Warn user if manual device selection failed
* initialize backend to nullptr
Adds a way to choose which backend device to use for the Clip/vision encoder by setting the MTMD_BACKEND_DEVICE env variable to the device name.
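The selection logic described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: `Device`, `select_vision_device`, and `select_from_env` are hypothetical names, and a plain vector stands in for the ggml backend registry. It assumes that the env variable is only consulted when GPU use is enabled, that a failed lookup produces a warning and falls back to the default GPU device, and that a null result means the CPU backend is used.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend device entry (the real code works
// with ggml backend handles, which are not modelled here).
struct Device {
    std::string name;
    bool        is_gpu;
};

// Pick a device for the vision encoder.
// `requested` is the device name asked for (nullptr if none was given).
// Returns nullptr when no GPU device is selected, i.e. "run on CPU".
const Device * select_vision_device(const std::vector<Device> & devices,
                                    const char * requested,
                                    bool use_gpu) {
    if (!use_gpu) {
        return nullptr; // assumption: the override only applies when GPU use is on
    }
    if (requested != nullptr) {
        for (const Device & dev : devices) {
            if (dev.name == requested) {
                return &dev;
            }
        }
        // Manual selection failed: warn and fall through to the default.
        std::cerr << "warning: device \"" << requested
                  << "\" not found, falling back to default GPU device\n";
    }
    for (const Device & dev : devices) {
        if (dev.is_gpu) {
            return &dev; // default: first GPU device in the list
        }
    }
    return nullptr;
}

// Convenience wrapper that reads the name from MTMD_BACKEND_DEVICE.
const Device * select_from_env(const std::vector<Device> & devices, bool use_gpu) {
    return select_vision_device(devices, std::getenv("MTMD_BACKEND_DEVICE"), use_gpu);
}
```

With a device list containing two GPUs, setting `MTMD_BACKEND_DEVICE` to the second GPU's name would place the vision encoder there while the language model keeps its own layer split; leaving the variable unset preserves the previous behaviour.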