This feature request is mainly for frontends like Open WebUI, where users can't use profiles.
One possible way to implement this is "reverse" groups: models defined within the same group must be swapped with one another, while models defined in different groups can be loaded together. In a multi-GPU system, a group can be defined for each GPU (or set of GPUs), with model configurations added to each group. If two models are in separate groups, llama-swap could load them side by side and proxy requests to both in parallel for any OpenAI-compatible API client.
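To make the proposed behaviour concrete, here is a rough sketch of the swap rule in Python. It is not llama-swap code or its config format; the `SwapGroups` class, the group names (`gpu0`, `gpu1`), and the model names are all hypothetical and only illustrate "one loaded model per group, groups independent of each other".

```python
from dataclasses import dataclass, field


@dataclass
class SwapGroups:
    """Hypothetical scheduler: at most one loaded model per group."""

    groups: dict[str, list[str]]  # group name -> model names in that group
    loaded: dict[str, str] = field(default_factory=dict)  # group name -> loaded model

    def group_of(self, model: str) -> str:
        """Find the group a model belongs to."""
        for name, models in self.groups.items():
            if model in models:
                return name
        raise KeyError(f"model {model!r} is not in any group")

    def request(self, model: str) -> list[str]:
        """Mark `model` as loaded and return the models that had to be unloaded."""
        group = self.group_of(model)
        current = self.loaded.get(group)
        if current == model:
            return []  # already loaded, nothing to swap
        self.loaded[group] = model
        # Only a model from the same group is ever evicted.
        return [current] if current is not None else []


# Hypothetical layout: one group per GPU.
sched = SwapGroups(groups={
    "gpu0": ["llama-8b", "qwen-7b"],
    "gpu1": ["embedder", "whisper"],
})

first = sched.request("llama-8b")   # loads on gpu0, nothing to unload
second = sched.request("embedder")  # loads on gpu1 alongside llama-8b
third = sched.request("qwen-7b")    # same group as llama-8b, so it is swapped out
```

In this sketch, `llama-8b` and `embedder` stay loaded side by side because they are in different groups, while requesting `qwen-7b` evicts only `llama-8b`.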