Introduce "Groups" Design

Several open issues could be addressed by introducing a grouping behaviour:

- #99 
- #96 

This is sort of the other side of what llama-swap was originally designed for: keeping some models loaded while swapping others. After giving this some thought, I believe adding groups could address the new requests.

Some design requirements and constraints for building this feature: 

1. Do not break current configuration files (as much as possible)
2. Users are responsible for resolving configuration conflicts
3. Model IDs are globally unique 
4. Groups can have swapping disabled. Default is `swap: true`
5. Groups may be exclusive. They force other groups to unload. Default is `exclusive: true`
6. Groups may be persistent. This prevents other, exclusive groups from unloading them. Default is `persistent: false`

This is what the configuration would look like: 

```yaml
# the models definitions are unchanged. 
models:  
    m1: 
        ... 
    m2: 
        ... 
    m3:
        ... 
    .
    . ... 

# introduction of a groups top level key: 
groups: 
    G1:
        swap: true
        exclusive: true
        members: 
            - m1
            - m2
    G2:
        swap: false
        exclusive: true
        members:
            - m3
            - m4
    G3:
        swap: true
        exclusive: false
        members:
            - m5
            - m6
            - m7
    G4:
        swap: false
        exclusive: false
        members:
            - m8
            - m9
    G5:
        swap: false
        exclusive: false
        persistent: true
        members:
            - m10
            - m11
```

In the above configuration: 

- `G1` will run `m1 OR m2`. It will cause other groups to unload. 
- `G2` will run `m3 AND m4`. It will cause other groups to unload. 
- `G3` will run `m5 OR m6 OR m7`. It will NOT affect other groups. 
- `G4` will run `m8 AND m9`. It will NOT affect other groups. 
- `G5` will run `m10 AND m11`. It will NOT affect other groups. It is NOT affected by other groups. This keeps a set of models always loaded. The only way to unload these models is to restart llama-swap or call the `/unload` endpoint.
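
To make the interaction of the three flags concrete, here is a minimal Python sketch of how a request for a model could be resolved. It is an illustration only, not llama-swap's implementation: the `Group` and `GroupManager` names are hypothetical, and it assumes that members of a `swap: false` group are loaded on demand, one request at a time, rather than all at once.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    members: list[str]
    swap: bool = True         # only one member runs at a time
    exclusive: bool = True    # loading a member unloads other groups
    persistent: bool = False  # other groups cannot unload this group


@dataclass
class GroupManager:
    """Hypothetical tracker of which models are loaded (illustration only)."""
    groups: dict[str, Group]
    loaded: set[str] = field(default_factory=set)

    def group_of(self, model: str) -> str:
        for name, group in self.groups.items():
            if model in group.members:
                return name
        raise KeyError(f"unknown model: {model}")

    def request(self, model: str) -> set[str]:
        name = self.group_of(model)
        group = self.groups[name]
        if group.exclusive:
            # Unload every other group, except persistent ones.
            for other_name, other in self.groups.items():
                if other_name != name and not other.persistent:
                    self.loaded -= set(other.members)
        if group.swap:
            # Swapping within the group: drop the sibling that is running.
            self.loaded -= set(group.members)
        self.loaded.add(model)
        return set(self.loaded)


manager = GroupManager({
    "G1": Group(["m1", "m2"], swap=True, exclusive=True),
    "G3": Group(["m5", "m6", "m7"], swap=True, exclusive=False),
    "G4": Group(["m8", "m9"], swap=False, exclusive=False),
    "G5": Group(["m10", "m11"], swap=False, exclusive=False, persistent=True),
})

print(sorted(manager.request("m10")))  # ['m10']
print(sorted(manager.request("m8")))   # ['m10', 'm8']
print(sorted(manager.request("m9")))   # ['m10', 'm8', 'm9']
print(sorted(manager.request("m1")))   # ['m1', 'm10']
```

The last request shows the exclusive `G1` unloading the non-persistent `G4`, while the persistent `G5` member stays loaded.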

## What about models that are not members of groups?

There is a default and hidden group that is essentially: 

```yaml
groups: 
    (default):
        swap: true
        exclusive: true
        members: [ all models not in a group ]
```

Setting `swap: true` and `exclusive: true` is the current behaviour of llama-swap: only one model runs at a time. 
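
As a rough sketch of how the hidden group could be derived, the Python function below collects every model ID that no configured group claims. The function name `with_default_group` and the plain-dict representation are assumptions made for illustration; they are not part of llama-swap.

```python
def with_default_group(model_ids, groups):
    """Return a copy of `groups` plus a "(default)" group for ungrouped models.

    `model_ids` is a list of model IDs from the `models` section and
    `groups` maps a group name to a dict with a "members" list.
    """
    grouped = {m for g in groups.values() for m in g["members"]}
    ungrouped = [m for m in model_ids if m not in grouped]
    result = dict(groups)
    # Same settings as the hidden group described above.
    result["(default)"] = {"swap": True, "exclusive": True, "members": ungrouped}
    return result


groups = with_default_group(
    ["m1", "m2", "m3"],
    {"G1": {"swap": False, "exclusive": True, "members": ["m1"]}},
)
print(groups["(default)"]["members"])  # ['m2', 'm3']
```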

## What about profiles?

With this, the `profiles` feature, which has caused a lot of confusion, can be removed. A profile was an attempt to keep multiple models loaded at the same time using a prefix. The same result is possible with a `G2` or `G4` style group. This will break rule 1, however, as the complex profile code will be removed. 

This setting: 

```yaml
profiles:
  coding:
      - qwen-coder-32B
      - qwen-coder-3090-FIM
```

is now replaced with: 

```yaml    
groups:
  coding:
    swap: false
    exclusive: true
    members:
      - qwen-coder-32B
      - qwen-coder-3090-FIM
```

There is no longer a need to prefix the model name with the profile name. Models can be requested by just their identifier. 
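
For anyone migrating an existing configuration by hand, the mapping is mechanical. The Python sketch below shows the idea on already-parsed data; `profiles_to_groups` is a hypothetical helper, not a tool shipped with llama-swap, and it assumes each profile becomes a `swap: false`, `exclusive: true` group as in the example above.

```python
def profiles_to_groups(profiles):
    """Convert a parsed `profiles` mapping into the proposed `groups` mapping."""
    return {
        name: {"swap": False, "exclusive": True, "members": list(members)}
        for name, members in profiles.items()
    }


groups = profiles_to_groups(
    {"coding": ["qwen-coder-32B", "qwen-coder-3090-FIM"]}
)
print(groups["coding"]["members"])  # ['qwen-coder-32B', 'qwen-coder-3090-FIM']
```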

## What is rule 2 about?

This exists because there is a sea of complexity and possible issues as users mix and match hardware, operating systems, inference servers, etc. llama-swap follows a Unix foot-gun philosophy. While the configuration is designed to be simple, it can quickly grow in complexity. As complexity grows, so does the expectation that users know what they are doing. It's also there to protect my own time and sanity. 

