Allow parallel inference #724

@petrm

Description

Describe the feature you'd like

I would like the inference queue to allow parallel execution of jobs.

Describe the benefits this would bring to existing Hoarder users

My use case is having multiple load-balanced Ollama backends available; running inference jobs against them in parallel would speed up processing. A rough sketch of the idea is below.
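A minimal sketch of what this could look like, assuming a hypothetical comma-separated `OLLAMA_BASE_URLS` setting and a plain in-memory job list rather than Hoarder's actual queue: N concurrent workers drain the queue, and each request is dispatched round-robin across the configured Ollama backends.

```ts
// Hypothetical sketch, not Hoarder's real queue API. Assumes Node 18+
// (global fetch) and a comma-separated OLLAMA_BASE_URLS env var.
type InferenceJob = { prompt: string };

const backends = (process.env.OLLAMA_BASE_URLS ?? "http://localhost:11434")
  .split(",")
  .map((url) => url.trim());

let next = 0;
// Round-robin: each job goes to the next backend in the list.
function pickBackend(): string {
  const url = backends[next];
  next = (next + 1) % backends.length;
  return url;
}

async function runInference(job: InferenceJob): Promise<string> {
  // Ollama's /api/generate endpoint; "llama3" is a placeholder model.
  const res = await fetch(`${pickBackend()}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", prompt: job.prompt, stream: false }),
  });
  const body = (await res.json()) as { response: string };
  return body.response;
}

// N long-lived workers drain a shared job list concurrently, so overall
// throughput scales with the number of backends instead of being
// serialized through a single consumer.
async function processQueue(jobs: InferenceJob[], concurrency: number) {
  const workers = Array.from({ length: concurrency }, async () => {
    for (let job = jobs.shift(); job; job = jobs.shift()) {
      await runInference(job);
    }
  });
  await Promise.all(workers);
}
```

In practice the concurrency limit could be a setting that defaults to 1, so single-backend users keep today's behavior.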

Can the goal of this request already be achieved via other means?

Not easily.

  • a faster GPU with more RAM
  • multiple GPUs on the same host
  • something other than Ollama that can run jobs distributed across multiple machines (like https://github.com/exo-explore/exo, maybe?)

Have you searched for an existing open/closed issue?

  • I have searched for existing issues and none cover my fundamental request

Additional context

No response
