Skip to content

Tuning API in Katib for LLMs #2291

@andreyvelich

Description

@andreyvelich

Recently, we implemented a new train Python SDK API in Kubeflow Training Operator to easily fine-tune LLMs on multiple GPUs with predefined datasets provider, model provider, and HuggingFace trainer.

To continue our roadmap around LLMOps in Kubeflow, we want to give user functionality to tune HyperParameters of LLMs using simple Python SDK APIs: tune.
It requires to make appropriate changes to the Katib Python SDK which allows users to set model, dataset, and HyperParameters that they want to optimize for LLM.
We need to re-use existing Training Operator components that we used for train API: storage-initializer, trainer.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions