Tuning API in Katib for LLMs

Recently, we implemented [a new `train` Python SDK API](https://github.com/kubeflow/training-operator/blob/master/docs/proposals/train_api_proposal.md) in Kubeflow Training Operator to easily fine-tune LLMs on multiple GPUs with a predefined dataset provider, model provider, and HuggingFace trainer.

To continue our LLMOps roadmap in Kubeflow, we want to give users the ability to tune the hyperparameters of LLMs through a simple Python SDK API: `tune`.
This requires changes to the Katib Python SDK so that users can set the model, the dataset, and the hyperparameters they want to optimize for the LLM.
We need to reuse the existing Training Operator components that we used for the `train` API: `storage-initializer` and `trainer`.
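
To make the intended shape of the call easier to discuss, here is a minimal, self-contained sketch of what a `tune`-style function could look like. It is not the Katib SDK: the function signature, the `TuneResult` type, the `fake_loss` objective, and the model and dataset names are all hypothetical, and plain random search stands in for whatever algorithm Katib would actually run. The real API would launch trials on the cluster using `storage-initializer` and `trainer` rather than call a local function.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical types for illustration only; not part of the Katib SDK.
SearchSpace = Dict[str, Tuple[float, float]]  # name -> (min, max)
Objective = Callable[[str, str, Dict[str, float]], float]


@dataclass
class TuneResult:
    best_params: Dict[str, float]
    best_value: float


def tune(
    model: str,
    dataset: str,
    search_space: SearchSpace,
    objective: Objective,
    max_trials: int = 10,
    seed: int = 0,
) -> TuneResult:
    """Random search: sample hyperparameters, keep the lowest objective value."""
    rng = random.Random(seed)
    best_params: Dict[str, float] = {}
    best_value = float("inf")
    for _ in range(max_trials):
        # Sample each hyperparameter uniformly from its (min, max) range.
        params = {
            name: rng.uniform(low, high)
            for name, (low, high) in search_space.items()
        }
        value = objective(model, dataset, params)
        if value < best_value:
            best_params, best_value = params, value
    return TuneResult(best_params=best_params, best_value=best_value)


def fake_loss(model: str, dataset: str, params: Dict[str, float]) -> float:
    """Stand-in for a fine-tuning run; a real trial would train the LLM."""
    return (params["learning_rate"] - 3e-4) ** 2 + (params["lora_dropout"] - 0.1) ** 2


result = tune(
    model="hypothetical-llm",        # placeholder model name
    dataset="hypothetical-dataset",  # placeholder dataset name
    search_space={
        "learning_rate": (1e-5, 1e-3),
        "lora_dropout": (0.0, 0.3),
    },
    objective=fake_loss,
    max_trials=20,
)
print(result.best_params, result.best_value)
```

The point of the sketch is the user-facing surface: one call that names the model, the dataset, and the ranges of the hyperparameters to optimize, and returns the best configuration found.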


Tuning API in Katib for LLMs #2291
