Recently, we implemented a new `train` Python SDK API in the Kubeflow Training Operator to easily fine-tune LLMs on multiple GPUs with a predefined dataset provider, model provider, and HuggingFace trainer.

To continue our roadmap around LLMOps in Kubeflow, we want to give users the ability to tune the HyperParameters of LLMs using a simple Python SDK API: `tune`.

This requires appropriate changes to the Katib Python SDK so that users can set the model, the dataset, and the HyperParameters they want to optimize for the LLM. We need to re-use the existing Training Operator components that we used for the `train` API: `storage-initializer` and `trainer`.
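
A minimal sketch of what such a `tune` call could look like, assuming it accepts the same model, dataset, and trainer parameter classes as the Training Operator `train` API (`HuggingFaceModelParams`, `HuggingFaceDatasetParams`, `HuggingFaceTrainerParams`) and uses Katib search spaces for the HyperParameters. The exact signature, argument names, and import paths below are illustrative assumptions, not a final interface:

```python
import kubeflow.katib as katib
import transformers

# Parameter classes are assumed to be reused from the Training Operator
# `train` API (shipped with the storage-initializer); import paths may differ.
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceDatasetParams,
    HuggingFaceTrainerParams,
)

katib_client = katib.KatibClient(namespace="kubeflow")

# Hypothetical tune() call: model and dataset would be downloaded by the
# storage-initializer, and each Trial would run the HuggingFace trainer.
katib_client.tune(
    name="llm-hp-tuning",
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://google/bert_uncased_L-2_H-128_A-2",
    ),
    dataset_provider_parameters=HuggingFaceDatasetParams(
        repo_id="yelp_review_full",
    ),
    # HyperParameters to optimize, expressed as Katib search spaces inside
    # the HuggingFace TrainingArguments.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="/tmp/output",
            learning_rate=katib.search.double(min=1e-5, max=5e-5),
            per_device_train_batch_size=katib.search.int(min=8, max=64),
        ),
    ),
    objective_metric_name="train_loss",
    objective_type="minimize",
    max_trial_count=10,
    resources_per_trial={"gpu": 1},
)
```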