KALE (Kubeflow Automated pipeLines Engine) is a project that aims to simplify the Data Science experience of deploying Kubeflow Pipelines workflows.
Kubeflow is a great platform for orchestrating complex workflows on top of Kubernetes, and Kubeflow Pipelines provides the means to create reusable components that can be executed as part of workflows. The self-service nature of Kubeflow makes it extremely appealing for Data Science use, as it provides easy access to advanced distributed job orchestration, reusable components, Jupyter Notebooks, rich UIs and more. Still, developing and maintaining Kubeflow workflows can be hard for data scientists, who may not be experts in workflow orchestration platforms and their related SDKs. Additionally, data science often involves data exploration, iterative modelling and interactive environments (most commonly Jupyter notebooks).
Kale bridges this gap by providing a simple UI to define Kubeflow Pipelines workflows directly from your JupyterLab interface, without the need to change a single line of code.
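For a sense of how this works: Kale reads notebook cell metadata (tags applied through its JupyterLab UI) to map cells to pipeline steps. A cell annotated with Kale's `block:`/`prev:` tag convention carries metadata of roughly this shape (the exact metadata layout shown here is an illustrative assumption):

```json
{
  "cell_type": "code",
  "metadata": {
    "tags": ["block:train_model", "prev:load_data"]
  },
  "source": ["model = train(df)"]
}
```

Here `block:train_model` assigns the cell to a `train_model` pipeline step, while `prev:load_data` declares a dependency on the `load_data` step — all without modifying the cell's code.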
Read more about Kale and how it works in this Medium post: Automating Jupyter Notebook Deployments to Kubeflow Pipelines with Kale
- Install Kubeflow Pipelines (v2.4.0) as recommended in the official documentation: Kubeflow Pipelines Installation
- Kubernetes
Install the Kale backend from PyPI and the JupyterLab extension. You can find a set of curated Notebooks in the examples repository.
# install kale
pip install kubeflow-kale
# install jupyter lab
pip install "jupyterlab>=4.0.0"
# install the extension
jupyter labextension install kubeflow-kale-labextension
# verify extension status
jupyter labextension list
# run
jupyter lab
To build images to be used as a NotebookServer in Kubeflow, refer to the Dockerfile in the docker folder.
Head over to the FAQ to read about some known issues and some of the limitations imposed by the Kale data marshalling model.
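For context, Kale's marshalling model passes variables between pipeline steps by serializing them to a shared directory on the pipeline's volume: a producing step saves a variable, a consuming step loads it. The sketch below illustrates the idea only — the directory name and the plain-pickle backend are assumptions, not Kale's actual implementation (Kale selects specialized serialization backends per object type):

```python
import os
import pickle

# illustrative location, not necessarily Kale's actual marshalling path
MARSHAL_DIR = ".marshal"

def save(obj, name):
    """Serialize a step's output variable so a later step can consume it."""
    os.makedirs(MARSHAL_DIR, exist_ok=True)
    with open(os.path.join(MARSHAL_DIR, f"{name}.pkl"), "wb") as f:
        pickle.dump(obj, f)

def load(name):
    """Deserialize a variable produced by an earlier step."""
    with open(os.path.join(MARSHAL_DIR, f"{name}.pkl"), "rb") as f:
        return pickle.load(f)

# a producing step saves, a consuming step loads
save({"accuracy": 0.93}, "metrics")
print(load("metrics")["accuracy"])  # prints 0.93
```

This is why the FAQ's limitations matter: any object that cannot be serialized by an available backend cannot cross a step boundary.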
- Kale introduction blog post
- Codelabs showcasing Kale working in MiniKF with Arrikto's Rok:
- KubeCon NA Tutorial 2019: From Notebook to Kubeflow Pipelines: An End-to-End Data Science Workflow / video
- CNCF Webinar 2020: From Notebook to Kubeflow Pipelines with MiniKF & Kale / video
- KubeCon EU Tutorial 2020: From Notebook to Kubeflow Pipelines with HP Tuning: A Data Science Journey / video
Make sure you have installed Kubeflow Pipelines (v2.4.0) as recommended in the official documentation: Kubeflow Pipelines Installation
Clone the repository and create a conda environment:
git clone https://github.com/kubeflow-kale/kale.git
cd kale
conda create --name my_project_env python=3.10
conda activate my_project_env
Check out the v2.0-dev branch, then:
git checkout v2.0-dev
cd backend/
pip install -e .[dev]
# port-forward the KFP UI in another terminal (optional)
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
# run the CLI from outside the backend directory
cd ..
python ./backend/kale/cli.py --nb ./examples/base/candies_sharing.ipynb --kfp_host http://127.0.0.1:8080/ --run_pipeline
# run tests
pytest -x -vv # TODO
The DSL script will be generated inside the .kale directory at the repository root, and the pipeline will be visible in the KFP UI running at http://127.0.0.1:8080/.
- Component names can't be the same as any variable name used in the user code.
- Component names can't contain underscores or spaces; use '-' instead.
- Component names can't contain capital letters, or numbers after a '-'.
- Step names shouldn't have capital letters or numbers after a '-'; e.g. 'kid1' is fine, but 'kid-1' is not.
- Underscores in step names are replaced with '-' in component names, and '-step' is appended in the DSL script.
- Artifact variable names are appended with '-artifact' in the DSL script.
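The renaming rules above can be sketched as a small helper (an illustrative approximation of the constraints, not Kale's actual implementation):

```python
import re

def to_component_name(step_name: str) -> str:
    """Approximate the step-name -> component-name mapping described above."""
    # underscores are replaced with '-' and '-step' is appended in the DSL
    return step_name.replace("_", "-") + "-step"

def to_artifact_name(var_name: str) -> str:
    # artifact variables get an '-artifact' suffix in the DSL
    return var_name.replace("_", "-") + "-artifact"

def is_valid_step_name(name: str) -> bool:
    """No capital letters, and no digit immediately after a '-'."""
    if name != name.lower():
        return False
    return re.search(r"-\d", name) is None

print(to_component_name("load_data"))  # load-data-step
print(is_valid_step_name("kid1"))      # True
print(is_valid_step_name("kid-1"))     # False
```

In other words, a step named `load_data` surfaces in the generated DSL as the `load-data-step` component, and a variable `train_df` crossing a step boundary becomes the `train-df-artifact` artifact.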
The JupyterLab Python package comes with its own yarn wrapper, called jlpm.
With the previously created conda environment activated, install JupyterLab by running:
pip install "jupyterlab>=4.0.0"
You can then run the following to install the Kale extension:
cd labextension/
# install dependencies from yarn.lock
jlpm install
# build extension
jlpm run build
# list installed jp extensions
jlpm labextension list
# install Kale extension
jlpm labextension install .
# for development:
# build and watch
jlpm run watch
# in another shell, run JupyterLab in watch mode
jupyter lab --no-browser --watch
This repository uses husky to set up git hooks. For husky to function properly, you need to have yarn installed and in your PATH. This is required because husky is installed via jlpm install and jlpm is a yarn wrapper. (Similarly, if it were installed using the npm package manager, then npm would have to be in PATH.)
Currently installed git hooks:
- pre-commit: Run a prettier check on staged files, using pretty-quick