docs: publish v0.5.1 docs #1289


Merged
merged 3 commits into from
Jul 21, 2025

Conversation

chewong
Collaborator

@chewong chewong commented Jul 21, 2025

Reason for Change:

docs: publish v0.5.1 docs

Opened #1288 as a follow-up task to automate this.

Requirements

  • added unit tests and e2e tests (if applicable).

Issue Fixed:

Notes for Reviewers:

Signed-off-by: Ernest Wong <chwong719@gmail.com>
Title

Publish v0.5.1 Documentation and New Features


Description

  • Added new documentation for version v0.5.1, including proposals, usage guides, and API descriptions.

  • Introduced a new client example for interacting with the RAGEngine API.

  • Provided detailed instructions for setting up and using distributed inference and tool calling.
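As a rough illustration of the new client example, a minimal RAGEngine query could look like the sketch below. This is a hypothetical sketch only: the endpoint path `/v1/query`, the `prompt`/`top_k` payload fields, and the JSON response shape are assumptions for illustration, not the actual API; the authoritative version is the `example_rag_client.py` added in this PR.

```python
import json
from urllib import request

# Hypothetical service address and endpoint (assumption, not from the PR).
RAG_ENDPOINT = "http://localhost:8000/v1/query"


def build_query(prompt: str, top_k: int = 3) -> dict:
    """Build a retrieval-augmented query payload (assumed schema)."""
    return {"prompt": prompt, "top_k": top_k}


def query_rag(prompt: str, top_k: int = 3) -> dict:
    """POST the query to the RAGEngine service and return the parsed JSON response."""
    payload = json.dumps(build_query(prompt, top_k)).encode("utf-8")
    req = request.Request(
        RAG_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Separating payload construction (`build_query`) from transport (`query_rag`) keeps the assumed schema in one place, so it is easy to adjust once the real API from `rag.md` is consulted.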


Changes walkthrough 📝

Relevant files
Documentation
8 files
example_rag_client.py
Added example client for RAGEngine API                                     
+135/-0 
20250704-keda-scaler-for-inference-workloads.md
Added proposal for KEDA scaler for inference workloads     
+842/-0 
rag.md
Added documentation for RAGEngine CRD and API usage           
+428/-0 
20250325-distributed-inference.md
Added proposal for distributed inference                                 
+259/-0 
inference.md
Added documentation for inference with PEFT and Kubernetes job design
+258/-0 
20250609-model-as-oci-artifacts.md
Added proposal for distributing models as OCI artifacts   
+256/-0 
tuning.md
Added documentation for fine-tuning with PEFT and Kubernetes job design
+168/-0 
tool-calling.md
Added documentation for tool calling with vLLM                     
+219/-0 
Additional files
27 files
aws.md +105/-0 
contributing.md +100/-0 
custom-model.md +83/-0   
faq.md +59/-0   
installation.md +145/-0 
intro.md +67/-0   
kaito-on-byo-gpu-nodes.md +256/-0 
kaito-oom-prevention.md +49/-0   
monitoring.md +50/-0   
multi-node-inference.md +263/-0 
preset-onboarding.md +30/-0   
presets.md +32/-0   
proposals.md +56/-0   
20240205-mistral-instruct.md +51/-0   
20240205-mistral.md +50/-0   
20240206-phi-2.md +50/-0   
20240527-phi3-instruct.md +53/-0   
20241212-phi4-instruct.md +50/-0   
20250103-qwen2.5-coder.md +50/-0   
20250529-llama-3.3-70b-instruct.md +105/-0 
20250611-workspace-subresource-scale-api.md +174/-0 
20250715-inference-aware-routing-layer.md +268/-0 
YYYYMMDD-model-template.md +75/-0   
quick-start.md +105/-0 
usage.md +9/-0     
version-v0.5.1-sidebars.json +58/-0   
versions.json +1/-0     

Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Possible issue
    Fix incorrect response type in error handling

    Ensure that the GetMetrics function correctly handles errors and returns the
    appropriate response type. The current code mistakenly returns an IsActiveResponse
    when an error occurs while fetching the workspace.

    website/versioned_docs/version-v0.5.1/example_rag_client.py [52-97]

     func (k *KaitoScaler) GetMetrics(ctx context.Context, req *pb.GetMetricsRequest) (*pb.GetMetricsResponse, error) {
         // Parse scaler metadata from the request
         scalerConfig, err := k.parseScalerMetadata(req.ScaledObjectRef, req.MetricName)
         if err != nil {
             return nil, fmt.Errorf("failed to parse scaler metadata: %w", err)
         }

         // Get target Kaito Workspace
         var workspace kaito.Workspace
    -    err := k.Get(ctx, req.Namespace/Name, &workspace)
    +    err = k.Get(ctx, client.ObjectKey{Namespace: req.Namespace, Name: req.Name}, &workspace)
         if err != nil {
    -        return &pb.IsActiveResponse{Result: false}, nil
    +        return nil, fmt.Errorf("failed to get workspace: %w", err)
         }

         // Get inference pods for this workspace
         pods, err := k.getInferencePods(ctx, workspace)
         if err != nil {
             return nil, fmt.Errorf("failed to get inference pods: %w", err)
         }

         // Collect metrics from all pods with intelligent fallback for missing metrics
         var totalValue float64
         var podCount int
         var readyPodCount int
         var missingMetricCount int

         for _, pod := range pods {
             podCount++
             if k.isPodReady(pod) {
                 readyPodCount++
                 value, err := k.getMetricFromPod(ctx, pod, scalerConfig)
                 if err != nil {
                     // Pod is ready but metric is unavailable (e.g., endpoint not responding)
                     missingMetricCount++
                     continue
                 }
                 totalValue += float64(value)
             } else {
                 // Pod is not ready (pending, starting, etc.)
                 missingMetricCount++
             }
         }

         // Apply fallback for missing metrics to prevent flapping
         if missingMetricCount > 0 {
             fallbackValue := k.calculateFallbackValue(scalerConfig, totalValue, readyPodCount-missingMetricCount)
             totalValue += fallbackValue * float64(missingMetricCount)
         }

         // Calculate average metric value per pod (including fallback values)
         var avgValue float64
         if podCount > 0 {
             avgValue = totalValue / float64(podCount)
         }

         return &pb.GetMetricsResponse{
             MetricValues: []*pb.MetricValue{
                 {
                     MetricName:  req.MetricName,
                     MetricValue: int64(avgValue),
                 },
             },
         }, nil
     }
    Suggestion importance[1-10]: 8


    Why: The GetMetrics function incorrectly returns an IsActiveResponse when an error occurs while fetching the workspace. It should return a GetMetricsResponse with an error instead.

    Medium
    General
    Verify kubectl installation before running Azure CLI command

    Consider verifying that kubectl is not already installed before running the Azure
    CLI command to install it. This prevents unnecessary installations and potential
    conflicts.

    website/versioned_docs/version-v0.5.1/installation.md [38]

    ```diff
    -az aks install-cli
    +if ! command -v kubectl &> /dev/null; then
    +    az aks install-cli
    +fi
    ```

    Suggestion importance[1-10]: 7

    Why: Verifying the installation of `kubectl` before attempting to install it via Azure CLI prevents unnecessary installations and potential conflicts, improving the setup process.

    Medium

    Use consistent date formats

    Use consistent date formats (e.g., YYYY-MM-DD) for all dates in the history section to maintain uniformity and readability.

    website/versioned_docs/version-v0.5.1/proposals/YYYYMMDD-model-template.md [73-75]
    
    ```diff
    -- [ ] MM/DD/YYYY: Open proposal PR.
    -- [ ] MM/DD/YYYY: Start model integration.
    -- [ ] MM/DD/YYYY: Complete model support.
    +- [ ] YYYY-MM-DD: Open proposal PR.
    +- [ ] YYYY-MM-DD: Start model integration.
    +- [ ] YYYY-MM-DD: Complete model support.
    ```
    
    Suggestion importance[1-10]: 6


    Why: Using consistent date formats (e.g., YYYY-MM-DD) improves the readability and uniformity of the history section, making it easier to track the progress of the proposal.

    Low

    @chewong chewong merged commit fca7291 into kaito-project:main Jul 21, 2025
    13 of 14 checks passed