docs: publish v0.5.1 docs #1289


Merged
merged 3 commits into from
Jul 21, 2025

Conversation

chewong
Collaborator

@chewong chewong commented Jul 21, 2025

Reason for Change:

docs: publish v0.5.1 docs

Opened #1288 as a follow-up task to automate this.

Requirements

  • added unit tests and e2e tests (if applicable).

Issue Fixed:

Notes for Reviewers:

Signed-off-by: Ernest Wong <chwong719@gmail.com>
Title

Publish v0.5.1 Documentation and New Features


Description

  • Added new documentation for version v0.5.1, including proposals, usage guides, and API descriptions.

  • Introduced a new client example for interacting with the RAGEngine API.

  • Provided detailed instructions for setting up and using distributed inference and tool calling.
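As a rough illustration of the new client example, a minimal RAGEngine query could look like the sketch below. This is a hypothetical sketch only: the endpoint path `/v1/query`, the `prompt`/`top_k` payload fields, and the JSON response shape are assumptions for illustration, not the actual API; the authoritative version is the `example_rag_client.py` added in this PR.

```python
import json
from urllib import request

# Hypothetical service address and endpoint (assumption, not from the PR).
RAG_ENDPOINT = "http://localhost:8000/v1/query"


def build_query(prompt: str, top_k: int = 3) -> dict:
    """Build a retrieval-augmented query payload (assumed schema)."""
    return {"prompt": prompt, "top_k": top_k}


def query_rag(prompt: str, top_k: int = 3) -> dict:
    """POST the query to the RAGEngine service and return the parsed JSON response."""
    payload = json.dumps(build_query(prompt, top_k)).encode("utf-8")
    req = request.Request(
        RAG_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Separating payload construction (`build_query`) from transport (`query_rag`) keeps the assumed schema in one place, so it is easy to adjust once the real API from `rag.md` is consulted.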


Changes walkthrough 📝

Relevant files
Documentation
8 files
example_rag_client.py
Added example client for RAGEngine API                                     
+135/-0 
20250704-keda-scaler-for-inference-workloads.md
Added proposal for KEDA scaler for inference workloads     
+842/-0 
rag.md
Added documentation for RAGEngine CRD and API usage           
+428/-0 
20250325-distributed-inference.md
Added proposal for distributed inference                                 
+259/-0 
inference.md
Added documentation for inference with PEFT and Kubernetes job design
+258/-0 
20250609-model-as-oci-artifacts.md
Added proposal for distributing models as OCI artifacts   
+256/-0 
tuning.md
Added documentation for fine-tuning with PEFT and Kubernetes job design
+168/-0 
tool-calling.md
Added documentation for tool calling with vLLM                     
+219/-0 
Additional files
27 files
aws.md +105/-0 
contributing.md +100/-0 
custom-model.md +83/-0   
faq.md +59/-0   
installation.md +145/-0 
intro.md +67/-0   
kaito-on-byo-gpu-nodes.md +256/-0 
kaito-oom-prevention.md +49/-0   
monitoring.md +50/-0   
multi-node-inference.md +263/-0 
preset-onboarding.md +30/-0   
presets.md +32/-0   
proposals.md +56/-0   
20240205-mistral-instruct.md +51/-0   
20240205-mistral.md +50/-0   
20240206-phi-2.md +50/-0   
20240527-phi3-instruct.md +53/-0   
20241212-phi4-instruct.md +50/-0   
20250103-qwen2.5-coder.md +50/-0   
20250529-llama-3.3-70b-instruct.md +105/-0 
20250611-workspace-subresource-scale-api.md +174/-0 
20250715-inference-aware-routing-layer.md +268/-0 
YYYYMMDD-model-template.md +75/-0   
quick-start.md +105/-0 
usage.md +9/-0     
version-v0.5.1-sidebars.json +58/-0   
versions.json +1/-0     

Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Possible issue
    Fix incorrect response type in error handling

    Ensure that the GetMetrics function correctly handles errors and returns the
    appropriate response type. The current code mistakenly returns an IsActiveResponse
    when an error occurs while fetching the workspace.

    website/versioned_docs/version-v0.5.1/example_rag_client.py [52-97]

     func (k *KaitoScaler) GetMetrics(ctx context.Context, req *pb.GetMetricsRequest) (*pb.GetMetricsResponse, error) {
         // Parse scaler metadata from the request
         scalerConfig, err := k.parseScalerMetadata(req.ScaledObjectRef, req.MetricName)
         if err != nil {
             return nil, fmt.Errorf("failed to parse scaler metadata: %w", err)
         }

         // Get target Kaito Workspace
         var workspace kaito.Workspace
    -    err := k.Get(ctx, req.Namespace/Name, &workspace)
    +    err = k.Get(ctx, client.ObjectKey{Namespace: req.Namespace, Name: req.Name}, &workspace)
         if err != nil {
    -        return &pb.IsActiveResponse{Result: false}, nil
    +        return nil, fmt.Errorf("failed to get workspace: %w", err)
         }

         // Get inference pods for this workspace
         pods, err := k.getInferencePods(ctx, workspace)
         if err != nil {
             return nil, fmt.Errorf("failed to get inference pods: %w", err)
         }

         // Collect metrics from all pods with intelligent fallback for missing metrics
         var totalValue float64
         var podCount int
         var readyPodCount int
         var missingMetricCount int

         for _, pod := range pods {
             podCount++
             if k.isPodReady(pod) {
                 readyPodCount++
                 value, err := k.getMetricFromPod(ctx, pod, scalerConfig)
                 if err != nil {
                     // Pod is ready but metric is unavailable (e.g., endpoint not responding)
                     missingMetricCount++
                     continue
                 }
                 totalValue += float64(value)
             } else {
                 // Pod is not ready (pending, starting, etc.)
                 missingMetricCount++
             }
         }

         // Apply fallback for missing metrics to prevent flapping
         if missingMetricCount > 0 {
             fallbackValue := k.calculateFallbackValue(scalerConfig, totalValue, readyPodCount-missingMetricCount)
             totalValue += fallbackValue * float64(missingMetricCount)
         }

         // Calculate average metric value per pod (including fallback values)
         var avgValue float64
         if podCount > 0 {
             avgValue = totalValue / float64(podCount)
         }

         return &pb.GetMetricsResponse{
             MetricValues: []*pb.MetricValue{
                 {
                     MetricName:  req.MetricName,
                     MetricValue: int64(avgValue),
                 },
             },
         }, nil
     }
    Suggestion importance[1-10]: 8


    Why: The GetMetrics function incorrectly returns an IsActiveResponse when an error occurs while fetching the workspace. It should return a GetMetricsResponse with an error instead.

    Medium
    General
    Verify kubectl installation before running Azure CLI command

    Consider verifying that kubectl is not already installed before running the Azure
    CLI command to install it. This prevents unnecessary installations and potential
    conflicts.

    website/versioned_docs/version-v0.5.1/installation.md [38]

    ```diff
    -az aks install-cli
    +if ! command -v kubectl &> /dev/null; then
    +    az aks install-cli
    +fi
    ```

    Suggestion importance[1-10]: 7

    Why: Verifying the installation of `kubectl` before attempting to install it via Azure CLI prevents unnecessary installations and potential conflicts, improving the setup process.

    Medium

    Use consistent date formats

    Use consistent date formats (e.g., YYYY-MM-DD) for all dates in the history section to maintain uniformity and readability.

    website/versioned_docs/version-v0.5.1/proposals/YYYYMMDD-model-template.md [73-75]
    
    ```diff
    -- [ ] MM/DD/YYYY: Open proposal PR.
    -- [ ] MM/DD/YYYY: Start model integration.
    -- [ ] MM/DD/YYYY: Complete model support.
    +- [ ] YYYY-MM-DD: Open proposal PR.
    +- [ ] YYYY-MM-DD: Start model integration.
    +- [ ] YYYY-MM-DD: Complete model support.
    ```
    
    Suggestion importance[1-10]: 6


    Why: Using consistent date formats (e.g., YYYY-MM-DD) improves the readability and uniformity of the history section, making it easier to track the progress of the proposal.

    Low

    @chewong chewong merged commit fca7291 into kaito-project:main Jul 21, 2025
    13 of 14 checks passed