Skip to content

Conversation

zhuangqh
Copy link
Collaborator

@zhuangqh zhuangqh commented Jul 22, 2025

Reason for Change:

fix a bug: only attach secret volume when specifying in spec.

Requirements

  • added unit tests and e2e tests (if applicable).

Issue Fixed:

Notes for Reviewers:

Copy link

Title

Adopt Generator Pattern for Fine-Tuning and Enhance Testing


Description

  • Adopted generator pattern for fine-tuning part

  • Added support for generating tuning commands dynamically

  • Enhanced testing for adapter puller and model download info

  • Refactored tuning job manifest generation


Changes walkthrough 📝

Relevant files
Enhancement
7 files
interface.go
Added default tuning main file and command generation       
+13/-0   
common-preset.go
Added function to find results volume mount                           
+11/-0   
generator.go
Updated manifest type to include Job                                         
+2/-1     
workspace_controller.go
Updated CreatePresetTuning to use model parameter               
+2/-2     
preset-inferences.go
Added error handling for ModelAccessSecret and refactored
SetModelDownloadInfo
+3/-1     
manifests.go
Refactored GenerateTuningJobManifest and added SetJobPodSpec
+30/-61 
preset-tuning.go
Refactored CreatePresetTuning and related functions           
+169/-163
Tests
2 files
preset-inferences_test.go
Added comprehensive tests for SetAdapterPuller and
SetModelDownloadInfo
+431/-0 
preset-tuning_test.go
Added tests for SetTrainingResultVolume and PrepareTrainingOutput
+432/-114

Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Possible Issue

    The CreatePresetTuning function does not handle the case where utils.GetGPUConfigBySKU returns an error and utils.TryGetGPUConfigFromNode also fails. In this scenario, skuNumGPUs is set to the default value from the model, but the error from TryGetGPUConfigFromNode is ignored.

    gpuConfig, err := utils.GetGPUConfigBySKU(workspaceObj.Resource.InstanceType)
    if err != nil {
    	gpuConfig, err = utils.TryGetGPUConfigFromNode(ctx, kubeClient, workspaceObj.Status.WorkerNodes)
    	if err != nil {
    		defaultNumGPU := resource.MustParse(model.GetTuningParameters().GPUCountRequirement)
    		skuNumGPUs = int(defaultNumGPU.Value())
    	}
    }
    Redundant Code

    The GenerateBasicInferencePodSpec function sets up the tolerations variable, but it is never used within the function. This could be a leftover from previous code or an oversight.

    func GenerateBasicInferencePodSpec(gpuConfig *sku.GPUConfig, skuNumGPUs int) func(*generator.WorkspaceGeneratorContext, *corev1.PodSpec) error {

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Define containerPorts before usage

    Ensure containerPorts is defined before usage to avoid runtime panics.

    pkg/workspace/tuning/preset-tuning.go [254]

    +containerPorts := []corev1.ContainerPort{
    +    {
    +        ContainerPort: PortInferenceServer,
    +        Protocol:      corev1.ProtocolTCP,
    +    },
    +}
     spec.Tolerations = tolerations
     spec.InitContainers = append(spec.InitContainers, initContainers...)
     spec.Containers = []corev1.Container{
         {
             Name:         ctx.Workspace.Name,
             Image:        GetTuningImageInfo(),
             Command:      commands,
             Resources:    resourceRequirements,
             Ports:        containerPorts,
             Env:          envVars,
             VolumeMounts: volumeMounts,
         },
     }
     spec.Volumes = volumes
    Suggestion importance[1-10]: 8

    __

    Why: Defining containerPorts before usage prevents runtime panics and ensures the code is more robust and error-free.

    Medium
    Handle errors in GetTuningCommand

    Ensure tuningParam.GetTuningCommand handles potential errors gracefully.

    pkg/workspace/tuning/preset-tuning.go [167]

    -commands := tuningParam.GetTuningCommand(pkgmodel.RuntimeContext{
    +commands, err := tuningParam.GetTuningCommand(pkgmodel.RuntimeContext{
         SKUNumGPUs: skuNumGPUs,
     })
    +if err != nil {
    +    return nil, fmt.Errorf("failed to get tuning command: %w", err)
    +}
    Suggestion importance[1-10]: 7

    __

    Why: Handling errors in GetTuningCommand improves robustness, but it is not a critical issue and does not affect the overall functionality significantly.

    Medium
    General
    Validate output.Image format

    Validate output.Image format to ensure it is a valid image URI.

    pkg/workspace/tuning/preset-tuning.go [314]

     if output.Volume != nil || output.Image == "" {
         return nil
     }
    +if _, err := url.ParseRequestURI(output.Image); err != nil {
    +    return fmt.Errorf("invalid image URI: %w", err)
    +}
    Suggestion importance[1-10]: 6

    __

    Why: Validating output.Image format ensures that the image URI is correct, which is important for the functionality but not a critical issue.

    Low
    Make DefaultTuningMainFile configurable

    Consider making DefaultTuningMainFile configurable via environment variable or
    configuration file to allow flexibility.

    pkg/model/interface.go [57]

    -DefaultTuningMainFile = "/workspace/tfs/fine_tuning.py"
    +DefaultTuningMainFile := os.Getenv("DEFAULT_TUNING_MAIN_FILE", "/workspace/tfs/fine_tuning.py")
    Suggestion importance[1-10]: 5

    __

    Why: Making DefaultTuningMainFile configurable can increase flexibility, but it is not a critical change and does not address any immediate issues.

    Low

    Copy link
    Contributor

    @Copilot Copilot AI left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Pull Request Overview

    This PR refactors the fine-tuning component to adopt a generator pattern for building Kubernetes manifests, improving code organization and maintainability. The changes fix a bug where secret volumes were incorrectly attached without proper specification in the spec.

    Key changes:

    • Migrated from direct manifest creation to a composable generator pattern using modular functions
    • Separated concerns into discrete volume mount and container setup functions
    • Fixed secret volume attachment to only occur when explicitly specified

    Reviewed Changes

    Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

    Show a summary per file
    File Description
    pkg/workspace/tuning/preset-tuning.go Core refactoring to use generator pattern for pod spec creation
    pkg/workspace/tuning/preset-tuning_test.go Updated tests for new modular functions and removed obsolete test cases
    pkg/workspace/manifests/manifests.go Refactored job manifest generation to use generator pattern
    pkg/workspace/controllers/workspace_controller.go Updated function calls to match new signature
    pkg/utils/generator/generator.go Added support for batch Job type in generator
    pkg/model/interface.go Added tuning command generation method
    pkg/utils/common-preset.go Added utility function to find results volume mount
    pkg/workspace/inference/preset-inferences.go Added validation for model access secret
    pkg/workspace/inference/preset-inferences_test.go Added comprehensive tests for new inference functions
    test/e2e/preset_test.go Added image pull secrets to test configuration
    Comments suppressed due to low confidence (2)

    pkg/workspace/tuning/preset-tuning_test.go:254

    • [nitpick] This line modifies the expectedInitContainer after its creation, which could make the test harder to understand. Consider building the expectedInitContainer with the correct VolumeMounts from the start for better test clarity.
    	expectedInitContainer.VolumeMounts = expectedVolumeMounts
    

    pkg/workspace/tuning/preset-tuning.go:237

    • The variable name 'tuningParam' is misleading as it calls GetInferenceParameters() but is used for tuning commands. Consider renaming to 'inferenceParam' or 'modelParam' to better reflect its source.
    		tuningParam := ctx.Model.GetInferenceParameters().DeepCopy()
    

    @@ -340,7 +340,8 @@ func updatePhi3TuningWorkspaceWithPresetPublicMode(workspaceObj *kaitov1beta1.Wo
    }
    } else {
    workspaceObj.Tuning.Input = &kaitov1beta1.DataSource{
    Image: datasetImageName,
    Image: datasetImageName,
    ImagePullSecrets: []string{e2eACRSecret},
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    @Fei-Guo
    need attention here. We indeed need to pass the secret to let puller pull image from a private registry. The new code strengthens that.

    zhuangqh added 3 commits July 28, 2025 16:52
    Signed-off-by: zhuangqh <zhuangqhc@gmail.com>
    Signed-off-by: zhuangqh <zhuangqhc@gmail.com>
    Signed-off-by: zhuangqh <zhuangqhc@gmail.com>
    @zhuangqh zhuangqh merged commit d2ac059 into kaito-project:main Jul 29, 2025
    12 of 13 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    Status: Done
    Development

    Successfully merging this pull request may close these issues.

    3 participants