refactor: adopt generator pattern in fine-tuning part #1292

zhuangqh · 2025-07-22T07:05:51Z

Reason for Change:

fix a bug: only attach secret volume when specifying in spec.

Requirements

added unit tests and e2e tests (if applicable).

Issue Fixed:

Notes for Reviewers:

kaito-pr-agent · 2025-07-22T07:06:29Z

Title

Adopt Generator Pattern for Fine-Tuning and Enhance Testing

Description

Adopted generator pattern for fine-tuning part
Added support for generating tuning commands dynamically
Enhanced testing for adapter puller and model download info
Refactored tuning job manifest generation

Changes walkthrough 📝

Relevant files

Enhancement

7 files

interface.go `Added default tuning main file and command generation`	+13/-0
common-preset.go `Added function to find results volume mount`	+11/-0
generator.go `Updated manifest type to include Job`	+2/-1
workspace_controller.go `Updated CreatePresetTuning to use model parameter`	+2/-2
preset-inferences.go `Added error handling for ModelAccessSecret and refactored` `SetModelDownloadInfo`	+3/-1
manifests.go `Refactored GenerateTuningJobManifest and added SetJobPodSpec`	+30/-61
preset-tuning.go `Refactored CreatePresetTuning and related functions`	+169/-163

Tests

2 files

preset-inferences_test.go `Added comprehensive tests for SetAdapterPuller and` `SetModelDownloadInfo`	+431/-0
preset-tuning_test.go `Added tests for SetTrainingResultVolume and PrepareTrainingOutput`	+432/-114

Need help?
Type /help how to ... in the comments thread for any questions about PR-Agent usage.
Check out the documentation for more information.

kaito-pr-agent · 2025-07-22T07:06:55Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Possible Issue The `CreatePresetTuning` function does not handle the case where `utils.GetGPUConfigBySKU` returns an error and `utils.TryGetGPUConfigFromNode` also fails. In this scenario, `skuNumGPUs` is set to the default value from the model, but the error from `TryGetGPUConfigFromNode` is ignored. gpuConfig, err := utils.GetGPUConfigBySKU(workspaceObj.Resource.InstanceType) if err != nil { gpuConfig, err = utils.TryGetGPUConfigFromNode(ctx, kubeClient, workspaceObj.Status.WorkerNodes) if err != nil { defaultNumGPU := resource.MustParse(model.GetTuningParameters().GPUCountRequirement) skuNumGPUs = int(defaultNumGPU.Value()) } } Redundant Code The `GenerateBasicInferencePodSpec` function sets up the `tolerations` variable, but it is never used within the function. This could be a leftover from previous code or an oversight. func GenerateBasicInferencePodSpec(gpuConfig sku.GPUConfig, skuNumGPUs int) func(generator.WorkspaceGeneratorContext, *corev1.PodSpec) error {

kaito-pr-agent · 2025-07-22T07:08:09Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Define `containerPorts` before usage Ensure `containerPorts` is defined before usage to avoid runtime panics. pkg/workspace/tuning/preset-tuning.go [254] +containerPorts := []corev1.ContainerPort{ + { + ContainerPort: PortInferenceServer, + Protocol: corev1.ProtocolTCP, + }, +} spec.Tolerations = tolerations spec.InitContainers = append(spec.InitContainers, initContainers...) spec.Containers = []corev1.Container{ { Name: ctx.Workspace.Name, Image: GetTuningImageInfo(), Command: commands, Resources: resourceRequirements, Ports: containerPorts, Env: envVars, VolumeMounts: volumeMounts, }, } spec.Volumes = volumes Suggestion importance[1-10]: 8 __ Why: Defining `containerPorts` before usage prevents runtime panics and ensures the code is more robust and error-free.	Medium
Possible issue	Handle errors in `GetTuningCommand` Ensure `tuningParam.GetTuningCommand` handles potential errors gracefully. pkg/workspace/tuning/preset-tuning.go [167] -commands := tuningParam.GetTuningCommand(pkgmodel.RuntimeContext{ +commands, err := tuningParam.GetTuningCommand(pkgmodel.RuntimeContext{ SKUNumGPUs: skuNumGPUs, }) +if err != nil { + return nil, fmt.Errorf("failed to get tuning command: %w", err) +} Suggestion importance[1-10]: 7 __ Why: Handling errors in `GetTuningCommand` improves robustness, but it is not a critical issue and does not affect the overall functionality significantly.	Medium
General	Validate `output.Image` format Validate `output.Image` format to ensure it is a valid image URI. pkg/workspace/tuning/preset-tuning.go [314] if output.Volume != nil \|\| output.Image == "" { return nil } +if _, err := url.ParseRequestURI(output.Image); err != nil { + return fmt.Errorf("invalid image URI: %w", err) +} Suggestion importance[1-10]: 6 __ Why: Validating `output.Image` format ensures that the image URI is correct, which is important for the functionality but not a critical issue.	Low
General	Make `DefaultTuningMainFile` configurable Consider making `DefaultTuningMainFile` configurable via environment variable or configuration file to allow flexibility. pkg/model/interface.go [57] -DefaultTuningMainFile = "/workspace/tfs/fine_tuning.py" +DefaultTuningMainFile := os.Getenv("DEFAULT_TUNING_MAIN_FILE", "/workspace/tfs/fine_tuning.py") Suggestion importance[1-10]: 5 __ Why: Making `DefaultTuningMainFile` configurable can increase flexibility, but it is not a critical change and does not address any immediate issues.	Low

pkg/workspace/tuning/preset-tuning.go

Copilot

Pull Request Overview

This PR refactors the fine-tuning component to adopt a generator pattern for building Kubernetes manifests, improving code organization and maintainability. The changes fix a bug where secret volumes were incorrectly attached without proper specification in the spec.

Key changes:

Migrated from direct manifest creation to a composable generator pattern using modular functions
Separated concerns into discrete volume mount and container setup functions
Fixed secret volume attachment to only occur when explicitly specified

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`pkg/workspace/tuning/preset-tuning.go`	Core refactoring to use generator pattern for pod spec creation
`pkg/workspace/tuning/preset-tuning_test.go`	Updated tests for new modular functions and removed obsolete test cases
`pkg/workspace/manifests/manifests.go`	Refactored job manifest generation to use generator pattern
`pkg/workspace/controllers/workspace_controller.go`	Updated function calls to match new signature
`pkg/utils/generator/generator.go`	Added support for batch Job type in generator
`pkg/model/interface.go`	Added tuning command generation method
`pkg/utils/common-preset.go`	Added utility function to find results volume mount
`pkg/workspace/inference/preset-inferences.go`	Added validation for model access secret
`pkg/workspace/inference/preset-inferences_test.go`	Added comprehensive tests for new inference functions
`test/e2e/preset_test.go`	Added image pull secrets to test configuration

Comments suppressed due to low confidence (2)

pkg/workspace/tuning/preset-tuning_test.go:254

[nitpick] This line modifies the expectedInitContainer after its creation, which could make the test harder to understand. Consider building the expectedInitContainer with the correct VolumeMounts from the start for better test clarity.

	expectedInitContainer.VolumeMounts = expectedVolumeMounts

pkg/workspace/tuning/preset-tuning.go:237

The variable name 'tuningParam' is misleading as it calls GetInferenceParameters() but is used for tuning commands. Consider renaming to 'inferenceParam' or 'modelParam' to better reflect its source.

		tuningParam := ctx.Model.GetInferenceParameters().DeepCopy()

pkg/workspace/manifests/manifests.go

zhuangqh · 2025-07-28T06:04:15Z

test/e2e/preset_test.go

@@ -340,7 +340,8 @@ func updatePhi3TuningWorkspaceWithPresetPublicMode(workspaceObj *kaitov1beta1.Wo
 			}
 		} else {
 			workspaceObj.Tuning.Input = &kaitov1beta1.DataSource{
-				Image: datasetImageName,
+				Image:            datasetImageName,
+				ImagePullSecrets: []string{e2eACRSecret},


@Fei-Guo
need attention here. We indeed need to pass the secret to let puller pull image from a private registry. The new code strengthens that.

pkg/workspace/inference/preset-inferences.go

Signed-off-by: zhuangqh <zhuangqhc@gmail.com>

zhuangqh requested a review from Fei-Guo as a code owner July 22, 2025 07:05

github-project-automation bot added this to KAITO Roadmap Jul 22, 2025

zhuangqh requested a review from chewong as a code owner July 22, 2025 07:05

zhuangqh had a problem deploying to e2e-test July 22, 2025 07:05 — with GitHub Actions Error

zhuangqh had a problem deploying to unit-tests July 22, 2025 07:05 — with GitHub Actions Error

zhuangqh temporarily deployed to unit-tests July 22, 2025 07:05 — with GitHub Actions Inactive

kaito-pr-agent bot added the Review effort 4/5 label Jul 22, 2025

zhuangqh temporarily deployed to unit-tests July 22, 2025 07:12 — with GitHub Actions Inactive

zhuangqh had a problem deploying to e2e-test July 22, 2025 07:12 — with GitHub Actions Failure

zhuangqh temporarily deployed to unit-tests July 22, 2025 07:12 — with GitHub Actions Inactive

zhuangqh had a problem deploying to e2e-test July 22, 2025 10:07 — with GitHub Actions Failure

zhuangqh had a problem deploying to e2e-test July 22, 2025 12:40 — with GitHub Actions Failure

zhuangqh temporarily deployed to unit-tests July 22, 2025 12:40 — with GitHub Actions Inactive

chewong approved these changes Jul 22, 2025

View reviewed changes

pkg/workspace/tuning/preset-tuning.go Outdated Show resolved Hide resolved

zhuangqh temporarily deployed to unit-tests July 22, 2025 22:34 — with GitHub Actions Inactive

zhuangqh had a problem deploying to e2e-test July 22, 2025 22:34 — with GitHub Actions Failure

zhuangqh had a problem deploying to e2e-test July 22, 2025 23:55 — with GitHub Actions Failure

zhuangqh had a problem deploying to e2e-test July 23, 2025 00:47 — with GitHub Actions Error

zhuangqh had a problem deploying to e2e-test July 23, 2025 04:17 — with GitHub Actions Error

zhuangqh force-pushed the refactor-manifest2 branch from 43aeb62 to 49cdb69 Compare July 23, 2025 05:37

zhuangqh had a problem deploying to e2e-test July 23, 2025 05:37 — with GitHub Actions Error

zhuangqh temporarily deployed to unit-tests July 23, 2025 05:37 — with GitHub Actions Inactive

zhuangqh had a problem deploying to e2e-test July 23, 2025 15:57 — with GitHub Actions Failure

zhuangqh temporarily deployed to unit-tests July 25, 2025 05:53 — with GitHub Actions Inactive

zhuangqh temporarily deployed to unit-tests July 28, 2025 01:42 — with GitHub Actions Inactive

zhuangqh temporarily deployed to e2e-test July 28, 2025 01:42 — with GitHub Actions Inactive

zhuangqh force-pushed the refactor-manifest2 branch from b101acc to 73f5e00 Compare July 28, 2025 02:10

zhuangqh temporarily deployed to unit-tests July 28, 2025 02:10 — with GitHub Actions Inactive

zhuangqh had a problem deploying to e2e-test July 28, 2025 02:10 — with GitHub Actions Failure

zhuangqh temporarily deployed to unit-tests July 28, 2025 02:10 — with GitHub Actions Inactive

zhuangqh had a problem deploying to e2e-test July 28, 2025 02:10 — with GitHub Actions Error

zhuangqh had a problem deploying to e2e-test July 28, 2025 03:18 — with GitHub Actions Failure

zhuangqh temporarily deployed to e2e-test July 28, 2025 04:04 — with GitHub Actions Inactive

Fei-Guo requested a review from Copilot July 28, 2025 05:55

Copilot AI reviewed Jul 28, 2025

View reviewed changes

pkg/workspace/manifests/manifests.go Show resolved Hide resolved

zhuangqh commented Jul 28, 2025

View reviewed changes

Fei-Guo reviewed Jul 28, 2025

View reviewed changes

pkg/workspace/inference/preset-inferences.go Outdated Show resolved Hide resolved

zhuangqh requested a review from chewong July 28, 2025 06:46

zhuangqh had a problem deploying to unit-tests July 28, 2025 06:51 — with GitHub Actions Error

zhuangqh had a problem deploying to e2e-test July 28, 2025 06:51 — with GitHub Actions Error

zhuangqh added 3 commits July 28, 2025 16:52

refactor: adopt generator pattern in fine-tuning part

e511c59

Signed-off-by: zhuangqh <zhuangqhc@gmail.com>

fix

f2e0ac2

Signed-off-by: zhuangqh <zhuangqhc@gmail.com>

update

bcaf124

Signed-off-by: zhuangqh <zhuangqhc@gmail.com>

zhuangqh force-pushed the refactor-manifest2 branch from 3662931 to bcaf124 Compare July 28, 2025 06:52

zhuangqh temporarily deployed to e2e-test July 28, 2025 06:52 — with GitHub Actions Inactive

zhuangqh had a problem deploying to e2e-test July 28, 2025 06:52 — with GitHub Actions Failure

zhuangqh temporarily deployed to unit-tests July 28, 2025 06:52 — with GitHub Actions Inactive

chewong approved these changes Jul 29, 2025

View reviewed changes

zhuangqh merged commit d2ac059 into kaito-project:main Jul 29, 2025
12 of 13 checks passed

github-project-automation bot moved this to Done in KAITO Roadmap Jul 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor: adopt generator pattern in fine-tuning part #1292

refactor: adopt generator pattern in fine-tuning part #1292

Uh oh!

zhuangqh commented Jul 22, 2025 •

edited

Loading

Uh oh!

kaito-pr-agent bot commented Jul 22, 2025

Uh oh!

kaito-pr-agent bot commented Jul 22, 2025

Uh oh!

kaito-pr-agent bot commented Jul 22, 2025

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

zhuangqh Jul 28, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

refactor: adopt generator pattern in fine-tuning part #1292

refactor: adopt generator pattern in fine-tuning part #1292

Uh oh!

Conversation

zhuangqh commented Jul 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kaito-pr-agent bot commented Jul 22, 2025

Title

Description

Changes walkthrough 📝

Uh oh!

kaito-pr-agent bot commented Jul 22, 2025

PR Reviewer Guide 🔍

Uh oh!

kaito-pr-agent bot commented Jul 22, 2025

PR Code Suggestions ✨

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

zhuangqh Jul 28, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zhuangqh commented Jul 22, 2025 •

edited

Loading