refactor: introduce manifest generator -- part 1 #1284

zhuangqh · 2025-07-18T13:10:13Z

Reason for Change:

Introduce manifest generator to let it easier to be maintained and tested.

Signed-off-by: jerryzhuang <zhuangqhc@gmail.com>

kaito-pr-agent · 2025-07-18T13:10:46Z

Title

Introduce Manifest Generator and Refactor Inference Generation

Description

Introduced manifest generator to simplify maintenance and testing.
Refactored GeneratePresetInference to use the new generator.
Moved manifest generation logic into separate functions.
Removed redundant tests for removed functions.

Changes walkthrough 📝

Relevant files

Enhancement

generator.go `Add manifest generator utilities` pkg/utils/generator/generator.go Added `WorkspaceGeneratorContext` and `GeneratorContext` interfaces. Defined `ManifestType` and `TypedManifestModifier` types. Implemented `GenerateManifest` function.	+53/-0

Tests

generator_test.go `Add manifest generator tests` pkg/utils/generator/generator_test.go Added unit tests for `GenerateManifest`.	+61/-0
manifests_test.go `Remove outdated manifest tests` pkg/workspace/manifests/manifests_test.go Removed tests for old manifest generation functions.	+0/-239

Refactoring

preset-inferences.go `Refactor inference generation using manifest generator` pkg/workspace/inference/preset-inferences.go Updated `GetInferenceImageInfo` to return only image pull secrets. Refactored `GeneratePresetInference` to use the new manifest generator. Added new functions for generating pod specs and modifying them.	+219/-98
manifests.go `Refactor manifest generation using manifest generator` pkg/workspace/manifests/manifests.go Refactored `GenerateStatefulSetManifest` and `GenerateDeploymentManifest` to use the new manifest generator. Added new functions for setting pod specs and volume claim templates.	+57/-138

Need help?
Type /help how to ... in the comments thread for any questions about PR-Agent usage.
Check out the documentation for more information.

kaito-pr-agent · 2025-07-18T13:11:09Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Refactoring Complexity The refactoring introduces multiple new functions and modifies existing ones significantly. Ensure that the new functions are well-tested and that the logic is preserved correctly. // resource requirements var skuNumGPUs int // initially respect the user setting by deploying the model on the same number of nodes as the user requested numNodes := workspaceObj.Resource.Count gpuConfig, err := utils.GetGPUConfigBySKU(workspaceObj.Resource.InstanceType) if err != nil { gpuConfig, err = utils.TryGetGPUConfigFromNode(ctx, kubeClient, workspaceObj.Status.WorkerNodes) if err != nil { defaultNumGPU := resource.MustParse(model.GetInferenceParameters().GPUCountRequirement) skuNumGPUs = int(defaultNumGPU.Value()) } } if gpuConfig != nil { skuNumGPUs = gpuConfig.GPUCount // Calculate the minimum number of nodes required to satisfy the model's total GPU memory requirement. // The goal is to maximize GPU utilization and not spread the model across too many nodes. totalGPUMemoryRequired := resource.MustParse(model.GetInferenceParameters().TotalGPUMemoryRequirement) totalGPUMemoryPerNode := resource.NewQuantity(int64(gpuConfig.GPUMemGB)consts.GiBToBytes, resource.BinarySI) minimumNodes := 0 for ; totalGPUMemoryRequired.Sign() > 0; totalGPUMemoryRequired.Sub(totalGPUMemoryPerNode) { minimumNodes++ } if minimumNodes < numNodes { numNodes = minimumNodes } } gctx := &generator.WorkspaceGeneratorContext{ Ctx: ctx, KubeClient: kubeClient, Workspace: workspaceObj, Model: model, } podOpts := []generator.TypedManifestModifier[generator.WorkspaceGeneratorContext, corev1.PodSpec]{ GenerateInferencePodSpec(gpuConfig, skuNumGPUs, numNodes), SetModelDownloadInfo, SetAdapterPuller, } var depObj client.Object // For multi-node distributed inference with vLLM, we need to use a StatefulSet instead of a Deployment // to ensure pods are created with individual identities (their ordinal indexes) - // https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-identity runtimeName := v1beta1.GetWorkspaceRuntimeName(workspaceObj) if model.SupportDistributedInference() && runtimeName == pkgmodel.RuntimeNameVLLM && numNodes > 1 { podOpts = append(podOpts, SetDistributedInferenceProbe) var ssOpts []generator.TypedManifestModifier[generator.WorkspaceGeneratorContext, appsv1.StatefulSet] if checkIfNVMeAvailable(ctx, gpuConfig, kubeClient) { ssOpts = append(ssOpts, manifests.AddStatefulSetVolumeClaimTemplates(GenerateModelWeightsCacheVolume(ctx, workspaceObj, model))) } else { podOpts = append(podOpts, SetDefaultModelWeightsVolume) } podSpec, err := generator.GenerateManifest(gctx, podOpts...) if err != nil { return nil, err } ssOpts = append(ssOpts, manifests.GenerateStatefulSetManifest(revisionNum, numNodes), manifests.SetStatefulSetPodSpec(podSpec), ) depObj, err = generator.GenerateManifest(gctx, ssOpts...) } else { podOpts = append(podOpts, SetDefaultModelWeightsVolume) podSpec, err := generator.GenerateManifest(gctx, podOpts...) if err != nil { return nil, err } depObj, err = generator.GenerateManifest(gctx, manifests.GenerateDeploymentManifest(revisionNum, numNodes), manifests.SetDeploymentPodSpec(podSpec), ) } Function Overload* Functions like `GenerateStatefulSetManifest` and `GenerateDeploymentManifest` now return a modifier function. Ensure that this change does not introduce any issues in how these manifests are generated and applied. func GenerateStatefulSetManifest(revisionNum string, replicas int) func(generator.WorkspaceGeneratorContext, appsv1.StatefulSet) error { return func(ctx generator.WorkspaceGeneratorContext, ss appsv1.StatefulSet) error { selector := map[string]string{ kaitov1beta1.LabelWorkspaceName: ctx.Workspace.Name, } labelselector := &v1.LabelSelector{ MatchLabels: selector, } ss.ObjectMeta = v1.ObjectMeta{ Name: ctx.Workspace.Name, Namespace: ctx.Workspace.Namespace, Annotations: map[string]string{ kaitov1beta1.WorkspaceRevisionAnnotation: revisionNum, }, OwnerReferences: []v1.OwnerReference{ v1.NewControllerRef(ctx.Workspace, kaitov1beta1.GroupVersion.WithKind("Workspace")), }, } ss.Spec = appsv1.StatefulSetSpec{ Replicas: lo.ToPtr(int32(replicas)), PodManagementPolicy: appsv1.ParallelPodManagement, PersistentVolumeClaimRetentionPolicy: &appsv1.StatefulSetPersistentVolumeClaimRetentionPolicy{ WhenScaled: appsv1.RetainPersistentVolumeClaimRetentionPolicyType, WhenDeleted: appsv1.DeletePersistentVolumeClaimRetentionPolicyType, }, Selector: labelselector, Template: corev1.PodTemplateSpec{ ObjectMeta: v1.ObjectMeta{ Labels: selector, }, }, } ss.Spec.ServiceName = fmt.Sprintf("%s-headless", ctx.Workspace.Name) return nil } Assumption Validation* The code assumes there is only one container in the pod (`// FIXME: assume only one container in the pod`). Validate this assumption and handle cases where there might be multiple containers. spec.Volumes = append(spec.Volumes, adapterVolume) for i := range spec.Containers { // FIXME: assume only one container in the pod spec.Containers[i].VolumeMounts = append(spec.Containers[i].VolumeMounts, adapterVolumeMount)

kaito-pr-agent · 2025-07-18T13:12:12Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Set correct service name Ensure the `GenerateStatefulSetManifest` function sets the `ServiceName` correctly. pkg/workspace/manifests/manifests.go [109] func GenerateStatefulSetManifest(revisionNum string, replicas int) func(generator.WorkspaceGeneratorContext, appsv1.StatefulSet) error { + return func(ctx generator.WorkspaceGeneratorContext, ss appsv1.StatefulSet) error { + // ... (rest of the original logic) + ss.Spec.ServiceName = fmt.Sprintf("%s-headless", ctx.Workspace.Name) + return nil + } +} Suggestion importance[1-10]: 7 __ Why: The suggestion ensures the `GenerateStatefulSetManifest` function sets the `ServiceName` correctly. This is a minor improvement that enhances the correctness of the function.	Medium

Signed-off-by: zhuangqh <zhuangqhc@gmail.com>

chewong · 2025-07-21T21:29:44Z

pkg/utils/generator/generator.go

+
+type TypedManifestModifier[C GeneratorContext, T ManifestType] func(ctx *C, obj *T) error
+
+func GenerateManifest[C GeneratorContext, T ManifestType](ctx *C, modifiers ...TypedManifestModifier[C, T]) (*T, error) {


Great to see the use of generics 👍🏻

zhuangqh added 2 commits July 18, 2025 23:05

refactor: introduce manifest generator -- part 1

418b15b

Signed-off-by: jerryzhuang <zhuangqhc@gmail.com>

fix

c201508

Signed-off-by: jerryzhuang <zhuangqhc@gmail.com>

zhuangqh requested review from Fei-Guo and chewong as code owners July 18, 2025 13:10

github-project-automation bot added this to KAITO Roadmap Jul 18, 2025

zhuangqh temporarily deployed to unit-tests July 18, 2025 13:10 — with GitHub Actions Inactive

zhuangqh temporarily deployed to e2e-test July 18, 2025 13:10 — with GitHub Actions Inactive

kaito-pr-agent bot added the Review effort 4/5 label Jul 18, 2025

fix

4af3f47

Signed-off-by: zhuangqh <zhuangqhc@gmail.com>

zhuangqh had a problem deploying to e2e-test July 18, 2025 22:06 — with GitHub Actions Failure

zhuangqh temporarily deployed to unit-tests July 18, 2025 22:06 — with GitHub Actions Inactive

Fei-Guo approved these changes Jul 21, 2025

View reviewed changes

chewong approved these changes Jul 21, 2025

View reviewed changes

chewong mentioned this pull request Jul 21, 2025

feat: Add Gateway API Inference Extension support #1252

Merged

1 task

Merge branch 'main' into refactor-manifests

7351f42

zhuangqh temporarily deployed to unit-tests July 22, 2025 00:13 — with GitHub Actions Inactive

zhuangqh temporarily deployed to e2e-test July 22, 2025 00:13 — with GitHub Actions Inactive

zhuangqh merged commit aea0a42 into kaito-project:main Jul 22, 2025
12 checks passed

github-project-automation bot moved this to Done in KAITO Roadmap Jul 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor: introduce manifest generator -- part 1 #1284

refactor: introduce manifest generator -- part 1 #1284

Uh oh!

zhuangqh commented Jul 18, 2025

Uh oh!

kaito-pr-agent bot commented Jul 18, 2025

Uh oh!

kaito-pr-agent bot commented Jul 18, 2025

Uh oh!

kaito-pr-agent bot commented Jul 18, 2025

Uh oh!

chewong Jul 21, 2025

Uh oh!

Uh oh!

Uh oh!


		type TypedManifestModifier[C GeneratorContext, T ManifestType] func(ctx C, obj T) error

		func GenerateManifest[C GeneratorContext, T ManifestType](ctx C, modifiers ...TypedManifestModifier[C, T]) (T, error) {

refactor: introduce manifest generator -- part 1 #1284

refactor: introduce manifest generator -- part 1 #1284

Uh oh!

Conversation

zhuangqh commented Jul 18, 2025

Uh oh!

kaito-pr-agent bot commented Jul 18, 2025

Title

Description

Changes walkthrough 📝

Uh oh!

kaito-pr-agent bot commented Jul 18, 2025

PR Reviewer Guide 🔍

Uh oh!

kaito-pr-agent bot commented Jul 18, 2025

PR Code Suggestions ✨

Uh oh!

chewong Jul 21, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!