[Bug] no feedback about failure to create submitter pod due to invalid spec #2210

@mickvangelderen

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

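I created a RayJob with a submitterPodTemplate that did not set restartPolicy. The submitter pod was never created, and the RayJob itself gave no feedback about why.

I had to search the ray-operator logs to find the cause: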

{"level":"error","ts":"2024-06-28T18:09:14.679Z","logger":"controllers.RayJob","msg":"failed to create k8s Job","RayJob":{"name":"mick-gxccf","namespace":"launch"},"reconcileID":"3b03831c-d14d-497f-9c8c-4ac790e1ff35","error":"Job.batch \"mick-gxccf\" is invalid: spec.template.spec.restartPolicy: Required value: valid values: \"OnFailure\", \"Never\"","stacktrace":"github.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayJobReconciler).createNewK8sJob\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/rayjob_controller.go:440\ngithub.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayJobReconciler).createK8sJobIfNeed\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/rayjob_controller.go:350\ngithub.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayJobReconciler).Reconcile\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/rayjob_controller.go:168\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}

I thought the RayJob spec was validated when it is submitted to the API. Is that validation not the same as the validation applied to the Kubernetes Job the operator creates?

Reproduction script

"submitterPodTemplate": {
    "spec": {
        // "restartPolicy": "Never", <- OFFENDER
        // ... as usual
    }
}
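
Until the operator surfaces this error on the RayJob, a client-side check before submission can catch the missing field. Below is a minimal sketch in Python, assuming the RayJob manifest is available as a plain dict (for example, loaded from JSON or YAML). The function name `check_submitter_restart_policy` is hypothetical and not part of KubeRay; the allowed values are copied from the error message in the log above.

```python
from typing import Any, List, Mapping

# Values accepted by the Job API according to the error message in this issue.
ALLOWED_RESTART_POLICIES = ("OnFailure", "Never")


def check_submitter_restart_policy(rayjob: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the check passed.

    Hypothetical helper: it only inspects
    spec.submitterPodTemplate.spec.restartPolicy and nothing else.
    """
    spec = rayjob.get("spec") or {}
    template = spec.get("submitterPodTemplate")
    if template is None:
        # No custom submitter template was given, so there is nothing to check.
        return []

    pod_spec = template.get("spec") or {}
    policy = pod_spec.get("restartPolicy")
    path = "spec.submitterPodTemplate.spec.restartPolicy"

    if policy is None:
        return [f"{path} is missing; expected one of {ALLOWED_RESTART_POLICIES}"]
    if policy not in ALLOWED_RESTART_POLICIES:
        return [f"{path} is {policy!r}; expected one of {ALLOWED_RESTART_POLICIES}"]
    return []


if __name__ == "__main__":
    # Mirrors the reproduction above: a submitter template without restartPolicy.
    manifest = {"spec": {"submitterPodTemplate": {"spec": {}}}}
    for problem in check_submitter_restart_policy(manifest):
        print(problem)
```

This only works around the missing feedback on the client side. It does not change what the operator reports on the RayJob.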

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
