[Feat][Kubectl-Plugin] Implement kubectl session for RayJob and RayService #2379

Conversation

MortalHappiness
Member

@MortalHappiness MortalHappiness commented Sep 13, 2024

Why are these changes needed?

#2298 only implemented kubectl ray session for RayCluster. This PR implements kubectl session for RayJob and RayService.

[screenshot]
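
With this change, the session command also accepts the new resource types. Invocations would presumably look like the following (the resource names are examples):

    kubectl ray session rayjob/my-rayjob
    kubectl ray session rayservice/my-rayservice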

Related issue number

N/A

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@MortalHappiness
Member Author

@kevin85421 @andrewsykim PTAL

@MortalHappiness MortalHappiness force-pushed the feature/kubectl-session-rayjob-service branch 2 times, most recently from e069d65 to c4b5964 on September 13, 2024 16:01
@andrewsykim
Member

cc @chiayi


sessionExample = templates.Examples(`
# Forward local ports to the RayCluster resource
kubectl ray session raycluster/my-raycluster
Member

Will this default to raycluster if the resource type is not specified?

Member Author

@MortalHappiness MortalHappiness Sep 14, 2024

No. Currently the resource type is required. Do you think it is better to default to RayCluster if the resource type is not specified?

Member

> Do you think it is better to default to RayCluster if the resource type is not specified?

Personally, yes, I think so. Similar to how kubectl logs defaults the resource type to Pods, I think we should default the resource type to RayCluster.

Member Author

Makes sense to me. Updated in e83ee5b
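
For reference, a minimal sketch of the defaulting logic (illustrative names only, not the actual plugin code):

    package main

    import (
    	"fmt"
    	"strings"
    )

    // parseTarget splits a "type/name" argument and falls back to
    // "raycluster" when only a bare name is given, mirroring how
    // kubectl logs defaults the resource type to Pods.
    func parseTarget(arg string) (resourceType, name string) {
    	if typ, rest, found := strings.Cut(arg, "/"); found {
    		return strings.ToLower(typ), rest
    	}
    	return "raycluster", arg
    }

    func main() {
    	fmt.Println(parseTarget("rayjob/my-rayjob")) // rayjob my-rayjob
    	fmt.Println(parseTarget("my-raycluster"))    // raycluster my-raycluster
    }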

Contributor

@chiayi chiayi left a comment

LGTM! Thanks for creating the util package as well!

Member

@andrewsykim andrewsykim left a comment

Minor comment, otherwise LGTM

Member

@kevin85421 kevin85421 left a comment

I haven't reviewed the PR, but based on the screenshot, I think we don't need to forward the Ray client port for RayJob and RayService. In addition, could you print the information to specify which K8s service is being forwarded?

@kevin85421 kevin85421 self-assigned this Sep 16, 2024
@MortalHappiness MortalHappiness force-pushed the feature/kubectl-session-rayjob-service branch from e83ee5b to 0fa67b9 on September 17, 2024 01:28
@MortalHappiness
Member Author

@kevin85421 Port-forwarding of the Ray client port for RayJob and RayService has been removed. The screenshot is also updated.

@kevin85421
Member

TODO: Use rayservice-sample-serve-svc and rayservice-sample-head-svc instead.

@MortalHappiness
Member Author

MortalHappiness commented Sep 17, 2024

@kevin85421 I found that they are the same, because the kubectl source code first finds the corresponding Pod selected by the Service.

https://github.com/kubernetes/kubectl/blob/262825a8a665c7cae467dfaa42b63be5a5b8e5a2/pkg/cmd/portforward/portforward.go#L345

I tested this by setting breakpoints in an IDE (see the values of obj.ObjectMeta.Name and forwardablePod.ObjectMeta.Name):

[screenshots of the debugger showing obj.ObjectMeta.Name and forwardablePod.ObjectMeta.Name]

Notice that different obj.ObjectMeta.Name values result in the same forwardablePod.ObjectMeta.Name.

Therefore I think we can keep the current implementation, because executing multiple kubectl port-forward commands in parallel is more complicated.
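
For context, a minimal client-go sketch of the resolution kubectl performs (an illustration of the mechanism, not the kubectl code itself; the namespace and service name are examples):

    package main

    import (
    	"context"
    	"fmt"
    	"log"

    	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    	"k8s.io/apimachinery/pkg/labels"
    	"k8s.io/client-go/kubernetes"
    	"k8s.io/client-go/tools/clientcmd"
    )

    func main() {
    	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    	if err != nil {
    		log.Fatal(err)
    	}
    	client, err := kubernetes.NewForConfig(config)
    	if err != nil {
    		log.Fatal(err)
    	}

    	// Look up the Service, then use its selector to find the Pods behind it.
    	svc, err := client.CoreV1().Services("default").Get(context.TODO(),
    		"rayservice-sample-head-svc", metav1.GetOptions{})
    	if err != nil {
    		log.Fatal(err)
    	}
    	selector := labels.SelectorFromSet(labels.Set(svc.Spec.Selector))
    	pods, err := client.CoreV1().Pods("default").List(context.TODO(),
    		metav1.ListOptions{LabelSelector: selector.String()})
    	if err != nil || len(pods.Items) == 0 {
    		log.Fatalf("no forwardable pod found: %v", err)
    	}

    	// kubectl sorts the candidates and forwards to a single Pod;
    	// here we simply take the first one.
    	fmt.Println("forwardable pod:", pods.Items[0].Name)
    }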

@kevin85421
Member

> I found that they are the same, because the kubectl source code first finds the corresponding Pod selected by the Service.

What is the behavior of forwarding rayservice-sample-head-svc and then performing a zero-downtime upgrade? If the session fails after the old RayCluster is torn down, we can use rayservice-sample-raycluster-xxxxx-head-svc instead.

Commits (titles truncated by the page):

  • …rvice
    Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
    Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
  • …ward for RayJob and RayService
    Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>
  • …ervice svc
    Signed-off-by: Chi-Sheng Liu <chishengliu@chishengliu.com>

@MortalHappiness MortalHappiness force-pushed the feature/kubectl-session-rayjob-service branch from 0fa67b9 to 12e7abe on September 17, 2024 12:34
@MortalHappiness
Member Author

MortalHappiness commented Sep 17, 2024

@kevin85421 After experimenting, I found that the kubectl port-forward service/rayservice-sample-serve-svc 8000:8000 command fails after a zero-downtime upgrade is performed, so there is actually no difference in which service we choose for port-forwarding. I've added a comment to the getRayHeadSvcNameByRayService function.

Explanation of the following screenshot:

  1. Run kubectl port-forward service/rayservice-sample-serve-svc 8000:8000
  2. A zero-downtime upgrade for RayService rayservice-sample is performed.
  3. The connection to the Pod is lost when a request is sent to the original Pod.
  4. We need to run kubectl port-forward service/rayservice-sample-serve-svc 8000:8000 again, and then everything works because a different Pod is connected.

[screenshot]

@kevin85421
Member

Note: If we port-forward a K8s service, it will pick a Pod under the hood. If the Pod is killed, the port-forward command will not pick other K8s service endpoints automatically. Instead, we need to run the port-forward command again.

@kevin85421 kevin85421 merged commit 5d3bceb into ray-project:master Sep 17, 2024
27 checks passed
@andrewsykim
Member

Could we run the port-forward in a goroutine and retry the connection if it fails?

@kevin85421
Member

> Could we run the port-forward in a goroutine and retry the connection if it fails?

Makes sense.
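
A rough sketch of that idea (shelling out to kubectl for brevity; the real plugin would use its own port-forward machinery, and the service name is an example):

    package main

    import (
    	"log"
    	"os/exec"
    	"time"
    )

    func main() {
    	go func() {
    		for {
    			// Hypothetical target; the plugin would substitute the
    			// resolved head service and ports.
    			cmd := exec.Command("kubectl", "port-forward",
    				"service/rayservice-sample-head-svc", "8265:8265")
    			err := cmd.Run() // blocks until the tunnel drops or the command fails
    			log.Printf("port-forward exited (%v); retrying in 1s", err)
    			time.Sleep(time.Second)
    		}
    	}()
    	select {} // keep the process alive; Ctrl-C to quit
    }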

@kevin85421
Member

Opened an issue to track the progress: #2388
