[Epic][Feature] KubeRay v1.4.0 - Operator SLI Tracking #3171

@win5923

Description

https://docs.google.com/document/d/1zNiE7lVZYjhrxlTbh1UXOVpR6hh1GIeSfCfE9Lt5v6Y/edit?tab=t.0

Context

In production, SRE teams typically define Service Level Indicators (SLIs) to measure whether services meet their expected performance and reliability targets. However, there are currently no dedicated SLIs for Ray Cluster, Ray Service, and Ray Job, which makes it difficult to monitor their health and performance.
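
An SLI is commonly expressed as the ratio of good events to valid events. As a minimal sketch, the Python code below computes a hypothetical provisioning-latency SLI for Ray Cluster: the fraction of clusters that became ready within a threshold after creation. The `ClusterEvent` record, its `created_at`/`ready_at` fields, and the five-minute threshold are illustrative assumptions, not KubeRay fields or the metrics defined in the linked design document.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class ClusterEvent:
    """Hypothetical record of one Ray Cluster's lifecycle timestamps."""
    name: str
    created_at: datetime
    ready_at: Optional[datetime]  # None if the cluster never became ready


def provisioning_sli(events: Iterable[ClusterEvent], threshold: timedelta) -> float:
    """Fraction of clusters that became ready within `threshold` of creation.

    Clusters that never became ready count as bad events.
    Returns 1.0 when there are no events (no evidence of a violation).
    """
    total = 0
    good = 0
    for e in events:
        total += 1
        if e.ready_at is not None and e.ready_at - e.created_at <= threshold:
            good += 1
    return good / total if total else 1.0


if __name__ == "__main__":
    t0 = datetime(2025, 1, 1, 12, 0, 0)
    sample = [
        ClusterEvent("a", t0, t0 + timedelta(seconds=90)),   # ready in time
        ClusterEvent("b", t0, t0 + timedelta(seconds=400)),  # too slow
        ClusterEvent("c", t0, None),                         # never ready
        ClusterEvent("d", t0, t0 + timedelta(seconds=30)),   # ready in time
    ]
    print(provisioning_sli(sample, timedelta(minutes=5)))  # 0.5
```

Treating clusters that never become ready as bad events keeps a stuck reconcile from silently improving the SLI; how to count such clusters is a policy choice the real SLI definition would need to make explicitly.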

Solution

We propose new metrics to enhance KubeRay's observability and provide better insight into the status and performance of Ray Cluster, Ray Service, and Ray Job.

Metadata

Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests