[Feature] Simple Improve Health Check Mechanism for Production-Grade Stability #8115
Conversation
Signed-off-by: ybyang <ybyang7@iflytek.com>
Summary of Changes
Hello @whybeyoung, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly upgrades the health check mechanism from a static 200-status response to a dynamic, multi-state system. It introduces a ServerStatus enum to represent granular states such as Up, Starting, UnHealthy, and Crashed, and integrates status updates throughout the server's lifecycle. This enhancement provides critical visibility into the server's operational state, enabling more reliable deployment and management in cloud-native environments.
Highlights
- Enhanced Health Check Logic: Replaced the simplistic 200-status health check with a detailed ServerStatus enum (Up, Starting, UnHealthy, Crashed) to provide granular server state, crucial for cloud-native environments.
- Dynamic Status Reporting: Introduced a new /health POST endpoint and a report_health utility function, enabling various components (engine, scheduler, HTTP server) to update the server's health status dynamically throughout its lifecycle (a minimal sketch of these pieces follows this list).
- Improved Lifecycle Management: Integrated status updates into critical server phases like startup, warmup, and error handling (e.g., child process crashes, scheduler exceptions), allowing for more accurate detection of service anomalies and better orchestration.
- Refactored Health State Management: Replaced the boolean health_check_failed flag with the comprehensive ServerStatus enum, centralizing and standardizing health state management across the system for clarity and robustness.
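As a rough illustration of the reporting pieces named above (the payload shape, handler, and error handling here are assumptions for illustration, not the actual SGLang code):

```python
from enum import Enum

import requests
from fastapi import FastAPI, Request, Response

app = FastAPI()


class ServerStatus(str, Enum):
    Starting = "Starting"
    Up = "Up"
    UnHealthy = "UnHealthy"
    Crashed = "Crashed"


server_status = ServerStatus.Starting


@app.post("/health")
async def update_health(request: Request):
    # Components (engine, scheduler, HTTP server) push status changes here.
    global server_status
    payload = await request.json()
    try:
        server_status = ServerStatus(payload["status"])
    except (KeyError, ValueError):
        return Response(status_code=400, content="unknown status")
    return Response(status_code=200, content=server_status.value)


def report_health(base_url: str, status: ServerStatus) -> None:
    # Best-effort push of a status update to the HTTP server; failures are
    # swallowed so that health reporting never crashes the calling component.
    try:
        requests.post(f"{base_url}/health", json={"status": status.value}, timeout=2)
    except requests.RequestException:
        pass
```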
Code Review
This pull request introduces a more robust health check mechanism, which is a great improvement for production stability. The state transitions and overall logic are well-thought-out.
I've identified a few critical issues where the code would fail at runtime due to incorrect attribute access on a class instead of an instance. I've also suggested improvements for robustness in the report_health utility and error handling in the new /health POST endpoint.
Once these issues are addressed, this will be a solid feature.
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: ybyang <ybyang7@iflytek.com>
…Stability (#8115) Signed-off-by: ybyang <ybyang7@iflytek.com>
Background
The current health interface in sglang is a "fake" endpoint that simply returns a 200 status code. This poses significant stability issues in production environments, especially when integrated with cloud-native systems like Kubernetes. The lack of meaningful health status makes it impossible to accurately detect service anomalies or coordinate lifecycle management (e.g., restarts) in orchestrated environments.
Proposed Solution
We've designed a robust server status mechanism to address this gap: a service is considered healthy only when its status is Up. All other states (Starting, UnHealthy, Crashed) indicate an unhealthy condition.
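As a minimal sketch of this rule (assuming a FastAPI server; the enum values are the ones named in this PR, while the module layout and handler shown here are illustrative assumptions, not the actual implementation):

```python
from enum import Enum

from fastapi import FastAPI, Response

app = FastAPI()


class ServerStatus(str, Enum):
    Starting = "Starting"
    Up = "Up"
    UnHealthy = "UnHealthy"
    Crashed = "Crashed"


# Set to Starting during engine bring-up; updated over the lifecycle.
server_status = ServerStatus.Starting


@app.get("/health")
async def health():
    # Only Up maps to 200; every other state is reported as unhealthy.
    code = 200 if server_status is ServerStatus.Up else 503
    return Response(status_code=code, content=server_status.value)
```

With this mapping, a Kubernetes liveness or readiness probe pointed at GET /health sees a non-200 code for any state other than Up and can act accordingly.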
State Transition Logic
- Initial state: Starting (engine initialization phase)
- Transitions to Up after:
  - the HTTP server completes startup
  - warm-up requests execute successfully
- Transitions to Crashed if:
  - the scheduler or other critical subprocesses exit abnormally
  - fatal errors occur during request processing

A sketch of where these transitions might hook into the server lifecycle follows.
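Reusing the ServerStatus enum and the report_health helper from the sketches above, the wiring could look roughly like this (launch_server, run_scheduler, and run_warmup_requests are hypothetical placeholders, not SGLang functions):

```python
import multiprocessing
import time


def run_scheduler() -> None:
    # Placeholder for the real scheduler loop.
    time.sleep(1.0)


def run_warmup_requests() -> None:
    # Placeholder for the real warm-up requests.
    pass


def launch_server(base_url: str) -> None:
    # Engine initialization phase: report Starting first.
    report_health(base_url, ServerStatus.Starting)

    scheduler = multiprocessing.Process(target=run_scheduler)
    scheduler.start()

    try:
        # Transition to Up only after warm-up requests succeed.
        run_warmup_requests()
        report_health(base_url, ServerStatus.Up)
    except Exception:
        report_health(base_url, ServerStatus.UnHealthy)
        raise

    scheduler.join()
    if scheduler.exitcode != 0:
        # A critical subprocess exited abnormally.
        report_health(base_url, ServerStatus.Crashed)
```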
This mechanism becomes even more critical in distributed deployments:
- Enables detection of hang/failure states across multiple Prefill (P) and Decode (D) nodes
- Allows Kubernetes/LWS to automatically restart unhealthy replicas by monitoring the health endpoint
- Prevents traffic from being routed to nodes in Starting/UnHealthy/Crashed states
Rationale for Not Using health_generate
The existing health_generate endpoint was deemed unsuitable because:
- Its response time is heavily influenced by batch queues in high-concurrency scenarios
- It introduces unnecessary computation overhead for health checks
- It fails to provide granular status information (e.g., distinguishing between "starting" and "crashed")
Limitations and Future Work
A known limitation is that engine-level hangs may not be detected by this mechanism. We plan to supplement it with:
- Periodic internal liveness probes
- Heartbeat monitoring between critical components
- Automatic state transition to UnHealthy on probe timeouts

A rough sketch of such a probe loop appears below.
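As an illustration only (the /internal/ping endpoint, intervals, and threading model are assumptions about the future work, not anything implemented in this PR), a background probe that demotes the status on timeout might look like:

```python
import threading
import time

import requests


def start_liveness_probe(
    base_url: str, interval_s: float = 10.0, timeout_s: float = 5.0
) -> threading.Thread:
    """Periodically probe a hypothetical internal endpoint; on failure or
    timeout, demote the server to UnHealthy via report_health."""

    def probe_loop() -> None:
        while True:
            try:
                requests.get(f"{base_url}/internal/ping", timeout=timeout_s)
            except requests.RequestException:
                report_health(base_url, ServerStatus.UnHealthy)
            time.sleep(interval_s)

    thread = threading.Thread(target=probe_loop, daemon=True)
    thread.start()
    return thread
```

This reuses the report_health helper and ServerStatus enum from the earlier sketches.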
This change ensures sglang works reliably in cloud-native production environments while maintaining simplicity in the health check implementation.
CC @ShangmingCai @ByronHsu @hnyls2002 @zhyncs