whybeyoung
Collaborator

@whybeyoung whybeyoung commented Jul 28, 2025

Although PR #8115 (which introduced a refined health check mechanism) has been reverted, the core issues it aimed to address—inadequate health monitoring in production and cloud-native environments—remain unresolved.

This PR merges the routing logic of the original /health and /health_generate endpoints. Both routes are retained for compatibility, but they now follow the same logic:

Step 1: When sglang starts, initialize the state as Starting. Once the warmup request succeeds, set the state to Healthy.

Step 2: Only when the state is Healthy does the health check go on to determine, taking the current load into account, whether the server can successfully generate a token. That result decides whether the server is reported as healthy.

Combined with Kubernetes probes, this PR ensures that traffic is routed to the engine only after it has fully started, and that the health_generate logic runs only while the engine is operating normally.
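
For reviewers who want the two steps in one place, here is a minimal sketch of the intended control flow. It is not the code in this PR: `State`, `health_check`, and the `try_generate` callback are hypothetical names chosen for illustration, and the load-aware part of Step 2 is left out because its details are not described here.

```python
from enum import Enum
from typing import Callable, Tuple


class State(Enum):
    # Hypothetical names mirroring the states described in Step 1.
    STARTING = "Starting"
    HEALTHY = "Healthy"


def health_check(state: State, try_generate: Callable[[], bool]) -> Tuple[int, str]:
    """Return an (HTTP status code, message) pair for a health probe."""
    # Step 1: until warmup has succeeded, report "not ready" without
    # sending any generation request to the engine.
    if state is not State.HEALTHY:
        return 503, f"server state is {state.value}"
    # Step 2: the server claims to be healthy, so verify that it can
    # actually produce a token (try_generate is an assumed callback).
    if try_generate():
        return 200, "ok"
    return 503, "generation probe failed"
```

With this shape, a probe that hits the endpoint during startup gets a 503 and the generation callback is never invoked, which is the behavior the Kubernetes probe setup relies on.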

CC @ByronHsu @merrymercy

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @whybeyoung, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the server's health check mechanism to provide a more robust and accurate indication of its operational status. It unifies the existing /health and /health_generate endpoints under a single, intelligent logic that considers the server's internal state, particularly during startup and warmup phases. This enhancement aims to improve compatibility with external systems like Kubernetes probes, ensuring traffic is only routed to a fully initialized and capable engine.

Highlights

  • Unified Health Check Endpoint: The /health and /health_generate endpoints are now merged, with both routes pointing to the same underlying logic. This simplifies health check configuration and ensures consistency.
  • Server Status Tracking: A new ServerStatus enum (Starting, Up, UnHealthy, Crashed) has been introduced in sglang/srt/utils.py to track the server's operational state. The TokenizerManager now maintains this status.
  • Enhanced Health Check Logic: The health check logic in http_server.py now performs a two-step verification: first, it checks the internal server_status, returning a 503 if the server is not Up. Only if the server is Up does it proceed with the token generation test, providing a more accurate reflection of the server's readiness and capability.
  • Warmup Status Integration: The server's ServerStatus is now updated during the warmup process in _execute_server_warmup. Upon successful warmup, the status is set to Up; if warmup fails, it's set to UnHealthy. This ensures that the health check accurately reflects the server's readiness post-startup, crucial for Kubernetes probes.
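
The status tracking described in the highlights above can be sketched roughly as follows. The member names come from the summary; everything else (`StatusHolder` as a stand-in for the TokenizerManager, `run_warmup`, and the `warmup_request` callback) is an assumption made for illustration and may not match the actual code in sglang/srt/utils.py or `_execute_server_warmup`.

```python
from enum import Enum
from typing import Callable


class ServerStatus(Enum):
    # Member names as listed in the summary; the string values are assumed.
    Starting = "Starting"
    Up = "Up"
    UnHealthy = "UnHealthy"
    Crashed = "Crashed"


class StatusHolder:
    """Hypothetical stand-in for the object that owns the server status."""

    def __init__(self) -> None:
        # The server begins in Starting until warmup has run.
        self.server_status = ServerStatus.Starting


def run_warmup(holder: StatusHolder, warmup_request: Callable[[], bool]) -> bool:
    """Run a warmup callback and record the outcome on the holder."""
    try:
        ok = bool(warmup_request())
    except Exception:
        # Treat any exception raised by the warmup request as a failure.
        ok = False
    holder.server_status = ServerStatus.Up if ok else ServerStatus.UnHealthy
    return ok
```

A health endpoint can then read `holder.server_status` and return 503 for anything other than `ServerStatus.Up` before attempting the token generation test.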

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request merges the /health and /health_generate endpoints, introducing a two-step health check. The changes include a new ServerStatus enum and its integration into the server lifecycle. The review suggests improvements in code clarity, debuggability, and consistency.

@whybeyoung
Collaborator Author

Prefill: [screenshot]
Decode: [screenshot]

Tested OK

whybeyoung and others added 5 commits August 2, 2025 09:26
@merrymercy merrymercy merged commit 6f9baf1 into sgl-project:main Aug 3, 2025
106 of 114 checks passed
htiennv pushed a commit to htiennv/sglang that referenced this pull request Aug 5, 2025
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
ShangmingCai pushed a commit that referenced this pull request Aug 5, 2025
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 17, 2025
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>
narutolhy pushed a commit to narutolhy/sglang that referenced this pull request Aug 18, 2025
Signed-off-by: ybyang <ybyang7@iflytek.com>
Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Kan Wu <wukanustc@gmail.com>