[Improvements] Merge health check route #8444
Conversation
Summary of Changes
Hello @whybeyoung, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors the server's health check mechanism to provide a more robust and accurate indication of its operational status. It unifies the existing /health and /health_generate endpoints under a single, intelligent logic that considers the server's internal state, particularly during startup and warmup phases. This enhancement aims to improve compatibility with external systems like Kubernetes probes, ensuring traffic is only routed to a fully initialized and capable engine.
Highlights
- Unified Health Check Endpoint: The /health and /health_generate endpoints are now merged, with both routes pointing to the same underlying logic. This simplifies health check configuration and ensures consistency.
- Server Status Tracking: A new ServerStatus enum (Starting, Up, UnHealthy, Crashed) has been introduced in sglang/srt/utils.py to track the server's operational state. The TokenizerManager now maintains this status.
- Enhanced Health Check Logic: The health check logic in http_server.py now performs a two-step verification: first, it checks the internal server_status, returning a 503 if the server is not Up. Only if the server is Up does it proceed with the token generation test, providing a more accurate reflection of the server's readiness and capability.
- Warmup Status Integration: The server's ServerStatus is now updated during the warmup process in _execute_server_warmup. Upon successful warmup, the status is set to Up; if warmup fails, it is set to UnHealthy. This ensures that the health check accurately reflects the server's readiness post-startup, which is crucial for Kubernetes probes. (A sketch of this flow follows this list.)
Code Review
This pull request merges the /health and /health_generate endpoints, introducing a two-step health check. The changes include a new ServerStatus enum and its integration into the server lifecycle. The review suggests improvements in code clarity, debuggability, and consistency.
Signed-off-by: ybyang <ybyang7@iflytek.com>
Signed-off-by: ybyang <ybyang7@iflytek.com> Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com> Co-authored-by: Kan Wu <wukanustc@gmail.com>
Although PR #8115 (which introduced a refined health check mechanism) has been reverted, the core issues it aimed to address—inadequate health monitoring in production and cloud-native environments—remain unresolved.
This PR merges the routing logic of the original /health and /health_generate endpoints. To maintain compatibility, both routes are retained but follow the same logic:
Step 1: When sglang starts, the state is initialized to Starting. Once the warmup request succeeds, the state is set to Up.
Step 2: Only when the state is Up does the system attempt to generate a token (taking the current load into account) to judge whether the server is healthy.
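A minimal sketch of this two-step route, assuming FastAPI-style handlers as in http_server.py; generate_health_check_token and the request.app.state.tokenizer_manager attachment point are hypothetical stand-ins for the actual generation test and state wiring:

```python
from fastapi import FastAPI, Request, Response

from sglang.srt.utils import ServerStatus  # introduced by this PR

app = FastAPI()


@app.get("/health")
@app.get("/health_generate")  # both routes share one handler
async def health(request: Request) -> Response:
    """Merged health check: gate on server_status first, then run the generation test."""
    tokenizer_manager = request.app.state.tokenizer_manager  # assumed attachment point

    # Step 1: reject probes until warmup has marked the server Up.
    if tokenizer_manager.server_status != ServerStatus.Up:
        return Response(
            content=f"Server is {tokenizer_manager.server_status.value}",
            status_code=503,
        )

    # Step 2: only an Up server attempts the token-generation test.
    ok = await generate_health_check_token(tokenizer_manager)  # hypothetical helper
    return Response(status_code=200 if ok else 503)
```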
This PR, combined with Kubernetes probes, ensures that traffic is routed to the engine only after it has fully started, and that the health_generate logic is executed only when the engine is running normally.
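For illustration, a minimal sketch of how an external readiness check (what a Kubernetes HTTP probe effectively does) would interpret the merged endpoint; the host, port, and timeout are placeholders, not part of this PR:

```python
import urllib.error
import urllib.request


def is_ready(base_url: str = "http://127.0.0.1:30000", timeout: float = 5.0) -> bool:
    """Return True only if /health answers 200, i.e. the engine is Up and generated a token."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.HTTPError, urllib.error.URLError):
        # A 503 (still Starting or UnHealthy) or a connection error means "do not route traffic yet".
        return False
```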
CC @ByronHsu @merrymercy