
Conversation

@CatherineSue CatherineSue commented Jun 12, 2025

Motivation

Completes #7101

Modifications

  • Add a new api_server.py for the OpenAI API server refactoring. It is kept separate from the current http_server.py so the existing SGLang serving path is unaffected.

Checklist

@CatherineSue (Collaborator, Author) commented:

[Screenshot: 2025-06-11 at 5:16:57 PM]

@gemini-code-assist bot (Contributor) left a comment:

Summary of Changes

Hello @CatherineSue, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for adding an OpenAI-compatible API server to SGLang. It introduces a new, separate Python script (api_server.py) that uses FastAPI to create a basic HTTP server structure. The intent is to provide a dedicated entrypoint for OpenAI API endpoints, allowing for a clear separation of concerns from the existing SGLang HTTP server and addressing issue #7101. While the core API endpoints (like completions) are not yet implemented, this PR establishes the necessary server framework.

Highlights

  • New OpenAI API Server Entrypoint: I've added a new file, python/sglang/srt/entrypoints/openai/api_server.py, to serve as a dedicated entrypoint for an OpenAI-compatible API server. This keeps the OpenAI API logic separate from the existing http_server.py.
  • Basic FastAPI Application: The new file sets up a basic FastAPI application with CORS middleware enabled. It includes a /health endpoint for readiness/liveness checks and a placeholder /v1/models endpoint, compatible with the OpenAI API standard.
  • Server Initialization Structure: The script includes logic to parse standard ServerArgs from the command line and sets up basic logging. It also defines a lifespan context manager using asynccontextmanager to handle future engine initialization and shutdown logic.

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This PR introduces a new FastAPI-based entrypoint for an OpenAI-compatible API server. The initial structure is well-defined, including argument parsing, logging, and basic health/model endpoints. Key areas for future work include implementing the /v1/models endpoint and considering security aspects like CORS configuration for production. The code generally follows Python best practices.

Comment on lines +74 to +80:

```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

Severity: high

The CORS middleware is currently configured with allow_origins=["*"], which allows requests from any origin. While this is often acceptable for development, it's a security risk for production environments. It's recommended to restrict allow_origins to a specific list of trusted domains before deploying to production to mitigate risks like Cross-Site Request Forgery (CSRF).



```python
# Store global states
class AppState:
```

Severity: medium

The AppState class is used to manage global state for the application. Adding a docstring to explain its purpose and the attributes it holds (e.g., engine, server_args) would improve code clarity and maintainability, especially as the application grows and more state might be added.

Suggested change:

```python
class AppState:
    """Stores global state for the FastAPI application, including the SGLang engine and server arguments."""
```

Comment on lines +92 to +97:

```python
async def show_models():
    """Show available models. Currently, it returns the served model name.

    This endpoint is compatible with the OpenAI API standard.
    """
    pass
```

Severity: medium

The /v1/models endpoint is a key part of OpenAI API compatibility but is currently a placeholder (pass).

  1. It should be implemented to return information about the served_model_name (available via app.state.server_args.served_model_name).
  2. The function show_models is missing a return type hint. Adding one (e.g., -> Dict or a more specific Pydantic model like ModelList) improves type safety and code clarity.

Consider raising NotImplementedError or returning an HTTP 501 status code as a clearer placeholder until it's fully implemented.

Suggested change:

```python
async def show_models() -> Dict:  # Or a Pydantic model, e.g. ModelList
    """Show available models. Currently, it returns the served model name.

    This endpoint is compatible with the OpenAI API standard.
    """
    # TODO: Implement returning actual model details based on
    # app.state.server_args.served_model_name. Example structure:
    # return {
    #     "object": "list",
    #     "data": [
    #         {
    #             "id": app.state.server_args.served_model_name,
    #             "object": "model",
    #             "owned_by": "sglang",
    #             "created": int(time.time()),  # Or a fixed timestamp
    #         }
    #     ],
    # }
    raise NotImplementedError("The /v1/models endpoint is not yet implemented.")
```
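To make the suggestion above concrete, the response body could be produced by a small pure helper that the FastAPI handler simply returns. The helper name `build_model_list` and the `owned_by` value are illustrative assumptions; only the payload shape (the OpenAI "list models" response) is fixed by the API standard:

```python
import time
from typing import Any, Dict


def build_model_list(served_model_name: str) -> Dict[str, Any]:
    """Build an OpenAI-compatible /v1/models response body."""
    return {
        "object": "list",
        "data": [
            {
                "id": served_model_name,
                "object": "model",
                "created": int(time.time()),
                "owned_by": "sglang",
            }
        ],
    }
```

The endpoint itself would then reduce to `return build_model_list(app.state.server_args.served_model_name)`, which keeps the payload construction easy to unit-test without spinning up the server.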

@CatherineSue (Collaborator, Author) commented:

Closing this, as #7179 is a more up-to-date PR.

@CatherineSue CatherineSue deleted the chang/oai-refactor-1 branch June 17, 2025 19:56