
Conversation

@CatherineSue CatherineSue commented Jun 12, 2025

Motivation

Completes #7101

Modifications

  • Add a new api_server.py for the OpenAI API server refactoring. It is kept separate from the current http_server.py so the existing SGLang serving path is unaffected.

Checklist

@CatherineSue (Collaborator, Author) commented:

[Screenshot: 2025-06-11 at 5:16:57 PM]

@gemini-code-assist bot (Contributor) left a comment:

Summary of Changes

Hello @CatherineSue, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for adding an OpenAI-compatible API server to SGLang. It introduces a new, separate Python script (api_server.py) that uses FastAPI to create a basic HTTP server structure. The intent is to provide a dedicated entrypoint for OpenAI API endpoints, allowing for a clear separation of concerns from the existing SGLang HTTP server and addressing issue #7101. While the core API endpoints (like completions) are not yet implemented, this PR establishes the necessary server framework.

Highlights

  • New OpenAI API Server Entrypoint: I've added a new file, python/sglang/srt/entrypoints/openai/api_server.py, to serve as a dedicated entrypoint for an OpenAI-compatible API server. This keeps the OpenAI API logic separate from the existing http_server.py.
  • Basic FastAPI Application: The new file sets up a basic FastAPI application with CORS middleware enabled. It includes a /health endpoint for readiness/liveness checks and a placeholder /v1/models endpoint, compatible with the OpenAI API standard.
  • Server Initialization Structure: The script includes logic to parse standard ServerArgs from the command line and sets up basic logging. It also defines a lifespan context manager using asynccontextmanager to handle future engine initialization and shutdown logic.

@gemini-code-assist bot (Contributor) left a comment:

Code Review

This PR introduces a new FastAPI-based entrypoint for an OpenAI-compatible API server. The initial structure is well-defined, including argument parsing, logging, and basic health/model endpoints. Key areas for future work include implementing the /v1/models endpoint and considering security aspects like CORS configuration for production. The code generally follows Python best practices.

Comment on lines +74 to +80:

```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

Severity: high

The CORS middleware is currently configured with allow_origins=["*"], which allows requests from any origin. While this is often acceptable for development, it's a security risk for production environments. It's recommended to restrict allow_origins to a specific list of trusted domains before deploying to production to mitigate risks like Cross-Site Request Forgery (CSRF).



```python
# Store global states
class AppState:
```

Severity: medium

The AppState class is used to manage global state for the application. Adding a docstring to explain its purpose and the attributes it holds (e.g., engine, server_args) would improve code clarity and maintainability, especially as the application grows and more state might be added.

Suggested change:

```python
class AppState:
    """Stores global state for the FastAPI application, including the SGLang engine and server arguments."""
```

Comment on lines +92 to +97:

```python
async def show_models():
    """Show available models. Currently, it returns the served model name.

    This endpoint is compatible with the OpenAI API standard.
    """
    pass
```

Severity: medium

The /v1/models endpoint is a key part of OpenAI API compatibility but is currently a placeholder (pass).

  1. It should be implemented to return information about the served_model_name (available via app.state.server_args.served_model_name).
  2. The function show_models is missing a return type hint. Adding one (e.g., -> Dict or a more specific Pydantic model like ModelList) improves type safety and code clarity.

Consider raising NotImplementedError or returning an HTTP 501 status code as a clearer placeholder until it's fully implemented.

Suggested change:

```python
async def show_models() -> Dict:  # Or a Pydantic model, e.g. ModelList
    """Show available models. Currently, it returns the served model name.

    This endpoint is compatible with the OpenAI API standard.
    """
    # TODO: Implement returning actual model details based on
    # app.state.server_args.served_model_name. Example structure:
    # return {
    #     "object": "list",
    #     "data": [
    #         {
    #             "id": app.state.server_args.served_model_name,
    #             "object": "model",
    #             "owned_by": "sglang",
    #             "created": int(time.time()),  # Or a fixed timestamp
    #         }
    #     ],
    # }
    raise NotImplementedError("The /v1/models endpoint is not yet implemented.")
```
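To make the suggestion above concrete, the response body could be produced by a small pure helper that the FastAPI handler simply returns. The helper name `build_model_list` and the `owned_by` value are illustrative assumptions; only the payload shape (the OpenAI "list models" response) is fixed by the API standard:

```python
import time
from typing import Any, Dict


def build_model_list(served_model_name: str) -> Dict[str, Any]:
    """Build an OpenAI-compatible /v1/models response body."""
    return {
        "object": "list",
        "data": [
            {
                "id": served_model_name,
                "object": "model",
                "created": int(time.time()),
                "owned_by": "sglang",
            }
        ],
    }
```

The endpoint itself would then reduce to `return build_model_list(app.state.server_args.served_model_name)`, which keeps the payload construction easy to unit-test without spinning up the server.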

@CatherineSue (Collaborator, Author) commented:

Closing this, as #7179 is a more up-to-date PR.

@CatherineSue CatherineSue deleted the chang/oai-refactor-1 branch June 17, 2025 19:56