oai: Add new openai api server entypoint #7105
Conversation
Summary of Changes
Hello @CatherineSue, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request lays the groundwork for adding an OpenAI-compatible API server to SGLang. It introduces a new, separate Python script (`api_server.py`) that uses FastAPI to create a basic HTTP server structure. The intent is to provide a dedicated entrypoint for OpenAI API endpoints, allowing a clear separation of concerns from the existing SGLang HTTP server and addressing issue #7101. While the core API endpoints (like completions) are not yet implemented, this PR establishes the necessary server framework.
Highlights
- New OpenAI API Server Entrypoint: I've added a new file, `python/sglang/srt/entrypoints/openai/api_server.py`, to serve as a dedicated entrypoint for an OpenAI-compatible API server. This keeps the OpenAI API logic separate from the existing `http_server.py`.
- Basic FastAPI Application: The new file sets up a basic FastAPI application with CORS middleware enabled. It includes a `/health` endpoint for readiness/liveness checks and a placeholder `/v1/models` endpoint, compatible with the OpenAI API standard.
- Server Initialization Structure: The script includes logic to parse standard `ServerArgs` from the command line and sets up basic logging. It also defines a `lifespan` context manager using `asynccontextmanager` to handle future engine initialization and shutdown logic.
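The `lifespan` pattern mentioned above can be sketched as follows. This is a minimal, self-contained illustration of the startup/shutdown flow, not the PR's actual code: a `SimpleNamespace` stands in for FastAPI's `app.state`, and the engine value is a hypothetical placeholder (the PR leaves engine initialization as future work). In a real FastAPI app you would pass the context manager via `FastAPI(lifespan=lifespan)`.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace

# Stand-in for FastAPI's app.state so the sketch runs without FastAPI installed.
app = SimpleNamespace(state=SimpleNamespace(engine=None))


@asynccontextmanager
async def lifespan(app):
    # Startup: this is where the SGLang engine would be initialized
    # (hypothetical placeholder; the PR leaves this unimplemented).
    app.state.engine = "engine-placeholder"
    yield
    # Shutdown: release engine resources here.
    app.state.engine = None


async def main():
    async with lifespan(app):
        assert app.state.engine == "engine-placeholder"
    # After shutdown, the engine has been released.
    assert app.state.engine is None


asyncio.run(main())
```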
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩
Code Review
This PR introduces a new FastAPI-based entrypoint for an OpenAI-compatible API server. The initial structure is well-defined, including argument parsing, logging, and basic health/model endpoints. Key areas for future work include implementing the `/v1/models` endpoint and considering security aspects like CORS configuration for production. The code generally follows Python best practices.
```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
The CORS middleware is currently configured with `allow_origins=["*"]`, which allows requests from any origin. While this is often acceptable for development, it's a security risk for production environments. It's recommended to restrict `allow_origins` to a specific list of trusted domains before deploying to production to mitigate risks like Cross-Site Request Forgery (CSRF).
```python
# Store global states
class AppState:
```
The `AppState` class is used to manage global state for the application. Adding a docstring to explain its purpose and the attributes it holds (e.g., `engine`, `server_args`) would improve code clarity and maintainability, especially as the application grows and more state might be added.
Suggested change:

```python
class AppState:
    """Stores global state for the FastAPI application, including the SGLang engine and server arguments."""
```
```python
async def show_models():
    """Show available models. Currently, it returns the served model name.

    This endpoint is compatible with the OpenAI API standard.
    """
    pass
```
The `/v1/models` endpoint is a key part of OpenAI API compatibility but is currently a placeholder (`pass`).

- It should be implemented to return information about the `served_model_name` (available via `app.state.server_args.served_model_name`).
- The function `show_models` is missing a return type hint. Adding one (e.g., `-> Dict` or a more specific Pydantic model like `ModelList`) improves type safety and code clarity.

Consider raising `NotImplementedError` or returning an HTTP 501 status code as a clearer placeholder until it's fully implemented.
async def show_models(): | |
"""Show available models. Currently, it returns the served model name. | |
This endpoint is compatible with the OpenAI API standard. | |
""" | |
pass | |
async def show_models() -> Dict: # Or a Pydantic model e.g., ModelList | |
"""Show available models. Currently, it returns the served model name. | |
This endpoint is compatible with the OpenAI API standard. | |
""" | |
# TODO: Implement returning actual model details based on app.state.server_args.served_model_name | |
# Example structure: | |
# return { | |
# "object": "list", | |
# "data": [ | |
# { | |
# "id": app.state.server_args.served_model_name, | |
# "object": "model", | |
# "owned_by": "sglang", | |
# "created": int(time.time()) # Or a fixed timestamp | |
# } | |
# ] | |
# } | |
raise NotImplementedError("The /v1/models endpoint is not yet implemented.") |
Closing this, as #7179 is a more up-to-date PR.
Motivation
Completes #7101
Modifications
Added `api_server.py` for the OpenAI API server refactoring, keeping it separate from the current `http_server.py` so as not to affect the current SGLang process.

Checklist