andjsmi
Contributor

@andjsmi andjsmi commented Feb 21, 2025

Motivation

SageMaker Endpoints use /ping for health checks and /invocations for inference payloads. sglang currently doesn't support this invocation pattern, so this PR adds it to make the package usable on SageMaker Endpoints.

Modifications

This pull request adds two endpoints, /ping and /invocations, in http_server.py.

/ping provides the same functionality as /health. At present, /invocations behaves the same as /v1/chat/completions; however, it may be worth expanding it to dispatch to different handlers based on the request content.
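As a rough illustration of the aliasing described above, and of what content-based dispatch could look like, here is a minimal sketch. It is not the code in this PR: the `ALIASES` table and the `resolve_route` and `resolve_invocation` helpers are hypothetical names, and the rule "a payload with `prompt` but no `messages` goes to /v1/completions" is an assumed convention chosen only for the example.

```python
import json

# Hypothetical route table (not the PR's code): each SageMaker-style path
# maps to the existing route that would do the actual work.
ALIASES = {
    "/ping": "/health",
    "/invocations": "/v1/chat/completions",
}


def resolve_route(path: str) -> str:
    """Return the route that serves `path`, following the aliases above."""
    return ALIASES.get(path, path)


def resolve_invocation(body: bytes) -> str:
    """Pick a target route for /invocations from the shape of the JSON body.

    Assumed convention for this sketch only: a payload with "prompt" and no
    "messages" is treated as a text completion; everything else, including
    unparseable input, falls back to the current chat-completions behavior.
    """
    default = ALIASES["/invocations"]
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return default
    if isinstance(payload, dict) and "prompt" in payload and "messages" not in payload:
        return "/v1/completions"
    return default
```

With this sketch, `resolve_invocation(b'{"prompt": "hi"}')` selects the completions route, while a chat-style payload containing `messages` keeps the behavior the PR describes today.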

I've included test cases as well and have been able to test on a SageMaker endpoint.

Checklist

@zhaochenyang20
Collaborator

@andjsmi This looks like a nice PR. Have you found someone to test it on SageMaker? We don't have access.

@andjsmi
Contributor Author

andjsmi commented Feb 21, 2025

Hey @zhaochenyang20.

Thanks! Yes, I've tested it on SageMaker myself and have included a screenshot below.

The main requirements are responding with an empty 200 OK from the /ping endpoint and accepting POST requests on /invocations, with the ability to stream a chunked-encoded response back. I have tested all of these use cases on a SageMaker endpoint.

If there's a particular way you'd like me to test further, please let me know.
[Screenshot: image (7)]
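For readers who want to see the three requirements above in one place (an empty 200 OK on /ping, POST on /invocations, and a chunked streaming response), here is a standard-library sketch. It is an illustration under stated assumptions, not the implementation in http_server.py: `SageMakerStyleHandler`, `fake_token_stream`, and `make_server` are hypothetical names, and the "model" simply echoes the words of a `prompt` field as server-sent-event lines.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def fake_token_stream(prompt: str):
    """Stand-in for model output: yields one SSE-style line per word."""
    for word in prompt.split():
        yield ("data: " + json.dumps({"text": word}) + "\n\n").encode()
    yield b"data: [DONE]\n\n"


class SageMakerStyleHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer encoding needs HTTP/1.1

    def do_GET(self):
        if self.path == "/ping":
            # Health check: 200 OK with an empty body.
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            self.send_error(404)

    def do_POST(self):
        if self.path != "/invocations":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:
            self.send_error(400)
            return
        prompt = str(payload.get("prompt", "")) if isinstance(payload, dict) else ""
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for chunk in fake_token_stream(prompt):
            # Each chunk on the wire: hex length, CRLF, data, CRLF.
            self.wfile.write(f"{len(chunk):X}\r\n".encode() + chunk + b"\r\n")
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk ends the stream

    def log_message(self, *args):
        pass  # keep the example quiet


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), SageMakerStyleHandler)
```

Calling `make_server().serve_forever()` starts the sketch locally. A client such as `http.client` decodes the chunked body transparently, which makes it a convenient way to check the same three behaviors before deploying to a real endpoint.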

@@ -0,0 +1,78 @@
ARG CUDA_VERSION=12.5.1

FROM nvcr.io/nvidia/tritonserver:24.04-py3-min
Member

Can we use lmsysorg/sglang:latest as the base image?

@zhyncs
Member

zhyncs commented Feb 21, 2025

Overall, this change looks good. I can merge it first, and minor changes can be submitted in a follow-up. Thank you very much for AWS's support!

@zhyncs zhyncs merged commit 1df6eab into sgl-project:main Feb 21, 2025
3 of 18 checks passed
aoshen524 pushed a commit to aoshen524/sglang that referenced this pull request Mar 10, 2025
3 participants