Open AI API hidden states #6716

kyle-pena-kuzco · 2025-05-28T19:15:03Z

Motivation

This PR exposes hidden states through the OpenAI API.
Hidden states can now be returned by including return_hidden_states in the request body.
This feature is enabled by supplying the flag --enable-return-hidden-states.

The test failure noted in the last submission of this PR is resolved, as explained below.

This feature supports all varieties of OpenAI generation requests:

v1/chat/completions and v1/completions
streaming and non-streaming
Parallel sampling (n>1)

The previous implementation would re-compile the CUDA graph every time the user switched between return_hidden_states True and False at the request level.

This behavior is resolved - changing return_hidden_states will no longer trigger a CUDA graph re-capture.

Modifications

Adds an engine-level flag to enable returning returning hidden states through the engine, native API, or openai API.
When this engine-level flag is supplied, if CUDA graph is enabled, the CUDA graph runner is compiled in CaptureHiddenMode.FULL on startup (so that a double-capture isn't needed).
This behavior is completely backwards compatible with using the CUDA graph runner for the TARGET model in speculative decoding: TARGET_VERIFY runs with CaptureHiddenMode.FULL, and TARGET_EXTEND does not use the CUDA graph runner.

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

…statement cleanup

…ng the preference of the spec decoder in multiple places. we need to move this to a completely engine-level parameter to fix.

…e / pr cleanup

…_hidden_states`

… prior to testing on H100

kyle-pena-kuzco · 2025-06-02T19:22:03Z

Hi @kyle-pena-kuzco can you merge the latest main

Done!

zhyncs · 2025-06-02T19:24:08Z

Hi @kyle-pena-kuzco can you merge the latest main

Done!

@kyle-pena-kuzco Nice. @CatherineSue will help review.

CatherineSue

Overall changes in openai_api and tests LGTM. Left some comments and questions for better clarification.
May need another pass on the speculative side changes.

python/sglang/srt/managers/io_struct.py

python/sglang/srt/openai_api/adapter.py

test/srt/test_openai_server_hidden_states.py

python/sglang/srt/speculative/eagle_worker.py

kyle-pena-kuzco · 2025-06-03T15:13:33Z

Overall changes in openai_api and tests LGTM. Left some comments and questions for better clarification. May need another pass on the speculative side changes.

Thank you @CatherineSue - I am addressing your feedback now!

Agree with this simplification. Co-authored-by: Chang Su <csu272@usc.edu>

… into openai-hidden-states

kyle-pena-kuzco · 2025-06-03T19:28:56Z

Overall changes in openai_api and tests LGTM. Left some comments and questions for better clarification. May need another pass on the speculative side changes.

Thank you @CatherineSue - I am addressing your feedback now!

@CatherineSue - I think I have addressed your feedback with my latest commits. Please let me know if there is anything else I can do or change. I really appreciate your consideration.

kyle-pena-kuzco · 2025-06-04T17:50:02Z

Overall changes in openai_api and tests LGTM. Left some comments and questions for better clarification. May need another pass on the speculative side changes.

Thank you @CatherineSue - I am addressing your feedback now!

@CatherineSue - I think I have addressed your feedback with my latest commits. Please let me know if there is anything else I can do or change. I really appreciate your consideration.

Hi @CatherineSue - let me know if you have any questions! I think i've addressed all of the feedback.

CatherineSue

@kyle-pena-kuzco Thanks for resolving all the comments. LGTM. If all CI passes I think we are good to go

kyle-pena-kuzco · 2025-06-04T21:33:35Z

@kyle-pena-kuzco Thanks for resolving all the comments. LGTM. If all CI passes I think we are good to go

@CatherineSue - it looks like there are merge conflicts now. i'll fix these today.

kyle-pena-kuzco · 2025-06-05T19:42:57Z

@kyle-pena-kuzco Thanks for resolving all the comments. LGTM. If all CI passes I think we are good to go

@CatherineSue - it looks like there are merge conflicts now. i'll fix these today.

hi @CatherineSue - let me know what you think of the run results. It's difficult for me to tell if these are genuine regressions because I see other commits / PRs with similar failures.

kyle-pena-kuzco · 2025-06-06T16:05:12Z

Hi @CatherineSue - is there anything else we should do in order for this PR to be merged? I see that gemini might be blocking the merge.

gemini-code-assist · 2025-06-06T16:05:27Z

Hi @kyle-pena-kuzco, thanks for the update.

Regarding your question about gemini potentially blocking the merge: I haven't submitted a formal review on this pull request yet. Therefore, I am not currently setting a review status (COMMENT or REQUEST_CHANGES) that would block the merge.

My role is to provide feedback and a review status based on the code changes when a review is requested or triggered.

If you would like me to perform a review of the latest changes to provide feedback, please let me know. Otherwise, you might want to check the status of other required checks or reviews listed on the pull request page to see what might be pending.

CatherineSue · 2025-06-06T21:36:52Z

cc @zhyncs The CI looks good. Should we merge?

kyle-pena-kuzco · 2025-06-09T22:19:51Z

cc @zhyncs The CI looks good. Should we merge?

@CatherineSue - as an FYI, I have the merge conflict resolved locally. i'm waiting to push it up until @zhyncs has had a chance to review, so that the CI doesn't re-run.

kyle-pena-kuzco · 2025-06-10T18:13:38Z

cc @zhyncs The CI looks good. Should we merge?

@CatherineSue - as an FYI, I have the merge conflict resolved locally. i'm waiting to push it up until @zhyncs has had a chance to review, so that the CI doesn't re-run.

@BBuf I've resolved another conflict and pushed. Let me know if any CI failures are something that I should look into. It's difficult to tell which CI failures are genuine.

zzh-www · 2025-06-12T09:26:01Z

@kyle-pena-kuzco Good job!!! But why it would not return the hidden_state of last token? "return_hidden_states" for me means return all hidden_states.

kyle-pena-kuzco · 2025-06-12T17:27:02Z

@kyle-pena-kuzco Good job!!! But why it would not return the hidden_state of last token? "return_hidden_states" for me means return all hidden_states.

If you are interacting the API, it's not practical to return all hidden states because this would involve serializing something like 30MB of raw float data for a typical model / response length.

You can still get all tokens if you interact with the engine directly (instead of through the API).

Kyle Pena and others added 30 commits May 8, 2025 15:38

initial implementation - no tests

6045393

removed turtle import

a936a21

added test coverage

4c0351a

implemented /v1/completions hidden states

f370828

small update

76a5826

fixed the test_completion and test_chat_completion tests

321e0c9

including adapter.py - mistakenly ommitted from last commit

2e1bad3

checkpoint before adding more detailed assert messages

dfddfba

all tests pass relating to returning hidden states pass now

fd70c53

fixed a few remaining issues

08213ce

removed debugging print statement

d51ab85

hidden states now excluded from api response if None, instead of null

f60cc26

removed mistaken way to write out to file

684e0ab

Merge remote-tracking branch 'upstream/main' into openai-hidden-states

1d651e2

last token only, as was intended

2a3e24a

Merge remote-tracking branch 'origin/main' into openai-hidden-states

f48c54c

WIP - needs test suite for spec/non-spec/latency impact, needs debug …

f70053a

…statement cleanup

found a fix for eagle decoding! get_model_worker_batch() was overridi…

6bb527d

…ng the preference of the spec decoder in multiple places. we need to move this to a completely engine-level parameter to fix.

checkpoint before implementing hack-free capture hidden state overrid…

d459c1d

…e / pr cleanup

commit just prior to cleaning up test suite

f8f9af8

removed extra loop in test_openai_server dedicated to testing `return…

cf52f1a

…_hidden_states`

significant cleanup continues

ecdcee0

cleanup continues

812d347

more cleanup

f081fad

test suite cleanup

0d47f14

test suite cleanup

abecafc

fixed a few outstanding problems revealed by one last regression test…

5ad7b4e

… prior to testing on H100

fixed the n > 1 problem in adapter.py for hidden states

d2ee5bb

re-enabled several test suites for hidden states

0fe6da3

small changes to test suite

d1ed0d1

Merge branch 'main' into openai-hidden-states

5c24867

CatherineSue reviewed Jun 3, 2025

View reviewed changes

kyle-pena-kuzco and others added 3 commits June 3, 2025 11:51

Update python/sglang/srt/openai_api/adapter.py

7a5bb50

Agree with this simplification. Co-authored-by: Chang Su <csu272@usc.edu>

addressed requested fixes with review by sglang team

f6bc35b

Merge branch 'openai-hidden-states' of github.com:context-labs/sglang…

01b47a4

… into openai-hidden-states

CatherineSue approved these changes Jun 4, 2025

View reviewed changes

Merge branch 'main' into openai-hidden-states

fbe7f7e

Merge branch 'main' into openai-hidden-states

e127013

Merge branch 'main' into openai-hidden-states

a562dc2

zhyncs assigned zhyncs and BBuf Jun 10, 2025

Merge branch 'main' into openai-hidden-states

bdfa5e6

zhyncs merged commit b56de8f into sgl-project:main Jun 10, 2025
0 of 48 checks passed

almaslof pushed a commit to mpashkovskii/sglang that referenced this pull request Jun 11, 2025

Open AI API hidden states (sgl-project#6716)

d472996

jianan-gu pushed a commit to jianan-gu/sglang that referenced this pull request Jun 12, 2025

Open AI API hidden states (sgl-project#6716)

d0e666e

Open AI API hidden states #6716

Open AI API hidden states #6716

Uh oh!

Conversation

kyle-pena-kuzco commented May 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Checklist

Uh oh!

kyle-pena-kuzco commented Jun 2, 2025

Uh oh!

zhyncs commented Jun 2, 2025

Uh oh!

CatherineSue left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kyle-pena-kuzco commented Jun 3, 2025

Uh oh!

kyle-pena-kuzco commented Jun 3, 2025

Uh oh!

kyle-pena-kuzco commented Jun 4, 2025

Uh oh!

CatherineSue left a comment

Choose a reason for hiding this comment

Uh oh!

kyle-pena-kuzco commented Jun 4, 2025

Uh oh!

kyle-pena-kuzco commented Jun 5, 2025

Uh oh!

kyle-pena-kuzco commented Jun 6, 2025

Uh oh!

gemini-code-assist bot commented Jun 6, 2025

Uh oh!

CatherineSue commented Jun 6, 2025

Uh oh!

kyle-pena-kuzco commented Jun 9, 2025

Uh oh!

kyle-pena-kuzco commented Jun 10, 2025

Uh oh!

Uh oh!

zzh-www commented Jun 12, 2025

Uh oh!

kyle-pena-kuzco commented Jun 12, 2025

Uh oh!

Uh oh!

kyle-pena-kuzco commented May 28, 2025 •

edited

Loading