Skip to content

Conversation

kyle-pena-kuzco
Copy link
Contributor

@kyle-pena-kuzco kyle-pena-kuzco commented May 28, 2025

Motivation

This PR exposes hidden states through the OpenAI API.
Hidden states can now be returned by including return_hidden_states in the request body.
This feature is enabled by supplying the flag --enable-return-hidden-states.

The test failure noted in the last submission of this PR is resolved, as explained below.

This feature supports all varieties of OpenAI generation requests:

  • v1/chat/completions and v1/completions
  • streaming and non-streaming
  • Parallel sampling (n>1)

The previous implementation would re-compile the CUDA graph every time the user switched between return_hidden_states True and False at the request level.

This behavior is resolved - changing return_hidden_states will no longer trigger a CUDA graph re-capture.

Modifications

  • Adds an engine-level flag to enable returning returning hidden states through the engine, native API, or openai API.
  • When this engine-level flag is supplied, if CUDA graph is enabled, the CUDA graph runner is compiled in CaptureHiddenMode.FULL on startup (so that a double-capture isn't needed).
  • This behavior is completely backwards compatible with using the CUDA graph runner for the TARGET model in speculative decoding: TARGET_VERIFY runs with CaptureHiddenMode.FULL, and TARGET_EXTEND does not use the CUDA graph runner.

Checklist

Kyle Pena and others added 30 commits May 8, 2025 15:38
…ng the preference of the spec decoder in multiple places. we need to move this to a completely engine-level parameter to fix.
@kyle-pena-kuzco
Copy link
Contributor Author

Hi @kyle-pena-kuzco can you merge the latest main

Done!

@zhyncs
Copy link
Member

zhyncs commented Jun 2, 2025

Hi @kyle-pena-kuzco can you merge the latest main

Done!

@kyle-pena-kuzco Nice. @CatherineSue will help review.

Copy link
Collaborator

@CatherineSue CatherineSue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall changes in openai_api and tests LGTM. Left some comments and questions for better clarification.
May need another pass on the speculative side changes.

@kyle-pena-kuzco
Copy link
Contributor Author

Overall changes in openai_api and tests LGTM. Left some comments and questions for better clarification. May need another pass on the speculative side changes.

Thank you @CatherineSue - I am addressing your feedback now!

@kyle-pena-kuzco
Copy link
Contributor Author

Overall changes in openai_api and tests LGTM. Left some comments and questions for better clarification. May need another pass on the speculative side changes.

Thank you @CatherineSue - I am addressing your feedback now!

@CatherineSue - I think I have addressed your feedback with my latest commits. Please let me know if there is anything else I can do or change. I really appreciate your consideration.

@kyle-pena-kuzco
Copy link
Contributor Author

Overall changes in openai_api and tests LGTM. Left some comments and questions for better clarification. May need another pass on the speculative side changes.

Thank you @CatherineSue - I am addressing your feedback now!

@CatherineSue - I think I have addressed your feedback with my latest commits. Please let me know if there is anything else I can do or change. I really appreciate your consideration.

Hi @CatherineSue - let me know if you have any questions! I think i've addressed all of the feedback.

Copy link
Collaborator

@CatherineSue CatherineSue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyle-pena-kuzco Thanks for resolving all the comments. LGTM. If all CI passes I think we are good to go

@kyle-pena-kuzco
Copy link
Contributor Author

@kyle-pena-kuzco Thanks for resolving all the comments. LGTM. If all CI passes I think we are good to go

@CatherineSue - it looks like there are merge conflicts now. i'll fix these today.

@kyle-pena-kuzco
Copy link
Contributor Author

@kyle-pena-kuzco Thanks for resolving all the comments. LGTM. If all CI passes I think we are good to go

@CatherineSue - it looks like there are merge conflicts now. i'll fix these today.

hi @CatherineSue - let me know what you think of the run results. It's difficult for me to tell if these are genuine regressions because I see other commits / PRs with similar failures.

@kyle-pena-kuzco
Copy link
Contributor Author

Hi @CatherineSue - is there anything else we should do in order for this PR to be merged? I see that gemini might be blocking the merge.

Copy link
Contributor

Hi @kyle-pena-kuzco, thanks for the update.

Regarding your question about gemini potentially blocking the merge: I haven't submitted a formal review on this pull request yet. Therefore, I am not currently setting a review status (COMMENT or REQUEST_CHANGES) that would block the merge.

My role is to provide feedback and a review status based on the code changes when a review is requested or triggered.

If you would like me to perform a review of the latest changes to provide feedback, please let me know. Otherwise, you might want to check the status of other required checks or reviews listed on the pull request page to see what might be pending.

@CatherineSue
Copy link
Collaborator

cc @zhyncs The CI looks good. Should we merge?

@kyle-pena-kuzco
Copy link
Contributor Author

cc @zhyncs The CI looks good. Should we merge?

@CatherineSue - as an FYI, I have the merge conflict resolved locally. i'm waiting to push it up until @zhyncs has had a chance to review, so that the CI doesn't re-run.

@kyle-pena-kuzco
Copy link
Contributor Author

cc @zhyncs The CI looks good. Should we merge?

@CatherineSue - as an FYI, I have the merge conflict resolved locally. i'm waiting to push it up until @zhyncs has had a chance to review, so that the CI doesn't re-run.

@BBuf I've resolved another conflict and pushed. Let me know if any CI failures are something that I should look into. It's difficult to tell which CI failures are genuine.

@zhyncs zhyncs merged commit b56de8f into sgl-project:main Jun 10, 2025
0 of 48 checks passed
almaslof pushed a commit to mpashkovskii/sglang that referenced this pull request Jun 11, 2025
jianan-gu pushed a commit to jianan-gu/sglang that referenced this pull request Jun 12, 2025
@zzh-www
Copy link

zzh-www commented Jun 12, 2025

@kyle-pena-kuzco Good job!!! But why it would not return the hidden_state of last token? "return_hidden_states" for me means return all hidden_states.

@kyle-pena-kuzco
Copy link
Contributor Author

@kyle-pena-kuzco Good job!!! But why it would not return the hidden_state of last token? "return_hidden_states" for me means return all hidden_states.

If you are interacting the API, it's not practical to return all hidden states because this would involve serializing something like 30MB of raw float data for a typical model / response length.

You can still get all tokens if you interact with the engine directly (instead of through the API).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants