Conversation

@HermitSun (Contributor) commented Apr 1, 2025

Motivation

Resolve #4822.

Modifications

Support loading safetensors weights with runai_streamer. This can be enabled by passing the option --load-format runai_streamer when launching the server.
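As an illustration, a launch invocation with the new flag might look like the following. This is a sketch, not taken from the PR: the model path is a placeholder example, and any S3/storage-specific environment configuration required by the RunAI streamer is omitted.

```shell
# Hypothetical launch command; the model path is an example, not from this PR.
python -m sglang.launch_server \
  --model-path meta-llama/Llama-3.1-8B-Instruct \
  --load-format runai_streamer
```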

Checklist

@@ -424,6 +424,7 @@ def add_cli_args(parser: argparse.ArgumentParser):
             "bitsandbytes",
             "layered",
             "remote",
+            "runai_streamer",
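The diff above adds "runai_streamer" to the allowed values of the --load-format CLI option. A minimal, self-contained sketch of how such an argparse choice list works is below; the surrounding choices shown here and the "auto" default are assumptions for illustration, not SGLang's actual server_args.py.

```python
import argparse

def add_cli_args(parser: argparse.ArgumentParser):
    # Sketch of the --load-format option; the full choice list and the
    # default value in SGLang's server_args.py are assumed, not copied.
    parser.add_argument(
        "--load-format",
        type=str,
        default="auto",
        choices=[
            "auto",
            "safetensors",
            "bitsandbytes",
            "layered",
            "remote",
            "runai_streamer",  # the choice added by this PR
        ],
        help="The format of the model weights to load.",
    )

parser = argparse.ArgumentParser()
add_cli_args(parser)
args = parser.parse_args(["--load-format", "runai_streamer"])
print(args.load_format)
```

With the choice registered, argparse rejects unknown formats at parse time, so an unsupported --load-format value fails fast with a usage error instead of surfacing later during weight loading.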
A reviewer (Contributor) commented:

You may want to add a description for it below.

@HermitSun (Contributor, Author) replied:
Thanks for the reminder, I've added it.
As for why I didn't add a comment for remote: I think the logic of runai_streamer and remote can be merged, so I'll try to refactor this logic a bit later.

@brayden-hai

Hi @HermitSun, I'm wondering whether the existing SGLang already supports the RunAI streamer, as I was able to install it but the performance was still not as good as expected. I'm interested in the S3 use case. Right now I'm using the basic MP model loader; have you compared this performance with the MP loader in #7277?

Successfully merging this pull request may close these issues.

[Feature] Load model weight in parallel