[Bug][sleep] Create engine, sleep, wake up, generate --> gibberish #7939

@CharlieFRuan

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

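As described in the title: after creating an engine, releasing its memory (sleep), and resuming it (wake up), generate() returns gibberish. Are users required to do an explicit weight update after waking up and before generating? Is the following behavior expected? Thank you!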

Reproduction

import torch
import sglang as sgl

def main_sglang():
    server_args = {
        "model_path": "Qwen/Qwen3-0.6B",
        "disable_cuda_graph": True,  # does not affect behavior
        "tp_size": 1,
        "enable_memory_saver": True,
    }
    llm = sgl.Engine(**server_args)

    print(f"Free GPU memory before sleep: {torch.cuda.mem_get_info()[0] / 1024**2:.1f} MB")
    llm.release_memory_occupation()
    print(f"Free GPU memory after sleep: {torch.cuda.mem_get_info()[0] / 1024**2:.1f} MB")
    llm.resume_memory_occupation()
    print(f"Free GPU memory after wake up: {torch.cuda.mem_get_info()[0] / 1024**2:.1f} MB")

    outputs = llm.generate("Hello, my name is", {"max_new_tokens": 10})
    print(outputs['text'])

if __name__ == "__main__":
    main_sglang()

Output:

Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  4.48it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  4.48it/s]

Free GPU memory before sleep: 8338.2 MB
Free GPU memory after sleep: 38830.2 MB
Free GPU memory after wake up: 8318.2 MB
هذهn屋쨌养老保险 admins Aerospace.lex العسكري والتي
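
To separate the effect of the sleep/wake cycle from sampling randomness, one option is to compare greedy outputs from the same engine before and after the cycle. The helper below is only a sketch: `compare_across_sleep_wake` is a hypothetical name, it relies only on the three engine methods used in the reproduction above, and it assumes `generate` returns a dict with a `"text"` key, as in that script. Unlike the original reproduction, it also generates once before sleeping, which may or may not change the behavior.

```python
from typing import Any, Dict, Tuple


def compare_across_sleep_wake(
    engine: Any,
    prompt: str = "Hello, my name is",
    max_new_tokens: int = 10,
) -> Tuple[str, str, bool]:
    """Generate, sleep, wake up, generate again, and compare the two texts.

    `engine` is assumed to expose generate(), release_memory_occupation()
    and resume_memory_occupation(), as sgl.Engine does in the repro above.
    """
    # Greedy decoding (temperature 0) so that a difference points at the
    # engine state rather than at sampling randomness.
    sampling_params: Dict[str, Any] = {
        "max_new_tokens": max_new_tokens,
        "temperature": 0,
    }

    before = engine.generate(prompt, sampling_params)["text"]
    engine.release_memory_occupation()  # sleep
    engine.resume_memory_occupation()   # wake up
    after = engine.generate(prompt, sampling_params)["text"]

    return before, after, before == after
```

In the reproduction script this could be called as `before, after, same = compare_across_sleep_wake(llm)` right after the engine is created. If `same` is False and `after` looks like the output above, the weights were presumably not restored by the wake-up.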

Environment

  • A100 40GB
  • SGLang versions tried: built from source at HEAD (766392c), and pip-installed 0.4.9.post1 and 0.4.8.post1
