server base: sharding_manager.__enter__() function should not be called directly in wake_up method #1734

@chenhaiq

The `wake_up` method below needs to be refactored, in both the vLLM and SGLang branches.
`sharding_manager.__enter__()` is a context-manager hook and should not be called directly.

    def wake_up(self, *args, **kwargs):
        """Load model weights and build kv cache."""
        if not self.is_sleep:
            return
        self.sharding_manager.__enter__()  # pylint: disable=C2801
        self.is_sleep = False
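
One possible direction, sketched under assumptions rather than taken from the codebase: hold the sharding manager's context in a `contextlib.ExitStack`, so that `wake_up` enters it through the standard protocol and a matching method exits it later. `DummyShardingManager`, `SketchServer`, and the `sleep` method here are hypothetical stand-ins for illustration; only `wake_up`, `is_sleep`, and `sharding_manager` come from the snippet above.

```python
from contextlib import ExitStack


class DummyShardingManager:
    """Hypothetical stand-in for the real sharding manager."""

    def __init__(self):
        self.loaded = False

    def __enter__(self):
        # The real manager would load weights and build the kv cache here.
        self.loaded = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # The real manager would release weights and the kv cache here.
        self.loaded = False
        return False


class SketchServer:
    """Hypothetical server class; not the project's actual base class."""

    def __init__(self, sharding_manager):
        self.sharding_manager = sharding_manager
        self.is_sleep = True
        # ExitStack keeps the entered context alive across method calls.
        self._stack = ExitStack()

    def wake_up(self, *args, **kwargs):
        """Load model weights and build kv cache."""
        if not self.is_sleep:
            return
        # Enters the manager via the context protocol, no direct dunder call.
        self._stack.enter_context(self.sharding_manager)
        self.is_sleep = False

    def sleep(self, *args, **kwargs):
        """Assumed counterpart of wake_up: exit the manager's context."""
        if self.is_sleep:
            return
        # close() runs __exit__ for everything entered on the stack.
        self._stack.close()
        self.is_sleep = True


manager = DummyShardingManager()
server = SketchServer(manager)
server.wake_up()
server.sleep()
```

With this shape the pylint `C2801` suppression is no longer needed, and the exit path is guaranteed to pair with the enter path. Whether the real sharding managers tolerate being entered and exited repeatedly is an assumption that would need checking for each backend.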

cc @wuxibin89
