Conversation

ByronHsu (Collaborator)

Motivation

This series of PRs attempts to decouple the model code from vLLM dependencies. The model code mainly uses four vLLM components:

from vllm.config import CacheConfig
from vllm.distributed import get_tensor_model_parallel_world_size
from vllm.model_executor.layers.rotary_embedding import get_rope
from vllm.model_executor.layers.vocab_parallel_embedding import (
    ParallelLMHead,
    VocabParallelEmbedding,
)

This PR removes the first one, CacheConfig. It is the easiest of the four: radix attention always sets the page size to 1, so the models never need a cache config.
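
For intuition, a minimal sketch of that reasoning. CacheConfigSketch below is an illustrative stand-in, not the real vllm.config.CacheConfig, and its block_size field is an assumption about what such a config carries:

from dataclasses import dataclass

@dataclass
class CacheConfigSketch:
    # Illustrative stand-in (not the real vllm.config.CacheConfig): in
    # vLLM the KV cache is paged, so models receive a config carrying
    # the page (block) size, among other cache settings.
    block_size: int = 16

# SGLang's radix attention manages the KV cache with a fixed page size
# of 1, so there is no paging parameter left for model code to read.
RADIX_ATTENTION_PAGE_SIZE = 1

def needs_cache_config(page_size: int) -> bool:
    # With the page size pinned to 1, nothing remains to configure.
    return page_size != 1

assert not needs_cache_config(RADIX_ATTENTION_PAGE_SIZE)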

Modifications

Remove from vllm.config import CacheConfig from all model files, as sketched below
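
A hedged sketch of what that change typically looks like in a single model file; the class and parameter names here are illustrative assumptions, not the actual diff:

from torch import nn

# Removed by this PR:
# from vllm.config import CacheConfig

class ToyModelForCausalLM(nn.Module):
    # Before: __init__ also accepted cache_config: Optional[CacheConfig] = None
    # and threaded it through to the attention layers. Radix attention never
    # reads it, so both the parameter and the import are dropped.
    def __init__(self, config, quant_config=None) -> None:
        super().__init__()
        self.config = config
        self.quant_config = quant_config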

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@ByronHsu ByronHsu changed the title [de-vLLM 1/N] Remove CacheConfig import in all model files [WIP] [de-vLLM 1/N] Remove CacheConfig import in all model files Oct 13, 2024
@ByronHsu ByronHsu changed the title [WIP] [de-vLLM 1/N] Remove CacheConfig import in all model files [WIP] [1/N] Remove CacheConfig import in all model files Oct 13, 2024
@ByronHsu ByronHsu changed the title [WIP] [1/N] Remove CacheConfig import in all model files [1/N] Remove CacheConfig import in all model files Oct 13, 2024
@ByronHsu ByronHsu requested a review from Ying1123 October 13, 2024 22:39
@zhyncs zhyncs merged commit 56503d9 into sgl-project:main Oct 14, 2024
1 of 10 checks passed
timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025