UPDATE (11/23/2024)
Currently, @james-p-xu is removing rope, @yizhang2077 is removing distributed, and @HandH1998 is removing the weight loader. Optimistically, we can remove these dependencies by the end of the month and make quant optional (try import). cc @merrymercy @Ying1123
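As a concrete illustration of the "try import" idea mentioned above, the sketch below guards the quantization import so vLLM becomes an optional dependency. The exact import path and the `get_quant_config` wrapper are illustrative assumptions, not the final implementation.

```python
# Sketch only: make vLLM's quantization support optional via a guarded import.
# The import path below is an assumption and may differ in practice.
try:
    from vllm.model_executor.layers.quantization import get_quantization_config
    VLLM_QUANT_AVAILABLE = True
except ImportError:
    get_quantization_config = None
    VLLM_QUANT_AVAILABLE = False

def get_quant_config(method: str):
    """Return the quantization config class, or fail clearly without vLLM."""
    if not VLLM_QUANT_AVAILABLE:
        raise RuntimeError(
            f"Quantization method '{method}' requires vLLM; "
            "install vllm to enable it."
        )
    return get_quantization_config(method)
```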
Motivation
This issue tracks the removal of vLLM dependencies from the general model code (quantization is not considered here). These are our current imports from vLLM, all of which we want to remove:
```python
from vllm.config import CacheConfig
from vllm.distributed import get_tensor_model_parallel_world_size
from vllm.model_executor.layers.rotary_embedding import get_rope
from vllm.model_executor.layers.vocab_parallel_embedding import (
    ParallelLMHead,
    VocabParallelEmbedding,
)
```
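For a dependency like `get_tensor_model_parallel_world_size`, removal amounts to providing a small local implementation on top of `torch.distributed`. The sketch below is one possible shape, assuming the tensor-parallel process group is tracked in a module-level variable; the actual replacement may be structured differently.

```python
# Hypothetical local replacement for vllm.distributed's
# get_tensor_model_parallel_world_size, built directly on torch.distributed.
import torch.distributed as dist

_TP_GROUP = None  # assumed to be set once during distributed initialization

def get_tensor_model_parallel_world_size() -> int:
    # Degrade gracefully to single-process behavior when distributed
    # execution has not been initialized.
    if _TP_GROUP is None or not dist.is_initialized():
        return 1
    return dist.get_world_size(group=_TP_GROUP)
```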
Tracker
- Remove `CacheConfig`: [1/N] Remove `CacheConfig` import in all model files #1658
- Remove RoPE: Support vLLM-style rope flashinfer-ai/flashinfer#530
- Remove `get_tensor_model_parallel_world_size`
- Remove `ParallelLMHead`: Update vocab embedding deps and add TP switch #1856
- Remove `VocabParallelEmbedding`: Update vocab embedding deps and add TP switch #1856