UPDATE (11/23/2024)
Currently, @james-p-xu is removing rope, @yizhang2077 is removing distributed, and @HandH1998 is removing the weight loader. Optimistically, we can remove these dependencies by the end of the month and make quant optional (try import). cc @merrymercy @Ying1123
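As a concrete illustration of the "try import" idea mentioned above, the sketch below guards the quantization import so vLLM becomes an optional dependency. The exact import path and the `get_quant_config` wrapper are illustrative assumptions, not the final implementation.

```python
# Sketch only: make vLLM's quantization support optional via a guarded import.
# The import path below is an assumption and may differ in practice.
try:
    from vllm.model_executor.layers.quantization import get_quantization_config
    VLLM_QUANT_AVAILABLE = True
except ImportError:
    get_quantization_config = None
    VLLM_QUANT_AVAILABLE = False

def get_quant_config(method: str):
    """Return the quantization config class, or fail clearly without vLLM."""
    if not VLLM_QUANT_AVAILABLE:
        raise RuntimeError(
            f"Quantization method '{method}' requires vLLM; "
            "install vllm to enable it."
        )
    return get_quantization_config(method)
```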
Motivation
This issue tracks the removal of vLLM dependencies from the general model code (quantization is not considered here). These are our current imports from vLLM, all of which we want to remove:
```python
from vllm.config import CacheConfig
from vllm.distributed import get_tensor_model_parallel_world_size
from vllm.model_executor.layers.rotary_embedding import get_rope
from vllm.model_executor.layers.vocab_parallel_embedding import (
    ParallelLMHead,
    VocabParallelEmbedding,
)
```
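For a dependency like `get_tensor_model_parallel_world_size`, removal amounts to providing a small local implementation on top of `torch.distributed`. The sketch below is one possible shape, assuming the tensor-parallel process group is tracked in a module-level variable; the actual replacement may be structured differently.

```python
# Hypothetical local replacement for vllm.distributed's
# get_tensor_model_parallel_world_size, built directly on torch.distributed.
import torch.distributed as dist

_TP_GROUP = None  # assumed to be set once during distributed initialization

def get_tensor_model_parallel_world_size() -> int:
    # Degrade gracefully to single-process behavior when distributed
    # execution has not been initialized.
    if _TP_GROUP is None or not dist.is_initialized():
        return 1
    return dist.get_world_size(group=_TP_GROUP)
```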
Tracker
- Remove `CacheConfig`: [1/N] Remove `CacheConfig` import in all model files #1658
- Remove RoPE: Support vLLM-style rope flashinfer-ai/flashinfer#530
- Remove `get_tensor_model_parallel_world_size`
- Remove `ParallelLMHead`: Update vocab embedding deps and add TP switch #1856
- Remove `VocabParallelEmbedding`: Update vocab embedding deps and add TP switch #1856