TRL vllm-serve fails to load certain models #3142

### Reproduction

```shell
trl vllm-serve --model microsoft/phi-4
```

Output:

```
EngineCore hit an exception: Traceback (most recent call last):
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 335, in run_engine_core
ERROR 03-23 11:27:36 [core.py:343]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 03-23 11:27:36 [core.py:343]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 290, in __init__
ERROR 03-23 11:27:36 [core.py:343]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 60, in __init__
ERROR 03-23 11:27:36 [core.py:343]     self.model_executor = executor_class(vllm_config)
ERROR 03-23 11:27:36 [core.py:343]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 03-23 11:27:36 [core.py:343]     self._init_executor()
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 45, in _init_executor
ERROR 03-23 11:27:36 [core.py:343]     self.collective_rpc("init_worker", args=([kwargs], ))
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 03-23 11:27:36 [core.py:343]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 03-23 11:27:36 [core.py:343]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/utils.py", line 2255, in run_method
ERROR 03-23 11:27:36 [core.py:343]     return func(*args, **kwargs)
ERROR 03-23 11:27:36 [core.py:343]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 594, in init_worker
ERROR 03-23 11:27:36 [core.py:343]     self.worker = worker_class(**kwargs)
ERROR 03-23 11:27:36 [core.py:343]                   ^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/trl/scripts/vllm_serve.py", line 72, in __init__
ERROR 03-23 11:27:36 [core.py:343]     super().__init__(*args, **kwargs)
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/worker/worker.py", line 82, in __init__
ERROR 03-23 11:27:36 [core.py:343]     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
ERROR 03-23 11:27:36 [core.py:343]                                             ^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1071, in __init__
ERROR 03-23 11:27:36 [core.py:343]     self.attn_state = self.attn_backend.get_state_cls()(
ERROR 03-23 11:27:36 [core.py:343]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/attention/backends/abstract.py", line 59, in get_state_cls
ERROR 03-23 11:27:36 [core.py:343]     raise NotImplementedError
ERROR 03-23 11:27:36 [core.py:343] NotImplementedError
    ...
```
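
Reading the traceback, the engine frames come from `vllm/v1/engine/core.py`, while the worker that TRL subclasses resolves to `vllm/worker/worker.py` and `vllm/worker/model_runner.py`, and the final frame is a base-class `get_state_cls` that simply raises. This suggests, though the traceback alone does not confirm it, that the model runner is paired with an attention backend that never overrides that method. Below is a minimal sketch of that failure pattern. Every class name in it is hypothetical and none of it is vLLM or TRL code.

```python
class AttentionBackendBase:
    """Hypothetical stand-in for an abstract attention backend."""

    @staticmethod
    def get_state_cls():
        # The base method is intentionally unimplemented, mirroring the
        # last frame of the traceback.
        raise NotImplementedError


class LegacyState:
    """Hypothetical per-runner attention state."""

    def __init__(self, runner):
        self.runner = runner


class LegacyBackend(AttentionBackendBase):
    """Backend that provides the state class the runner expects."""

    @staticmethod
    def get_state_cls():
        return LegacyState


class NewEngineBackend(AttentionBackendBase):
    """Backend that does not override get_state_cls (assumed scenario)."""


class LegacyModelRunner:
    """Hypothetical runner that requires a state class from its backend."""

    def __init__(self, attn_backend):
        self.attn_backend = attn_backend
        # Same call shape as the failing line in the traceback.
        self.attn_state = self.attn_backend.get_state_cls()(self)


def try_build(backend):
    """Return "ok" if the runner can be built, else the error name."""
    try:
        LegacyModelRunner(backend)
        return "ok"
    except NotImplementedError:
        return "NotImplementedError"


print(try_build(LegacyBackend))
print(try_build(NewEngineBackend))
```

If this reading is right, the error would depend on which attention backend gets selected for the model rather than on the model weights themselves, which would fit the observation that only certain models fail.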


### System Info

- Platform: Linux-5.15.0-130-generic-x86_64-with-glibc2.35
- Python version: 3.11.11
- TRL version: 0.16.0.dev0
- PyTorch version: 2.6.0
- CUDA device(s): NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3
- Transformers version: 4.50.0
- Accelerate version: 1.5.2
- Accelerate config: not found
- Datasets version: 3.4.1
- HF Hub version: 0.29.3
- bitsandbytes version: 0.45.3
- DeepSpeed version: 0.16.4
- Diffusers version: 0.32.2
- Liger-Kernel version: 0.5.5
- LLM-Blender version: not installed
- OpenAI version: 1.66.3
- PEFT version: 0.14.0
- vLLM version: 0.8.2.dev77+gf90d34b4

### Checklist

- [x] I have checked that my issue isn't already filed (see [open issues](https://github.com/huggingface/trl/issues?q=is%3Aissue))
- [x] I have included my system information
- [x] Any code provided is minimal, complete, and reproducible ([more on MREs](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any code provided is properly formatted in code blocks, (no screenshot, [more on code blocks](https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks))
- [x] Any traceback provided is complete
