TRL vllm-serve fails to load certain models #3142

@zaddy6

Reproduction

trl vllm-serve --model microsoft/phi-4

outputs:

EngineCore hit an exception: Traceback (most recent call last):
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 335, in run_engine_core
ERROR 03-23 11:27:36 [core.py:343]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 03-23 11:27:36 [core.py:343]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 290, in __init__
ERROR 03-23 11:27:36 [core.py:343]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 60, in __init__
ERROR 03-23 11:27:36 [core.py:343]     self.model_executor = executor_class(vllm_config)
ERROR 03-23 11:27:36 [core.py:343]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 03-23 11:27:36 [core.py:343]     self._init_executor()
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 45, in _init_executor
ERROR 03-23 11:27:36 [core.py:343]     self.collective_rpc("init_worker", args=([kwargs], ))
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 03-23 11:27:36 [core.py:343]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 03-23 11:27:36 [core.py:343]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/utils.py", line 2255, in run_method
ERROR 03-23 11:27:36 [core.py:343]     return func(*args, **kwargs)
ERROR 03-23 11:27:36 [core.py:343]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 594, in init_worker
ERROR 03-23 11:27:36 [core.py:343]     self.worker = worker_class(**kwargs)
ERROR 03-23 11:27:36 [core.py:343]                   ^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/trl/scripts/vllm_serve.py", line 72, in __init__
ERROR 03-23 11:27:36 [core.py:343]     super().__init__(*args, **kwargs)
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/worker/worker.py", line 82, in __init__
ERROR 03-23 11:27:36 [core.py:343]     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
ERROR 03-23 11:27:36 [core.py:343]                                             ^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1071, in __init__
ERROR 03-23 11:27:36 [core.py:343]     self.attn_state = self.attn_backend.get_state_cls()(
ERROR 03-23 11:27:36 [core.py:343]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-23 11:27:36 [core.py:343]   File "/opt/conda/envs/unsloth_env/lib/python3.11/site-packages/vllm/attention/backends/abstract.py", line 59, in get_state_cls
ERROR 03-23 11:27:36 [core.py:343]     raise NotImplementedError
ERROR 03-23 11:27:36 [core.py:343] NotImplementedError
    ...
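
Reading the traceback: the final frame is `get_state_cls` in `vllm/attention/backends/abstract.py`, reached through `self.attn_backend.get_state_cls()` in `model_runner.py`. That means the attention backend selected for this model does not override that method, so the base-class version runs and raises. Note also that the call stack starts in `vllm/v1/engine/core.py` but passes through `trl/scripts/vllm_serve.py` into `vllm/worker/worker.py`, which may be relevant to why that backend was picked. The failure pattern can be sketched as below; all class and function names here are hypothetical stand-ins, not vLLM's actual classes.

```python
class AttentionBackendBase:
    """Hypothetical stand-in for an abstract attention backend."""

    @staticmethod
    def get_state_cls():
        # The base class leaves this hook unimplemented, as in the traceback.
        raise NotImplementedError


class BackendWithState(AttentionBackendBase):
    """Hypothetical backend that overrides the hook."""

    @staticmethod
    def get_state_cls():
        return dict  # any class works for this illustration


class BackendWithoutState(AttentionBackendBase):
    """Hypothetical backend that never overrides get_state_cls."""


def build_attn_state(backend):
    # Same call shape as `self.attn_backend.get_state_cls()(...)`.
    return backend.get_state_cls()()


def try_build(backend):
    """Return "ok" if the state can be built, else the exception name."""
    try:
        build_attn_state(backend)
        return "ok"
    except NotImplementedError:
        return "NotImplementedError"
```

With these stand-ins, `try_build(BackendWithoutState)` hits the base-class `raise`, which matches the shape of the error above; it does not show which real backend vLLM selected for `microsoft/phi-4`.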

System Info

  • Platform: Linux-5.15.0-130-generic-x86_64-with-glibc2.35
  • Python version: 3.11.11
  • TRL version: 0.16.0.dev0
  • PyTorch version: 2.6.0
  • CUDA device(s): NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3
  • Transformers version: 4.50.0
  • Accelerate version: 1.5.2
  • Accelerate config: not found
  • Datasets version: 3.4.1
  • HF Hub version: 0.29.3
  • bitsandbytes version: 0.45.3
  • DeepSpeed version: 0.16.4
  • Diffusers version: 0.32.2
  • Liger-Kernel version: 0.5.5
  • LLM-Blender version: not installed
  • OpenAI version: 1.66.3
  • PEFT version: 0.14.0
  • vLLM version: 0.8.2.dev77+gf90d34b4
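
To confirm that another environment matches the versions listed above, they can be read with the standard library's `importlib.metadata`. This is a minimal sketch; the distribution names passed in (`trl`, `vllm`, `torch`, `transformers`) are assumed to be the names the packages were installed under, and the helper `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Same wording the system info list uses for missing packages.
            result[name] = "not installed"
    return result


if __name__ == "__main__":
    # Assumed distribution names; adjust to the environment being checked.
    for name, ver in installed_versions(["trl", "vllm", "torch", "transformers"]).items():
        print(f"{name}: {ver}")
```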

Checklist

  • I have checked that my issue isn't already filed (see open issues)
  • I have included my system information
  • Any code provided is minimal, complete, and reproducible (more on MREs)
  • Any code provided is properly formatted in code blocks (no screenshots; more on code blocks)
  • Any traceback provided is complete

Labels: 🐛 bug (Something isn't working)
