
[Bug] flashinfer_python with minimum required version 0.2.5 is not installed #6160

@msharara1998

Description


Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

I am trying to serve Gemma 3 27B-it on an RTX 5090 using the sglang Blackwell image (lmsysorg/sglang:blackwell). However, I'm getting this error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/importlib/metadata/__init__.py", line 563, in from_name
    return next(cls.discover(name=name))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 684, in assert_pkg_version
    installed_version = version(pkg)
                        ^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/importlib/metadata/__init__.py", line 1008, in version
    return distribution(distribution_name).version
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/importlib/metadata/__init__.py", line 981, in distribution
    return Distribution.from_name(distribution_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/importlib/metadata/__init__.py", line 565, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for flashinfer_python

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 14, in <module>
    launch_server(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/http_server.py", line 726, in launch_server
    tokenizer_manager, scheduler_info = _launch_subprocesses(server_args=server_args)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 513, in _launch_subprocesses
    _set_envs_and_config(server_args)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 464, in _set_envs_and_config
    assert_pkg_version(
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 691, in assert_pkg_version
    raise Exception(
Exception: flashinfer_python with minimum required version 0.2.5 is not installed. Please uninstall the old version and reinstall the latest version by following the instructions at https://docs.flashinfer.ai/installation.html.
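For context, the failing check can be reproduced in isolation. Below is a hedged sketch of what `assert_pkg_version` appears to do, based only on the traceback above (the actual sglang implementation may differ; `check_min_version` is an illustrative name):

```python
from importlib.metadata import PackageNotFoundError, version


def check_min_version(pkg: str, min_version: str) -> None:
    """Sketch of a package check like sglang's assert_pkg_version (srt/utils.py)."""
    try:
        # Raises PackageNotFoundError when no distribution metadata is found,
        # which is what produces the traceback above.
        installed = version(pkg)
    except PackageNotFoundError:
        raise Exception(
            f"{pkg} with minimum required version {min_version} is not installed."
        )
    # Naive numeric comparison for illustration only; real code would use
    # packaging.version for correctness with pre/post-release suffixes.
    def to_tuple(v: str):
        return tuple(int(p) for p in v.split(".") if p.isdigit())
    if to_tuple(installed) < to_tuple(min_version):
        raise Exception(f"{pkg} {installed} is older than required {min_version}.")


try:
    check_min_version("flashinfer_python", "0.2.5")
    print("flashinfer_python check passed")
except Exception as e:
    print(e)
```

So the exception does not necessarily mean flashinfer is absent from the image, only that its distribution metadata (`flashinfer_python`) is not discoverable by the interpreter that launched the server.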

Reproduction

The model is downloaded from Hugging Face. Here is the Docker Compose service used to run it:

generation_gemma_3_27b_sglang:
    image: lmsysorg/sglang:blackwell
    container_name: generation-gemma-3-27b-sglang
    volumes:
      - ./models/google--gemma-3-27b-it:/models/google--gemma-3-27b-it
      - ./models/torchinductor_cache:/models/torchinductor_cache
    # restart: always
    network_mode: host # required by RDMA
    privileged: true # required by RDMA
    # Or you can only publish port 30000
    # ports:
    #   - 30000:30000
    environment:
      - TORCHINDUCTOR_CACHE_DIR=/models/torchinductor_cache
    entrypoint: python3 -m sglang.launch_server
    command: --model-path /models/google--gemma-3-27b-it
      --host 0.0.0.0
      --context-length 8192
      --port 30000
      --random-seed 0
      --log-requests-level 2
      --enable-metrics
      --max-running-requests 4
      --show-time-cost
      --dtype float16
      --stream-interval 2
      --served-model-name "gemma-3-27b"
      --tp 4
      --attention-backend flashinfer
      # --enable-torch-compile
      # --tokenizer-mode auto
      # --enable-mixed-chunk
      # --chat-template /models/CohereForAI--aya-expanse-8b/chat_template.json
    ulimits:
      memlock: -1
      stack: 67108864
    ipc: host
    # healthcheck:
    #   test: ["CMD-SHELL", "curl -f http://localhost:30000/health || exit 1"]
    #   retries: 3
    #   interval: 1h
    #   timeout: 1m
    #   start_period: 2m
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [GPU]

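Before digging into the compose setup, it may help to confirm whether the `flashinfer_python` metadata is visible to the interpreter that launches the server. This is a generic check (inside the container it would be run with the image's `python3`, e.g. via a temporary `docker run` with an overridden entrypoint):

```shell
# Print the installed flashinfer_python version, or a clear message if the
# package metadata is missing (the condition that triggers the exception above).
python3 - <<'EOF'
from importlib.metadata import PackageNotFoundError, version
try:
    print("flashinfer_python:", version("flashinfer_python"))
except PackageNotFoundError:
    print("flashinfer_python: not installed (no package metadata found)")
EOF
```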
Environment

python3 -m sglang.check_env
/home/ubuntu-user/miniconda3/envs/default/lib/python3.12/site-packages/torch/cuda/__init__.py:287: UserWarning:
NVIDIA GeForce RTX 5090 with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(
Python: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:27) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA GeForce RTX 5090
GPU 0,1,2,3 Compute Capability: 12.0
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 12.2, V12.2.140
CUDA Driver Version: 570.144
PyTorch: 2.7.0+cu126
sglang: 0.4.6.post2
sgl_kernel: Module Not Found
flashinfer_python: Module Not Found
triton: 3.3.0
transformers: Module Not Found
torchao: Module Not Found
numpy: 2.2.5
aiohttp: 3.11.18
fastapi: 0.115.12
hf_transfer: Module Not Found
huggingface_hub: 0.31.1
interegular: Module Not Found
modelscope: Module Not Found
orjson: Module Not Found
outlines: Module Not Found
packaging: 25.0
psutil: 7.0.0
pydantic: 2.11.4
python-multipart: Module Not Found
pyzmq: 26.4.0
uvicorn: Module Not Found
uvloop: Module Not Found
vllm: Module Not Found
xgrammar: Module Not Found
openai: Module Not Found
tiktoken: Module Not Found
anthropic: Module Not Found
litellm: Module Not Found
decord: Module Not Found
NVIDIA Topology:
      GPU0  GPU1  GPU2  GPU3  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0  X     NODE  SYS   SYS   0-7,16-23     0              N/A
GPU1  NODE  X     SYS   SYS   0-7,16-23     0              N/A
GPU2  SYS   SYS   X     NODE  8-15,24-31    1              N/A
GPU3  SYS   SYS   NODE  X     8-15,24-31    1              N/A

Legend:

X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks

ulimit soft: 1073741816

(Note: I ran check_env in a conda environment on the host machine, because the error occurs inside a Docker container that exits immediately, so I cannot run the command inside it.)
