Pytorch 2.4 RC cu118 wheels do not work on old drivers #130684

@ppwwyyxx

🐛 Describe the bug

PyTorch 2.4 uses a new version of Triton that calls the cuTensorMapEncodeTiled driver API (triton-lang/triton@7289a23#diff-0d645ca31937abba9a3357062ee2c3708f6d49f66d7842d5f6577a2044f962f5)

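This API requires a sufficiently new NVIDIA driver. On older drivers, Triton fails while loading its compiled cuda_utils helper, so it refuses to compile anything. To reproduce, run the Triton tutorial 06-fused-attention.py, which fails with: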

Traceback (most recent call last):
  File "/users/XXXg/home/projects/triton/python/tutorials/06-fused-attention.py", line 81, in <module>
    configs = [
  File "/users/XXXg/home/projects/triton/python/tutorials/06-fused-attention.py", line 85, in <listcomp>
    for s in ([1] if is_hip() else [3, 4, 7])\
  File "/users/XXXg/home/projects/triton/python/tutorials/06-fused-attention.py", line 22, in is_hip
    return triton.runtime.driver.active.get_current_target().backend == "hip"
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
    self._initialize_obj()
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
    self._obj = self._init_fn()
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver
    return actives[0]()
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
    self.utils = CudaUtils()  # TODO: make static
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
    mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
  File "/home/XXXg/.pyenv/versions/torch24/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 62, in compile_module_from_src
    mod = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /home/XXXg/.triton/cache/2920354f453efffb492e73b112abcee1d2d301a37ade21e318a1ba26fa4fcd7c/cuda_utils.so: undefined symbol: cuTensorMapEncodeTiled

My driver version is 470.161.03 (nvidia-smi reports "NVIDIA-SMI 470.161.03 Driver Version: 470.161.03"). Note that this same driver ran older PyTorch cu118 wheels without problems.
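
For anyone who wants to check whether their own driver is affected before installing the wheel, here is a minimal sketch that asks the installed driver library whether it exports the missing symbol. It assumes a Linux system where the driver library is named libcuda.so.1; the helper name driver_exports is hypothetical and is not part of PyTorch or Triton. As far as I know the tensor-map entry points first appeared in CUDA 12.0 era drivers, which would explain why a 470.x driver lacks them, but treat that as an assumption rather than something confirmed in this issue.

```python
import ctypes


def driver_exports(symbol, lib_name="libcuda.so.1"):
    """Report whether a shared library exports `symbol`.

    Returns True or False if the library loads, or None if it cannot be
    loaded at all (for example, no NVIDIA driver is installed).
    `lib_name` defaults to the usual Linux driver library name, which is
    an assumption about the system layout.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    # ctypes resolves symbols lazily on attribute access and raises
    # AttributeError when the symbol is missing, so hasattr is enough.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    result = driver_exports("cuTensorMapEncodeTiled")
    if result is None:
        print("libcuda.so.1 could not be loaded")
    elif result:
        print("driver exports cuTensorMapEncodeTiled")
    else:
        print("driver does NOT export cuTensorMapEncodeTiled")
```

If this reports that the symbol is missing, the ImportError in the traceback above is the expected result on that machine.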

Related issue: triton-lang/triton#2062

Versions

PyTorch version: 2.4.0+cu118
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.0 (default, Dec 18 2023, 03:34:21) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.250-2-velinux1u1-amd64-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY


Nvidia driver version: 470.161.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0

cc @ptrblck @msaroufim @ezyang @anijain2305 @chauhang @penguinwu @bertmaher @int3 @davidberard98 @nmacchioni @chenyang78 @embg @malfet @seemethere

Labels

module: cuda (Related to torch.cuda, and CUDA support in general)
module: third_party
oncall: pt2
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
upstream triton (Upstream Triton Issue)
