Closed
Labels: high priority, module: binaries, module: cuda, module: regression, triaged
🐛 Describe the bug
Installing torch 2.5.0 (stable) from pip with CUDA 12.4 reproducibly produces a broken install when following the 'Getting Started' guide:
docker run -it --rm --gpus=all almalinux/9-base
[root@a8af28733c07 /]# python3 -V
Python 3.9.18
[root@a8af28733c07 /]# python3 -m pip install torch torchvision torchaudio
[root@a8af28733c07 /]# python3
>>> import torch
Traceback (most recent call last):
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 300, in _load_global_deps
    ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib64/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.12: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 367, in <module>
    _load_global_deps()
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 325, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 284, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libcufile.so.*[0-9] not found in the system path ['', '/usr/lib64/python39.zip', '/usr/lib64/python3.9', '/usr/lib64/python3.9/lib-dynload', '/usr/local/lib64/python3.9/site-packages', '/usr/local/lib/python3.9/site-packages', '/usr/lib64/python3.9/site-packages', '/usr/lib/python3.9/site-packages']
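Since `import torch` itself crashes, the failing lookup can be checked without importing torch. The sketch below repeats a search similar to the one the traceback describes, under an assumption not confirmed by the traceback: that `_preload_cuda_deps` looks for `nvidia/<lib_folder>/lib/<lib_name>` under each `sys.path` entry, and that cuFile lives in a folder named `cufile`. `find_cuda_lib` is a hypothetical helper, not part of torch.

```python
import glob
import os
import sys
from typing import List, Optional


def find_cuda_lib(lib_folder: str, lib_name: str,
                  search_paths: Optional[List[str]] = None) -> List[str]:
    """Glob <entry>/nvidia/<lib_folder>/lib/<lib_name> for every search-path entry.

    ASSUMPTION: pip-installed NVIDIA wheels unpack under
    site-packages/nvidia/<lib_folder>/lib/. This mirrors, but is not, torch's own lookup.
    """
    entries = sys.path if search_paths is None else search_paths
    matches: List[str] = []
    for entry in entries:
        pattern = os.path.join(entry, "nvidia", lib_folder, "lib", lib_name)
        matches.extend(sorted(glob.glob(pattern)))
    return matches


if __name__ == "__main__":
    # Glob pattern copied from the ValueError; the "cufile" folder name is an assumption.
    hits = find_cuda_lib("cufile", "libcufile.so.*[0-9]")
    print("\n".join(hits) if hits else "libcufile not found under any sys.path entry")
```

Passing an explicit `search_paths` list makes it easy to point the check at a single site-packages directory.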
This works fine for previous versions, e.g. 2.4.1, 2.4.0, etc.:
python3 -m pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
I notice that installing the previous versions this way yields torch version 2.4.1+cu124, whereas the current stable install instructions yield 2.5.0 without the +cu124 suffix. Is this simply a documentation issue?
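The suffix difference noted above is a PEP 440 local version label (`+cu124`). As a minimal sketch, the installed version and its label can be read from package metadata with the standard library's `importlib.metadata`, which does not execute `import torch` and so avoids the crash. `split_local_label` and `installed_torch_build` are hypothetical helpers, not torch APIs.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def split_local_label(ver: str) -> Tuple[str, Optional[str]]:
    """Split a PEP 440 version such as '2.4.1+cu124' into ('2.4.1', 'cu124')."""
    public, sep, local = ver.partition("+")
    return public, (local if sep else None)


def installed_torch_build() -> Optional[Tuple[str, Optional[str]]]:
    """Read torch's version from installed metadata without importing torch."""
    try:
        return split_local_label(version("torch"))
    except PackageNotFoundError:
        return None  # torch is not installed in this environment


if __name__ == "__main__":
    print(installed_torch_build())
```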
Versions
torch-2.5.0-cp39-cp39-manylinux1_x86_64.whl from PyPI
The diagnostic script cannot run either, because it imports the broken torch installation:
[root@a8af28733c07 /]# wget https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py
[root@a8af28733c07 /]# python3 collect_env.py
Traceback (most recent call last):
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 300, in _load_global_deps
    ctypes.CDLL(global_deps_lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib64/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.12: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "//collect_env.py", line 17, in <module>
    import torch
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 367, in <module>
    _load_global_deps()
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 325, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/usr/local/lib64/python3.9/site-packages/torch/__init__.py", line 284, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libcufile.so.*[0-9] not found in the system path ['/', '/usr/lib64/python39.zip', '/usr/lib64/python3.9', '/usr/lib64/python3.9/lib-dynload', '/usr/local/lib64/python3.9/site-packages', '/usr/local/lib/python3.9/site-packages', '/usr/lib64/python3.9/site-packages', '/usr/lib/python3.9/site-packages']
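Because `collect_env.py` imports torch on line 17, it fails before reporting anything. As a rough stopgap, some of the same information can be gathered with the standard library alone. This sketch assumes that the relevant packages are those whose names start with `torch` or `nvidia-` (pip-installed CUDA libraries are typically shipped as `nvidia-*` wheels); `relevant_packages` is a hypothetical helper and does not reproduce everything `collect_env.py` reports.

```python
import platform
import sys
from importlib.metadata import distributions
from typing import Dict, Tuple


def relevant_packages(prefixes: Tuple[str, ...] = ("nvidia-", "torch")) -> Dict[str, str]:
    """Map installed distribution names starting with any prefix to their versions."""
    found: Dict[str, str] = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith(prefixes):
            found[name] = dist.version
    return found


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Platform:", platform.platform())
    for name, ver in sorted(relevant_packages().items()):
        print(f"{name}=={ver}")
```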
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @malfet @osalpekar @atalman @ptrblck