Closed
Labels
- high priority
- module: binaries (Anything related to official binaries that we release to users)
- triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Describe the bug
- Allocate a c5a.4xlarge instance, for example by running:
import boto3

ec2 = boto3.resource("ec2")
rc = ec2.create_instances(
    ImageId="ami-031843d9eaa76ad7a",
    InstanceType="c5a.4xlarge",
    SecurityGroups=["ssh-allworld"],
    KeyName="nshulga-key",
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[{
        "DeviceName": "/dev/sda1",
        "Ebs": {"DeleteOnTermination": True, "VolumeSize": 150, "VolumeType": "standard"},
    }],
)
- SSH into the instance and run:
python3 -mpip install torch
- Run:
python3 -c "import torch"
The import fails with:
$ python3 -c "import torch"
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/__init__.py", line 172, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/ubuntu/.local/lib/python3.8/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11
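The error says that `libcublas.so.11` could not resolve a symbol it expects from `libcublasLt.so.11`. One way to narrow this down is to check whether more than one copy of these libraries is visible on the machine. The script below is only a diagnostic sketch, not something taken from the PyTorch code base: the helper names (`find_library_copies`, `try_load`, `candidate_dirs`) are hypothetical, and looking in `LD_LIBRARY_PATH` plus the `nvidia/cublas/lib` directory under each `sys.path` entry is an assumption about where copies might live, not a confirmed root cause.

```python
import ctypes
import os
import sys
from pathlib import Path


def find_library_copies(name, search_dirs):
    """Return every existing file called `name` directly under the given directories."""
    found = []
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.exists():
            found.append(str(candidate))
    return found


def try_load(path):
    """Try to dlopen `path` with RTLD_GLOBAL; return None on success, else the error text."""
    try:
        ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
        return None
    except OSError as exc:
        return str(exc)


def candidate_dirs():
    """Assumed search locations: LD_LIBRARY_PATH entries and <sys.path entry>/nvidia/cublas/lib."""
    dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    for entry in sys.path:
        if entry:
            dirs.append(os.path.join(entry, "nvidia", "cublas", "lib"))
    return dirs


if __name__ == "__main__":
    # Report each copy found and whether it can be loaded on its own.
    for lib in ("libcublasLt.so.11", "libcublas.so.11"):
        for path in find_library_copies(lib, candidate_dirs()):
            print(path, "->", try_load(path) or "loaded OK")
```

If the output lists a `libcublasLt.so.11` outside the pip-installed `nvidia/cublas/lib` directory, that would be worth mentioning in a follow-up comment, since a mismatched copy found first by the loader could explain the missing symbol.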
Versions
1.13.1, 1.13.0