containers: cuda container needs libcudnn8 and nvidia-driver-NVML #1018
This is needed by #1016
d91b651 to 3f0f299
Tested this on a G5 AWS instance, and it does the trick.
going to let mergify do the merge on this one -- please hold off on merging
(tested locally and this builds)
@Mergifyio rebase
The libcudnn8 package needs to be installed in the container or
else we see errors like this one:

    File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 237, in <module>
        from torch._C import *  # noqa: F403
        ^^^^^^^^^^^^^^^^^^^^^^
    ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory

And in order to find devices in torch:

    $ python3.11
    >>> import torch
    >>> torch.cuda.device_count()
    1

The above returns zero without nvidia-driver-NVML

Signed-off-by: Stef Walter <stefw@redhat.com>
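For anyone wanting to sanity-check a built image without importing torch, here is a minimal sketch that asks the dynamic loader whether the two shared libraries can be opened. The soname `libcudnn.so.8` comes from the ImportError above; `libnvidia-ml.so.1` is assumed to be the NVML library that the nvidia-driver-NVML package provides, and the helper names `can_load` and `check_cuda_runtime_libs` are hypothetical, not part of this PR.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the shared library `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be opened.
        return False
    return True


def check_cuda_runtime_libs(sonames=("libcudnn.so.8", "libnvidia-ml.so.1")):
    """Map each soname to whether it could be loaded.

    The default sonames are assumptions: libcudnn.so.8 is taken from the
    ImportError in the commit message, libnvidia-ml.so.1 is assumed to be
    what nvidia-driver-NVML installs.
    """
    return {name: can_load(name) for name in sonames}


if __name__ == "__main__":
    for name, ok in check_cuda_runtime_libs().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Running this inside the container should report both libraries as found once the two packages are installed; it only checks that the loader can open them, not that a GPU is actually visible, so `torch.cuda.device_count()` remains the real test.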
✅ Branch has been successfully rebased
3f0f299 to 8704229
Groan, looks like I forgot one dependency:
This was called out by @stefwalter on instructlab#1018. He did that PR and commented after it merged that this is needed as well.

Signed-off-by: Russell Bryant <rbryant@redhat.com>
@stefwalter posted in #1023