Contributor

@tiran tiran commented Apr 25, 2024

Changes

Which issue is resolved by this Pull Request:
Resolves #899
Resolves #932

Description of your changes:

Modify the Containerfile for NVIDIA CUDA to:

  • use a multi-stage build with the CUDA runtime image for the final stage
  • update to CUDA 12.4.1
  • install the Python packages in a virtualenv that is copied into the final container
  • mimic the layout and behavior of the ubi9/python-311 container
  • make the final container rootless
  • install flash-attn and bitsandbytes
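
A virtualenv that is built in one stage and copied into another only works if it lands at the same absolute path and the same base interpreter exists there, because the environment records absolute paths. The following is a minimal sketch of that point, not the PR's build logic: it creates a throwaway virtualenv in a temporary directory (a stand-in for whatever path the Containerfile uses) and reads back the `home` entry that the standard `venv` module writes to `pyvenv.cfg`. The helper names are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def read_pyvenv_cfg(venv_dir: Path) -> dict[str, str]:
    """Parse the `key = value` lines that the venv module writes to pyvenv.cfg."""
    settings = {}
    for line in (venv_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def create_venv(venv_dir: Path) -> dict[str, str]:
    """Create a virtualenv without pip (fast, offline) and return its settings."""
    venv.EnvBuilder(with_pip=False).create(venv_dir)
    return read_pyvenv_cfg(venv_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_venv(Path(tmp) / "venv")
        # `home` is the directory of the base interpreter; it must also
        # exist in the stage that receives the copied virtualenv.
        print("base interpreter directory:", cfg["home"])
```

In a multi-stage build this suggests installing the same Python version in both stages and copying the virtualenv to an identical path in the final stage.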

Propose a shell alias that simplifies using the container image by avoiding long podman run commands.
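
To illustrate what such an alias might wrap, here is a hedged Python sketch that assembles a one-shot `podman run` command line and renders it as a shell alias. It is not the alias proposed in this PR. The image name is the one mentioned later in this thread; the `/data` mount point, the CDI device name `nvidia.com/gpu=all` (which needs the NVIDIA Container Toolkit configured on the host), and the assumption that the image's entrypoint is `ilab` are all illustrative guesses.

```python
import shlex

# Image name from a later comment in this thread.
IMAGE = "quay.io/tiran/instructlab-containers:cuda-ubi9"


def podman_run_command(ilab_args, workdir, image=IMAGE):
    """Return the argv for a one-shot `podman run` wrapping one ilab command.

    Assumes the image's entrypoint is `ilab`, so everything after the
    image name is passed to it as arguments.
    """
    return [
        "podman", "run", "--rm", "-it",
        "--device", "nvidia.com/gpu=all",   # CDI device name (assumption)
        "--volume", f"{workdir}:/data:Z",   # hypothetical mount, SELinux relabel
        "--workdir", "/data",
        image,
        *ilab_args,
    ]


def as_shell_alias(name, workdir):
    """Render the command, without ilab arguments, as a shell alias definition."""
    command = shlex.join(podman_run_command([], workdir))
    return f"alias {name}={shlex.quote(command)}"


if __name__ == "__main__":
    print(as_shell_alias("ilab", "/home/user/instructlab"))
```

With an alias like this in place, a user would type `ilab <subcommand>` on the host and each invocation would start a fresh container.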

Modify the containerization documentation to explain this and provide steps for the user.

Update the cuda target in the Makefile to reflect the changes above.

The changeset is co-authored by Fabien Dupont and is built on top of previous work by Charlie Doern and Fabien Dupont.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 25, 2024
@tiran tiran changed the title from "Refactor cuda" to "Refactor CUDA Container file" Apr 25, 2024
@tiran tiran force-pushed the refactor-cuda branch 2 times, most recently from 33599f0 to eea9eab Compare April 25, 2024 10:08
@tiran tiran marked this pull request as ready for review April 25, 2024 10:08
Contributor

@cdoern cdoern left a comment

Nice cleanup! One thing I am unsure of is the new flow, in which you podman run each command you want to run. I would be OK with this as long as there were another build target that installed ilab but whose entrypoint was just /bin/bash.

Contributor

@luis5tb luis5tb left a comment

Other than the couple of comments, this works well

@stefwalter
Contributor

@tiran FYI, I also needed this package nvidia-driver-cuda-libs ... not sure if that's accounted for here.

@tiran
Contributor Author

tiran commented Apr 26, 2024

@stefwalter In the final image or on the host system?

@stefwalter
Contributor

@stefwalter In the final image or on the host system?

In the image. Without nvidia-driver-cuda-libs:

[root@8a706bc26838 instructlab]# podman3.11
bash: podman3.11: command not found
[root@8a706bc26838 instructlab]# python3.11
Python 3.11.5 (main, Sep  7 2023, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.device_count()
1
>>> torch.cuda.is_available()
False

With the library installed, is_available() returns True.
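
The session above shows a device being counted while CUDA is still reported as unavailable. A small diagnostic along these lines can make that state explicit inside an image. This is only a sketch: `torch.cuda.device_count()` and `torch.cuda.is_available()` are the calls used above, while the `diagnose` and `probe` helpers and their messages are hypothetical.

```python
def diagnose(device_count: int, available: bool) -> str:
    """Classify the combination of values reported by torch.cuda."""
    if available:
        return f"CUDA usable: {device_count} device(s) visible"
    if device_count > 0:
        # The state reported in this thread: a device is counted but
        # CUDA cannot be initialized.
        return (
            "device visible but CUDA not usable: user-space driver "
            "libraries (for example nvidia-driver-cuda-libs) may be missing"
        )
    return "no CUDA device visible"


def probe() -> str:
    """Query PyTorch if it is installed and describe the result."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    return diagnose(torch.cuda.device_count(), torch.cuda.is_available())


if __name__ == "__main__":
    print(probe())
```

Running the script in the container before and after installing the package should show the message change if the missing library is indeed the cause.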

@tiran tiran force-pushed the refactor-cuda branch 2 times, most recently from f68ed6a to 6498f78 Compare April 28, 2024 12:35
@github-actions github-actions bot added the testing Relates to testing label Apr 28, 2024
@tiran
Contributor Author

tiran commented Apr 28, 2024

I have simplified the CUDA container file even more. It's no longer using rootless. While rootless is more secure, it's harder to use, too. Let's keep it simple for now and improve it later. You can fetch a ready-to-use image from quay.io/tiran/instructlab-containers:cuda-ubi9.

@mergify mergify bot added CI/CD Affects CI/CD configuration container Affects containization aspects labels Apr 29, 2024
@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label May 2, 2024
Contributor

mergify bot commented May 2, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

tiran added 2 commits May 6, 2024 22:29
Signed-off-by: Christian Heimes <cheimes@redhat.com>
Signed-off-by: Christian Heimes <cheimes@redhat.com>
@tiran tiran force-pushed the refactor-cuda branch from 03daf78 to ad3e1c6 Compare May 6, 2024 20:29
@mergify mergify bot added needs-rebase This Pull Request needs to be rebased and removed needs-rebase This Pull Request needs to be rebased labels May 6, 2024
Contributor

mergify bot commented May 7, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot removed the needs-rebase This Pull Request needs to be rebased label May 7, 2024
dhellmann added a commit to dhellmann/rebuilding-the-wheel that referenced this pull request May 15, 2024
@cdoern cdoern added the hold In-progress PR. Tag should be removed before merge. label May 16, 2024
@cdoern
Contributor

cdoern commented May 16, 2024

@tiran does this still only support running one command per container run? If so, I would really vote to add back the old workflow of exec'ing into a running container with ilab installed.

I may have lost context here so feel free to correct me :)

@mergify mergify bot added the needs-rebase This Pull Request needs to be rebased label May 16, 2024
Contributor

mergify bot commented May 16, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @tiran please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@nathan-weinberg
Member

@tiran any update on this?

@nathan-weinberg nathan-weinberg removed this from the Release - 5/30 milestone May 29, 2024
@tiran
Contributor Author

tiran commented Jul 9, 2024

Fabien is taking care of the CUDA container. Closing this stale PR.

@tiran tiran closed this Jul 9, 2024
fabiendupont added a commit to fabiendupont/instructlab that referenced this pull request Aug 2, 2024
This change modifies the Containerfile for NVIDIA CUDA to:
- Use CentOS Stream 9 as the default base image. UBI9 also works
- Install the NVIDIA packages from NVIDIA repository for exact CUDA version
- Install the Python packages in a Python 3.11 virtualenv
- Install instructlab package from PyPI
- Allow specifying the instructlab version at build time
- Make the final container rootless

Resolves instructlab#899
Replaces instructlab#992
Replaces instructlab#994

Signed-off-by: Fabien Dupont <fdupont@redhat.com>
fabiendupont added a commit to fabiendupont/instructlab that referenced this pull request Aug 2, 2024
This change modifies the Containerfile for NVIDIA CUDA to:
- Use CentOS Stream 9 as the default base image. UBI9 also works
- Install the NVIDIA packages from NVIDIA repository for exact CUDA version
- Add NVIDIA scripts from `nvcr.io/nvidia/cuda` image for copyrights...
- Install the Python packages in a Python 3.11 virtualenv
- Install instructlab package from PyPI
- Allow specifying the instructlab version at build time
- Make the final container rootless

Resolves instructlab#899
Replaces instructlab#992
Replaces instructlab#994

Signed-off-by: Fabien Dupont <fdupont@redhat.com>
fabiendupont added a commit to fabiendupont/instructlab that referenced this pull request Aug 5, 2024
fabiendupont added a commit to fabiendupont/instructlab that referenced this pull request Aug 6, 2024
fabiendupont added a commit to fabiendupont/instructlab that referenced this pull request Aug 7, 2024
fabiendupont added a commit to fabiendupont/instructlab that referenced this pull request Aug 19, 2024
fabiendupont added a commit to fabiendupont/instructlab that referenced this pull request Sep 17, 2024
fabiendupont added a commit to fabiendupont/instructlab that referenced this pull request Sep 18, 2024
fabiendupont added a commit to fabiendupont/instructlab that referenced this pull request Sep 19, 2024
mergify bot added a commit that referenced this pull request Sep 19, 2024
This change modifies the Containerfile for NVIDIA CUDA to:
- Use CentOS Stream 9 as the default base image. UBI9 also works
- Install the NVIDIA packages from NVIDIA repository for exact CUDA version
- Add NVIDIA scripts from `nvcr.io/nvidia/cuda` image for copyrights...
- Install the Python packages in a Python 3.11 virtualenv
- Install instructlab package from PyPI
- Allow specifying the instructlab version at build time
- Make the final container rootless

Resolves #899
Replaces #932
Replaces #994


Approved-by: nathan-weinberg

Approved-by: markstur
Labels
CI/CD Affects CI/CD configuration container Affects containization aspects documentation Improvements or additions to documentation hold In-progress PR. Tag should be removed before merge. needs-rebase This Pull Request needs to be rebased testing Relates to testing
Development

Successfully merging this pull request may close these issues.

Container ships build dependencies
8 participants