Conversation

Contributor

@tonyhoo tonyhoo commented Nov 6, 2023

Issue #, if available:

Description of changes:
Increase the upper bound of torch and lightning to accept version 2.1

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

"torch": ">=2.0,<2.1", # "<{N+1}" upper cap, sync with common/src/autogluon/common/utils/try_import.py
"lightning": ">=2.0.0,<2.1", # "<{N+1}" upper cap
"pytorch_lightning": ">=2.0.0,<2.1", # "<{N+1}" upper cap, capping `lightning` does not cap `pytorch_lightning`!
"torch": ">=2.0,<2.2", # "<{N+1}" upper cap, sync with common/src/autogluon/common/utils/try_import.py
Collaborator

@shchur shchur commented Nov 7, 2023

As far as I understand, the problem boils down to the following: running pip install torch==2.0.0 installs PyTorch compiled with CUDA 11.7 (see here). In contrast, running pip install torch==2.1.0 or just pip install torch (as of Nov 7, 2023) installs PyTorch compiled with CUDA 12. Since the environment that we use to run tests comes with CUDA 11.8 installed, it cannot run PyTorch that was compiled with CUDA 12.

I don't think it's related to any specific code that we have in AutoGluon.

To fix this problem, we would need to ensure that during tests we install the correct PyTorch version with something like

pip install torch~=2.1.0 --index-url https://download.pytorch.org/whl/cu118

rather than

pip install torch~=2.1.0

that we currently use.
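The failure mode described above boils down to a version comparison: a wheel compiled against a newer CUDA than the host driver supports cannot run. The sketch below is illustrative, not PyTorch's actual check, and the version strings are hypothetical; on a real install, the wheel's compiled CUDA version can be inspected with `torch.version.cuda`.

```python
# Hypothetical sketch of the compatibility rule behind the test failures:
# the CUDA version a torch wheel was compiled against must not exceed
# what the host driver supports.

def cuda_compatible(compiled_cuda: str, driver_cuda: str) -> bool:
    """A wheel compiled for CUDA X.Y needs a driver supporting >= X.Y."""
    to_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return to_tuple(compiled_cuda) <= to_tuple(driver_cuda)

# Default PyPI torch 2.1 wheel (cu121) on a CUDA 11.8 test host: fails.
print(cuda_compatible("12.1", "11.8"))  # False
# cu118 wheel from the cu118 index on the same host: works.
print(cuda_compatible("11.8", "11.8"))  # True
```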

Some other things that I noticed:

  1. During installation of multimodal, torch-2.0.1 is actually installed, even though we increase the cap in this PR, potentially because one of the multimodal dependencies caps torch <2.1. That is why the multimodal tests pass.
  2. I tried creating a fresh Python 3.10 environment on a p3 instance (with V100 GPU) and installing PyTorch 2.1 in it. Trying to use CUDA results in the same error as shown in timeseries logs. To reproduce:
    conda create -n cuda12 python=3.10
    conda activate cuda12
    pip install torch~=2.1.0
    python -c "import torch; torch.zeros(1).cuda()"
    
    Output:
    RuntimeError: The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver
    

@tonyhoo tonyhoo reopened this Nov 14, 2023
@Innixma Innixma added this to the 1.1 Release milestone Nov 21, 2023
@ddelange
Contributor

From a quick look at the available wheels, all should be fine as long as the end user requests the correct CUDA version in the (extra) index URL.

This will work fine:

# cu118 has wheels for 2.0.0 through 2.1.2
pip install autogluon.multimodal --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu118
# cu121 has wheels for 2.1.0 through 2.1.2
pip install autogluon.multimodal --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu121

Apart from maybe a docs update, are there any other blockers here?

Asking because this PR blocks support for latest tensorrt (which completes autogluon py3.11 support).
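One way (a sketch, not something this PR implements) to make the CUDA index choice above reproducible, e.g. for CI, is to record it in the requirements file that pip reads, since pip honors options such as --extra-index-url as lines inside requirements files. The file name below is hypothetical:

```text
# requirements-ci.txt (hypothetical name); pip accepts option lines like
# --extra-index-url directly inside requirements files.
--extra-index-url https://download.pytorch.org/whl/cu118
torch~=2.1.0
autogluon.multimodal
```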

@ddelange
Contributor

cc @Innixma @AnirudhDagar

@ddelange ddelange mentioned this pull request Jan 20, 2024
@Innixma Innixma added dependency Related to dependency packages priority: 0 Maximum priority labels Jan 21, 2024
@Innixma
Contributor

Innixma commented Jan 21, 2024

@tonyhoo Any updates on the status of this PR?

@ddelange
Contributor

Superseded by #3982.

@AnirudhDagar
Contributor

AnirudhDagar commented Mar 19, 2024

Thanks for the notification @ddelange, closed by #3982.

Successfully merging this pull request may close these issues.

[AutoMM] GroundingDino used in AutoMM doesn't support CUDA 12 and Torch 2.1