[DRAFT] Increase the upper bound of torch and lightning to accept 2.1 version #3663
Conversation
```diff
-    "torch": ">=2.0,<2.1",  # "<{N+1}" upper cap, sync with common/src/autogluon/common/utils/try_import.py
+    "torch": ">=2.0,<2.2",  # "<{N+1}" upper cap, sync with common/src/autogluon/common/utils/try_import.py
     "lightning": ">=2.0.0,<2.1",  # "<{N+1}" upper cap
     "pytorch_lightning": ">=2.0.0,<2.1",  # "<{N+1}" upper cap, capping `lightning` does not cap `pytorch_lightning`!
```
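As a side note, the effect of raising the cap can be sanity-checked with the `packaging` library (not part of this PR, just an illustration of the specifier semantics):

```python
from packaging.specifiers import SpecifierSet

# Old and new torch caps, as in the diff above
old_cap = SpecifierSet(">=2.0,<2.1")
new_cap = SpecifierSet(">=2.0,<2.2")

# torch 2.1.x is rejected by the old cap but accepted by the new one
print("2.1.0" in old_cap)  # False
print("2.1.0" in new_cap)  # True
print("2.2.0" in new_cap)  # False (still capped below 2.2)
```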
As far as I understand, the problem boils down to the following: running `pip install torch==2.0.0` installs PyTorch compiled with CUDA 11.7 (see here). In contrast, running `pip install torch==2.1.0`, or just `pip install torch` (as of Nov 7, 2023), installs PyTorch compiled with CUDA 12. Since the environment that we use to run tests comes with CUDA 11.8 installed, it cannot run PyTorch builds compiled against CUDA 12.

I don't think it's related to any specific code that we have in AutoGluon. To fix this problem, we would need to ensure that during tests we install the correct PyTorch build with something like

```
pip install torch~=2.1.0 --index-url https://download.pytorch.org/whl/cu118
```

rather than the plain `pip install torch~=2.1.0` that we currently use.
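For reference, `~=2.1.0` is PEP 440's compatible-release operator, equivalent to `>=2.1.0, <2.2`, so it pins the minor version but allows patch upgrades. A quick illustration with the `packaging` library (again, only an aside, not code from the PR):

```python
from packaging.specifiers import SpecifierSet

# "~=2.1.0" behaves like ">=2.1.0, <2.2"
compat = SpecifierSet("~=2.1.0")

print("2.1.2" in compat)  # True  (patch upgrade allowed)
print("2.2.0" in compat)  # False (minor upgrade excluded)
print("2.0.1" in compat)  # False (below the lower bound)
```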
Some other things that I noticed:

- During installation of `multimodal`, `torch-2.0.1` is actually installed, even though we increase the cap in this PR. Potentially, this happens because one of the `multimodal` dependencies caps `torch<2.1`. Therefore, the multimodal tests pass.
- I tried creating a fresh Python 3.10 environment on a `p3` instance (with a V100 GPU) and installing PyTorch 2.1 in it. Trying to use CUDA results in the same error as shown in the timeseries logs. To reproduce:

  ```
  conda create -n cuda12 python=3.10
  conda activate cuda12
  pip install torch~=2.1.0
  python -c "import torch; torch.zeros(1).cuda()"
  ```

  Output:

  ```
  RuntimeError: The NVIDIA driver on your system is too old (found version 11080).
  Please update your GPU driver by downloading and installing a new version from the URL:
  http://www.nvidia.com/Download/index.aspx
  Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been
  compiled with your version of the CUDA driver
  ```
From a quick look at the available wheels, all should be fine as long as the end user requests the correct CUDA version in the (extra) index URL. This will work fine:

```
# cu118 has wheels for 2.0.0 through 2.1.2
pip install autogluon.multimodal --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu118

# cu121 has wheels for 2.1.0 through 2.1.2
pip install autogluon.multimodal --force-reinstall --extra-index-url https://download.pytorch.org/whl/cu121
```

Apart from maybe a docs update, are there any other blockers here? Asking because this PR blocks support for the latest tensorrt (which completes AutoGluon Python 3.11 support).
@tonyhoo Any updates on the status of this PR?
Superseded by #3982
Issue #, if available:
Description of changes:
Increase the upper bound of torch and lightning to accept 2.1 version
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.