check cuda.is_available() before cuda.device_count() #4902

Innixma · 2025-02-18T04:56:13Z

Issue #, if available:

Description of changes:

Check whether CUDA is available before trying to get the GPU count. On a CPU-only machine, calling cuda.device_count() when CUDA isn't available produces warning messages; checking cuda.is_available() first avoids them.
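A minimal sketch of the guard pattern described above, not the actual AutoGluon code. The function name count_cuda_gpus, the optional cuda parameter, and the FakeCuda stub are hypothetical and exist only for illustration; the only real library calls assumed are torch.cuda.is_available() and torch.cuda.device_count().

```python
def count_cuda_gpus(cuda=None) -> int:
    """Return the number of CUDA devices, or 0 when CUDA is unavailable.

    `cuda` is a hypothetical injection point (defaults to torch.cuda) so the
    short-circuit can be demonstrated without a GPU or even without torch.
    """
    if cuda is None:
        try:
            import torch  # treated as an optional dependency in this sketch
        except ImportError:
            return 0
        cuda = torch.cuda
    # Short-circuit: on a CPU-only machine, device_count() is never reached.
    if not cuda.is_available():
        return 0
    return cuda.device_count()


class FakeCuda:
    """Hypothetical stand-in for torch.cuda that records device_count() calls."""

    def __init__(self, available: bool, devices: int):
        self.available = available
        self.devices = devices
        self.device_count_calls = 0

    def is_available(self) -> bool:
        return self.available

    def device_count(self) -> int:
        self.device_count_calls += 1
        return self.devices


cpu_only = FakeCuda(available=False, devices=0)
print(count_cuda_gpus(cpu_only), cpu_only.device_count_calls)  # prints: 0 0
```

With the stub reporting CUDA as unavailable, device_count() is never called, which is the behavior this change aims for on CPU instances.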

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions · 2025-02-18T08:07:33Z

Job PR-4902-f7c55c6 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4902/f7c55c6/index.html

Innixma · 2025-02-18T22:04:22Z

Example logs on mainline showing the warning on a CPU instance:

Fitting 3 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM ... Training model for up to 59.97s of the 59.97s of remaining time.
/opt/conda/envs/ag-311/lib/python3.11/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
	-0.3853	 = Validation score   (-log_loss)
	2.64s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: RandomForest ... Training model for up to 57.32s of the 57.32s of remaining time.
/opt/conda/envs/ag-311/lib/python3.11/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
	-0.3725	 = Validation score   (-log_loss)
	0.69s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: NeuralNetTorch ... Training model for up to 56.54s of the 56.54s of remaining time.
/opt/conda/envs/ag-311/lib/python3.11/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
	-0.3739	 = Validation score   (-log_loss)
	3.3s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 59.97s of the 53.22s of remaining time.
/opt/conda/envs/ag-311/lib/python3.11/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
	Ensemble Weights: {'RandomForest': 0.48, 'NeuralNetTorch': 0.48, 'LightGBM': 0.04}
	-0.3536	 = Validation score   (-log_loss)
	0.01s	 = Training   runtime
	0.0s	 = Validation runtime

suzhoum

LGTM!

check cuda.is_available() before cuda.device_count()

43af469

Innixma added the "code cleanup" label (Fixing warnings/deprecations/syntax/etc.) Feb 18, 2025

Innixma requested a review from prateekdesai04 February 18, 2025 04:56

linting

f7c55c6

suzhoum approved these changes Feb 18, 2025

Innixma merged commit fd72bcc into autogluon:master Feb 18, 2025
27 checks passed

Innixma deleted the fix_torch_gpu_check branch April 16, 2025 21:23
