Innixma (Contributor) commented Feb 18, 2025

Issue #, if available:

Description of changes:

Check whether CUDA is available before trying to get the GPU count. On a CPU machine, trying to get the GPU count when CUDA isn't available leads to warning messages (for example, torch's "UserWarning: Can't initialize NVML"). Checking availability first avoids these warnings.
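For illustration only, the guard described above can be sketched as follows. This is not the actual diff from this PR; the function name `count_gpus_safely` is hypothetical, and the sketch assumes that `torch.cuda.is_available()` and `torch.cuda.device_count()` are the calls involved.

```python
def count_gpus_safely() -> int:
    """Return the number of CUDA GPUs visible to torch, or 0 if none are usable.

    Hypothetical helper, not the function changed in this PR.
    """
    try:
        # Imported lazily so that environments without torch still get an answer.
        import torch
    except ImportError:
        return 0

    # Guard: only ask for the device count when CUDA is reported as available.
    # On a CPU machine this skips the device-count query entirely.
    if not torch.cuda.is_available():
        return 0

    return torch.cuda.device_count()
```

On a CPU instance this returns 0 without ever calling `torch.cuda.device_count()`, which is the call the description identifies as the source of the warnings.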

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Innixma added the "code cleanup" label (Fixing warnings/deprecations/syntax/etc.) Feb 18, 2025

Job PR-4902-f7c55c6 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4902/f7c55c6/index.html

Innixma (Contributor, Author) commented Feb 18, 2025

Example logs on mainline showing the warning on a CPU instance:

Fitting 3 L1 models, fit_strategy="sequential" ...
Fitting model: LightGBM ... Training model for up to 59.97s of the 59.97s of remaining time.
/opt/conda/envs/ag-311/lib/python3.11/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
	-0.3853	 = Validation score   (-log_loss)
	2.64s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting model: RandomForest ... Training model for up to 57.32s of the 57.32s of remaining time.
/opt/conda/envs/ag-311/lib/python3.11/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
	-0.3725	 = Validation score   (-log_loss)
	0.69s	 = Training   runtime
	0.05s	 = Validation runtime
Fitting model: NeuralNetTorch ... Training model for up to 56.54s of the 56.54s of remaining time.
/opt/conda/envs/ag-311/lib/python3.11/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
	-0.3739	 = Validation score   (-log_loss)
	3.3s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: WeightedEnsemble_L2 ... Training model for up to 59.97s of the 53.22s of remaining time.
/opt/conda/envs/ag-311/lib/python3.11/site-packages/torch/cuda/__init__.py:654: UserWarning: Can't initialize NVML
  warnings.warn("Can't initialize NVML")
	Ensemble Weights: {'RandomForest': 0.48, 'NeuralNetTorch': 0.48, 'LightGBM': 0.04}
	-0.3536	 = Validation score   (-log_loss)
	0.01s	 = Training   runtime
	0.0s	 = Validation runtime

suzhoum (Contributor) left a comment

LGTM!

@Innixma Innixma merged commit fd72bcc into autogluon:master Feb 18, 2025
27 checks passed
@Innixma Innixma deleted the fix_torch_gpu_check branch April 16, 2025 21:23