NaN tensor values problem for GTX16xx users (no problem on other devices) #7908

### Search before asking

- [X] I have searched the YOLOv5 [issues](https://github.com/ultralytics/yolov5/issues) and found no similar bug report.


### YOLOv5 Component

Training, Validation

### Bug

I trained [YOLOv5](https://github.com/ultralytics/yolov5) on the demo dataset (coco128) and found that the box and obj losses are NaN. In addition, no detections appear on the validation images. This only happens on a GTX 1660 Ti (GPU mode); training on the CPU, on Google Colab (Tesla K80), or on an RTX 2070 works fine.
![image](https://user-images.githubusercontent.com/17290550/169525214-44426c6b-11a0-4131-83ae-5b2869c19ba5.png)

### Environment

- Windows 10 10.0.19044.1706
- YOLOv5 6.1
- NVIDIA GTX 1660 Ti, 6 GB
- Python 3.9
- cudatoolkit-11.3.1
- pytorch-1.11.0-py3.9_cuda11.3_cudnn8_0
- (also tried pytorch-1.11.0-py3.9_cuda11.5_cudnn8_0)
- (with dependencies installed correctly)
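
The versions listed above can be cross-checked from Python, which may help confirm which CUDA and cuDNN builds PyTorch actually reports (for example after replacing cuDNN DLLs by hand). This is only a minimal sketch: the helper name `torch_env` is hypothetical, and the guarded import is there so the script still runs on a machine without PyTorch.

```python
import platform

try:
    import torch
except ImportError:  # allow the script to run even if PyTorch is missing
    torch = None


def torch_env():
    """Collect the Python/PyTorch/CUDA/cuDNN versions visible at runtime."""
    info = {"python": platform.python_version(), "torch": None}
    if torch is None:
        return info
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda          # None on CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()   # integer, or None if unavailable
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, value in torch_env().items():
    print(f"{key}: {value}")
```

If the reported cuDNN value does not change after swapping DLLs, that would suggest PyTorch is still loading its bundled copy.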

### Minimal Reproducible Example

The command used for training is:
`python train.py`

### Additional

The following issues and threads discuss the same problem:

- https://github.com/pytorch/pytorch/issues/58123
- https://github.com/openai/glide-text2im/issues/31
- https://discuss.pytorch.org/t/half-precision-convolution-cause-nan-in-forward-pass/117358/3
- https://github.com/pytorch/pytorch/issues/69449
- https://github.com/ultralytics/yolov5/issues/5815

However, I have tried PyTorch built with CUDA 11.5 (whose cuDNN version is 8.3.0 > 8.2.2), and I also tried **downloading cuDNN from NVIDIA and copying the DLL files into the relevant folder in torch/lib**, but the problem persists.

Another workaround is to downgrade to a PyTorch build with CUDA 10.2 (tested, and it works), but this is currently not feasible because CUDA 10.2 PyTorch builds are no longer available for Windows.
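
Several of the linked reports point at half-precision convolutions as the source of the NaNs. A minimal sketch for checking this outside of YOLOv5 is shown below: it runs one small convolution in float32 and again under float16 autocast, then reports whether either output contains NaN. The helper name `conv_nan_report` and the layer and input sizes are arbitrary choices for illustration, not taken from the YOLOv5 code.

```python
try:
    import torch
except ImportError:  # allow the script to run even if PyTorch is missing
    torch = None


def conv_nan_report():
    """Run one convolution in float32 and under float16 autocast on the GPU.

    Returns {"float32": bool, "float16": bool}, where True means the output
    contains NaN, or None when PyTorch or a CUDA device is unavailable.
    """
    if torch is None or not torch.cuda.is_available():
        return None
    torch.manual_seed(0)
    conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1).to("cuda")
    x = torch.randn(1, 3, 64, 64, device="cuda")
    report = {}
    with torch.no_grad():
        # Plain float32 forward pass
        report["float32"] = bool(torch.isnan(conv(x)).any())
        # Same layer and input, but the convolution runs in half precision
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            report["float16"] = bool(torch.isnan(conv(x)).any())
    return report


print(conv_nan_report())
```

A result with `float32` False and `float16` True would be consistent with the half-precision problem described in the linked threads. A clean result does not rule the problem out, since a single small layer may not trigger it.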

### Are you willing to submit a PR?

- [ ] Yes I'd like to help by submitting a PR!
