Closed
Labels
bug (Something isn't working), stale (Issues that haven't received updates)
Description
Describe the bug
train_dreambooth_lora_sdxl.py can't be resumed from a checkpoint when using fp16. The logged error is "Attempting to unscale FP16 gradients."
This is a major blocker for training on the free Colab tier: you need fp16 to fit in VRAM, but you also need to resume from checkpoints because the session can hit a timeout at any moment.
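For context, PyTorch's GradScaler raises this error when the parameters being optimized hold fp16 gradients, so one plausible cause is that the trainable LoRA weights come back in fp16 after the checkpoint is loaded. The sketch below is not the script's code: it uses a toy model, and the helper name upcast_trainable_params is hypothetical. It only illustrates the general workaround of casting trainable parameters back to fp32 while leaving frozen weights in fp16.

```python
import torch
from torch import nn


def upcast_trainable_params(model: nn.Module, dtype: torch.dtype = torch.float32) -> int:
    """Cast parameters with requires_grad=True to `dtype`; return how many were cast.

    Hypothetical helper for illustration, not part of diffusers.
    """
    count = 0
    for param in model.parameters():
        if param.requires_grad and param.dtype != dtype:
            param.data = param.data.to(dtype)
            count += 1
    return count


# Toy stand-in (assumption): a frozen fp16 base layer plus a trainable
# adapter whose weights ended up in fp16, as might happen after a resume.
base = nn.Linear(4, 4).half()
base.requires_grad_(False)
adapter = nn.Linear(4, 4).half()
model = nn.ModuleDict({"base": base, "adapter": adapter})

# Upcast only the trainable weights before building the optimizer.
num_cast = upcast_trainable_params(model)
```

If this is the cause, such a cast would need to run after the checkpoint state is loaded and before the first optimizer step; whether that matches the script's resume path is an assumption to verify against the notebook linked below.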
Reproduction
Reproduce with: https://colab.research.google.com/drive/15woNcXcpsa3GDGk6cmDtIL2V8zRtOOj3
Logs
No response
System Info
Latest diffusers; the system is whatever Colab provides (see the linked Colab above).
Who can help?
danieltanhx and SunMarc