Description
- I first tried a vanilla PyTorch training loop in bfloat16, and the loss overflowed: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-bf16.ipynb
- I then tried the same vanilla PyTorch training loop in fp32, and the loss was fine: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-fp32.ipynb
- I thought the cause might be missing gradient clipping etc., so I tried the HuggingFace Trainer with DeepSpeed in bf16, and the loss overflowed again: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-trainer-deepspeed-bf16.ipynb
- Finally I removed DeepSpeed and used fp32 in the HuggingFace Trainer, and the loss was fine: https://github.com/mesolitica/malaya/blob/5.1/pretrained-model/mamba/causallm-130m-trainer-fp32.ipynb
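The pattern above (bf16 diverges, fp32 is fine) is consistent with bfloat16's very small 8-bit mantissa: it has fp32's exponent range, but so little precision that typical `lr * grad` updates can round away entirely. A minimal pure-Python sketch of that effect, assuming nothing from the notebooks above (the `bf16_round` helper is a hypothetical bit-twiddling simulation, not a PyTorch API):

```python
import struct

def bf16_round(x: float) -> float:
    """Round an fp32 value to bfloat16 precision (8-bit mantissa).

    Keeps only the top 16 bits of the fp32 bit pattern, with a simple
    round-half-up on the dropped bits; real bf16 rounds to nearest-even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x8000) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# At magnitude 1.0 the bf16 spacing is 2**-7 ~= 0.0078, so a typical
# lr * grad step of 1e-3 is rounded away entirely:
w = 1.0
w_after = bf16_round(w - 1e-3)
print(w_after)  # 1.0 — the update vanishes and the weight never moves
```

This is a precision (stagnation/instability) problem rather than a true range overflow, which is why switching the same loop to fp32 behaves well.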
If bfloat16 does not work in the vanilla loop, DeepSpeed running in bf16 is not going to work either.
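For context on why fp32 rescues both setups: one common mitigation, not tested in the report above but standard in mixed-precision training, is keeping an fp32 master copy of the weights and casting to bf16 only for the forward pass. A hedged pure-Python sketch (the `bf16_round` helper simulates bf16 rounding and is hypothetical, not a real API):

```python
import struct

def bf16_round(x: float) -> float:
    """Approximate fp32 -> bfloat16 -> fp32 round trip (8-bit mantissa)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", (bits + 0x8000) & 0xFFFF0000))[0]

lr_times_grad = 1e-3  # a typical small weight update

# Pure-bf16 weights: each small update rounds away, so after 1000
# steps the weight has not moved at all.
w_bf16 = 1.0
for _ in range(1000):
    w_bf16 = bf16_round(w_bf16 - lr_times_grad)

# fp32 master weight, cast to bf16 only when used in the forward pass:
# the small updates accumulate correctly in full precision.
w_master = 1.0
for _ in range(1000):
    w_master -= lr_times_grad
w_for_forward = bf16_round(w_master)

print(w_bf16)    # still 1.0
print(w_master)  # approximately 0.0
```

This is essentially what fp32 training (and fp32 optimizer state in frameworks like DeepSpeed) gives you implicitly, which matches the observation that the fp32 runs converge.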