Labels: bug (Something isn't working)
Description
To support gemma-3 it looks like we need a few changes:
- bump transformers to latest (PR #176, "feat: streaming each dtensor in refit", bumps from 4.49.0 to 4.51.3, which should be enough)
- use eager attention / make it configurable (related):

  ```
  (FSDP1PolicyWorker[rank=0] pid=1810886) It is strongly recommended to train Gemma3 models with the eager attention implementation instead of sdpa. Use eager with AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager').
  ```
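A minimal sketch of how the "make it configurable" part could look: select the `attn_implementation` kwarg per model family so Gemma 3 gets eager attention as the warning recommends. The helper name and the substring check on the model name are assumptions for illustration, not the project's actual config mechanism.

```python
# Sketch: choose the attention implementation based on the model name.
# The "gemma-3" substring check is an illustrative assumption.
def attn_kwargs(model_name: str) -> dict:
    if "gemma-3" in model_name.lower():
        # Gemma 3: transformers warns against sdpa, recommends eager.
        return {"attn_implementation": "eager"}
    # Default for other models.
    return {"attn_implementation": "sdpa"}
```

This could then be spliced into the existing load call, e.g. `AutoModelForCausalLM.from_pretrained(path, **attn_kwargs(path))`.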