gemma-3 #236

@terrykong

To support gemma-3, it looks like we need a few changes. The policy worker currently logs this warning:

(FSDP1PolicyWorker[rank=0] pid=1810886) It is strongly recommended to train Gemma3 models with the eager attention implementation instead of sdpa. Use eager with AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager').
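
One possible way to act on the warning is to choose the attention implementation from the model name before loading. The sketch below is only an illustration: `attn_implementation_for` and `load_policy_model` are hypothetical helpers (not existing functions in this repo), the name-matching heuristic is an assumption, and the `"sdpa"` fallback is assumed from the warning text. Only `AutoModelForCausalLM.from_pretrained(..., attn_implementation=...)` comes from the log above.

```python
def attn_implementation_for(model_name: str) -> str:
    """Pick an attention backend from the model name (hypothetical heuristic)."""
    # Assumption: Gemma 3 checkpoints can be recognized by "gemma-3" or
    # "gemma3" appearing in the name or path.
    normalized = model_name.lower().replace("_", "-")
    if "gemma-3" in normalized or "gemma3" in normalized:
        return "eager"
    # Assumption: every other model keeps using sdpa.
    return "sdpa"


def load_policy_model(model_name: str):
    """Load a causal LM with the attention backend chosen above."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_name,
        attn_implementation=attn_implementation_for(model_name),
    )
```

With this in place, a call such as `load_policy_model("<path-to-checkpoint>")` would pass `attn_implementation="eager"` for Gemma 3 checkpoints only. A config option might be a cleaner choice than string matching, since checkpoint paths do not always contain the model family name.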

Labels: bug (Something isn't working)
