
SFT w/ FSDP2 #197

@yfw

Description

  • Compare FSDP2 against FSDP1
  • with tensor parallelism (TP > 1)
  • with sequence parallelism
  • with activation checkpointing
  • with CPU offload
  • Target: 32k context length on llama3-8b
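
One way to keep track of the comparison is to enumerate the run matrix up front. The sketch below is only an illustration, not this repo's config format: `RunConfig`, `build_grid`, and the `tp_sizes=(1, 2)` default are hypothetical names and values. It also assumes that "32k" means 32 * 1024 tokens and that sequence parallelism is only meaningful when TP > 1.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RunConfig:
    """One cell of the comparison matrix (hypothetical structure)."""
    fsdp_version: int               # 1 = FSDP1, 2 = FSDP2
    tp_size: int                    # tensor-parallel degree
    sequence_parallel: bool
    activation_checkpointing: bool
    cpu_offload: bool
    model: str = "llama3-8b"
    seq_len: int = 32 * 1024        # assumption: "32k" = 32768 tokens


def build_grid(tp_sizes=(1, 2)):
    """Enumerate every combination of the options listed in the issue."""
    configs = []
    flags = (False, True)
    for ver, tp, sp, ac, offload in product((1, 2), tp_sizes, flags, flags, flags):
        # Assumption: sequence parallelism rides on the TP group, so skip it at TP == 1.
        if sp and tp == 1:
            continue
        configs.append(RunConfig(ver, tp, sp, ac, offload))
    return configs


grid = build_grid()
for cfg in grid:
    print(cfg)
```

Each `RunConfig` could then be mapped onto whatever launcher flags the training script actually exposes. Pairing runs that differ only in `fsdp_version` gives the FSDP1 vs FSDP2 comparison directly.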
