Closed
Description
Currently the user has to opt into the FSDP2 path, but ideally we want to support only the FSDP2 path and remove the FSDP1 path.
This is hard because on the FSDP2 path we have to apply sharding to individual modules, which at the moment requires us to make assumptions about the model structure.
I think something along these lines would work:
for name, mod in self.named_modules():
    # find the largest module whose size is over the FSDP split size the user provides
    fully_shard(the module from above)
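
To make the idea above a bit more concrete, here is a minimal sketch of one possible reading of it: walk the module tree bottom-up and pick a subtree as a shard unit as soon as its not-yet-claimed parameter count reaches the user-provided split size. This is an assumption about the intended selection rule, not the project's implementation. `Mod`, `own_params`, `pick_shard_targets` and `min_params` are hypothetical names, and `Mod` is a plain-Python stand-in for `torch.nn.Module` so the selection logic can run without a distributed setup.

```python
from dataclasses import dataclass, field


@dataclass
class Mod:
    """Hypothetical stand-in for a torch.nn.Module node."""
    name: str
    own_params: int = 0  # parameters held directly by this module
    children: list = field(default_factory=list)


def pick_shard_targets(root, min_params):
    """Return module paths to shard, children before parents, root last."""
    targets = []

    def visit(mod, prefix):
        path = f"{prefix}.{mod.name}" if prefix else mod.name
        # Count parameters that no descendant shard unit has claimed yet.
        unclaimed = mod.own_params
        for child in mod.children:
            unclaimed += visit(child, path)
        if unclaimed >= min_params:
            targets.append(path)
            return 0  # this subtree is now fully claimed
        return unclaimed

    visit(root, "")
    # Assumption: the root is always sharded last to pick up leftovers.
    if not targets or targets[-1] != root.name:
        targets.append(root.name)
    return targets


model = Mod("model", children=[
    Mod("embed", own_params=50),
    Mod("layer0", children=[Mod("attn", 60), Mod("mlp", 80)]),
    Mod("layer1", children=[Mod("attn", 60), Mod("mlp", 80)]),
    Mod("head", own_params=10),
])
print(pick_shard_targets(model, min_params=100))
```

With a real model, the same traversal could presumably use `named_children()` and `sum(p.numel() for p in mod.parameters(recurse=False))` in place of the stand-in fields, and then call `fully_shard` on each selected module in the returned order. No assumptions about layer names or model structure are needed, only the size threshold.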