DDP training tries to save sharded checkpoint on the last step #664

@ananyahjha93

🐛 Describe the bug

2024-07-17T04:30:51.202069810Z 2024-07-16 21:30:51.201	jupiter-cs-aus-121.reviz.ai2.in:0	olmo.train:1268	INFO	Saving final checkpoint...
2024-07-17T04:30:52.220928528Z 2024-07-16 21:30:52.219	jupiter-cs-aus-121.reviz.ai2.in:5	olmo.util:163	CRITICAL	Uncaught AssertionError: TorchLegacyShardedCheckpointer is being called to save a model where `distributed_strategy` is not FSDP.

With DDP, when the final step count is divisible by save_interval_unsharded, the unsharded checkpoint has already been saved on that step. The condition for saving the final checkpoint then falls through to the sharded checkpoint saver, TorchLegacyShardedCheckpointer, which raises the assertion above because `distributed_strategy` is not FSDP. The culprit is this if condition: https://github.com/allenai/OLMo/blob/main/olmo/train.py#L1256
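
For context, here is a minimal sketch of the control flow around that line. All names below (`TrainerState`, `save_unsharded`, `save_sharded`, `last_unsharded_step`) are illustrative assumptions for this report, not OLMo's actual API:

```python
# Hypothetical sketch of the final-checkpoint branch around olmo/train.py#L1256.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainerState:
    step: int                           # current (final) step
    last_unsharded_step: Optional[int]  # step of the last unsharded save
    save_interval_unsharded: Optional[int]
    distributed_strategy: str           # "ddp" or "fsdp"

def save_unsharded(step: int) -> None:
    print(f"saving unsharded checkpoint at step {step}")

def save_sharded(step: int) -> None:
    # Under DDP this path reaches TorchLegacyShardedCheckpointer, which
    # asserts that distributed_strategy is FSDP -> the crash logged above.
    print(f"saving sharded checkpoint at step {step}")

def save_final_checkpoint(state: TrainerState) -> None:
    already_saved = state.last_unsharded_step == state.step
    if state.save_interval_unsharded is not None and not already_saved:
        save_unsharded(state.step)
    else:
        # Bug: with DDP and step % save_interval_unsharded == 0 we land
        # here and hand the model to the sharded checkpointer, even though
        # this branch should also check distributed_strategy.
        save_sharded(state.step)

# Repro of the reported case: DDP, final step divisible by the unsharded interval.
save_final_checkpoint(TrainerState(step=1000, last_unsharded_step=1000,
                                   save_interval_unsharded=500,
                                   distributed_strategy="ddp"))
```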

Versions

NA
