Better handling of examples that exceed max seq len #169

@ashors1

Description

Currently, we mask out examples whose sequence length exceeds the max. This can have unintended side effects, at least for SFT: for example, when running with a micro-batch size (MBS) of 1, a microbatch that holds an over-length example ends up with 0 loss, which skews the average loss for the batch. We should think about whether there's a better way to handle this. One option is to truncate the long examples rather than mask them.
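To make the skew concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names are hypothetical, per-token losses are hard-coded floats, and it assumes the batch loss is the unweighted mean of per-microbatch losses with MBS 1. It compares masking an over-length example against truncating it.

```python
def microbatch_loss(token_losses, mask):
    """Mean loss over unmasked tokens; 0.0 when every token is masked."""
    kept = [loss for loss, keep in zip(token_losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


def batch_loss_masking(examples, max_seq_len):
    """Fully mask over-length examples, then average microbatch losses (MBS 1)."""
    losses = []
    for token_losses in examples:
        keep = len(token_losses) <= max_seq_len
        mask = [keep] * len(token_losses)
        losses.append(microbatch_loss(token_losses, mask))
    return sum(losses) / len(losses)


def batch_loss_truncating(examples, max_seq_len):
    """Truncate over-length examples to max_seq_len, then average (MBS 1)."""
    losses = []
    for token_losses in examples:
        truncated = token_losses[:max_seq_len]
        losses.append(microbatch_loss(truncated, [True] * len(truncated)))
    return sum(losses) / len(losses)


# Hypothetical data: every token has loss 2.0; the second example is too long.
examples = [[2.0, 2.0], [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]]
masked = batch_loss_masking(examples, max_seq_len=4)
truncated = batch_loss_truncating(examples, max_seq_len=4)
print(masked, truncated)
```

Under these assumptions the masked average is half the true per-token loss, because the fully masked microbatch contributes a 0 to the mean, while truncation keeps the average at the per-token value. Normalizing by the total number of unmasked tokens across the batch would be another way to avoid the skew, though that is a separate design choice from the truncation option raised here.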
