@epwalsh epwalsh commented Jul 16, 2024

Adds support for document masking during training via flash-attn.
This is activated when the flag `--data.generate_doc_lengths` is set.
The code changes were adapted from https://github.com/yuzhaouoe/pretraining-data-packing.
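
For readers unfamiliar with document masking, here is a minimal sketch of the idea in plain Python. It is not the code from this PR. It assumes each packed training sequence comes with a list of per-document lengths, and shows how those lengths turn into the cumulative boundaries that flash-attn's variable-length interface expects (its `cu_seqlens` arguments), plus the equivalent dense mask for reference. The function names `cu_seqlens_from_doc_lengths` and `document_causal_mask` are hypothetical, as is the convention that zero-length entries are padding.

```python
from itertools import accumulate


def cu_seqlens_from_doc_lengths(doc_lengths):
    """Return ([0, l1, l1+l2, ...], longest document length).

    Assumption: zero entries in `doc_lengths` are padding and are dropped.
    """
    lengths = [n for n in doc_lengths if n > 0]
    cu_seqlens = [0] + list(accumulate(lengths))
    max_len = max(lengths, default=0)
    return cu_seqlens, max_len


def document_causal_mask(doc_lengths):
    """Dense reference mask: mask[i][j] is True iff token i may attend to token j.

    A token attends only to earlier-or-equal positions within its own document.
    """
    cu_seqlens, _ = cu_seqlens_from_doc_lengths(doc_lengths)
    total = cu_seqlens[-1]
    # Label every token position with the index of the document it belongs to.
    doc_id = []
    for d, (start, end) in enumerate(zip(cu_seqlens, cu_seqlens[1:])):
        doc_id.extend([d] * (end - start))
    return [
        [doc_id[i] == doc_id[j] and j <= i for j in range(total)]
        for i in range(total)
    ]


if __name__ == "__main__":
    print(cu_seqlens_from_doc_lengths([3, 2, 0]))
    for row in document_causal_mask([2, 1]):
        print(row)
```

A fused kernel never needs to materialize the dense mask. The boundaries alone tell it where each document starts and ends, which is why only the lengths have to be generated by the data loader.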

@dirkgr dirkgr left a comment

If this is a good change, we definitely need to use flash attention on LUMI, which is its own can of worms ...

@epwalsh epwalsh merged commit 4e00460 into main Jul 19, 2024
@epwalsh epwalsh deleted the epwalsh/document-masking branch July 19, 2024 19:19
3 participants