-
Notifications
You must be signed in to change notification settings - Fork 648
Expose memmap dtype in data config #594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the change! Just left some small comments.
olmo/data/__init__.py
Outdated
@@ -36,6 +36,7 @@ def build_memmap_dataset( | |||
return MemMapDataset( | |||
*paths, | |||
chunk_size=train_config.model.max_sequence_length, | |||
memmap_dtype=train_config.data.effective_memmap_dtype, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is also used to setup the memmaps for evaluation. In that case, the data_config
is not the same as train_config.data
. We should respect the setting in data_config
setting.
memmap_dtype=train_config.data.effective_memmap_dtype, | |
memmap_dtype=data_config.effective_memmap_dtype, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
olmo/config.py
Outdated
@@ -555,6 +556,7 @@ class InstanceFilterConfig(BaseConfig): | |||
@dataclass | |||
class DataConfig(BaseConfig): | |||
paths: Optional[List[str]] = None | |||
memmap_dtype: Optional[str] = "uint16" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make this non-optional? I don't think None
is useful here.
memmap_dtype: Optional[str] = "uint16" | |
memmap_dtype: str = "uint16" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
@2015aroras Thanks for the review. Updated the PR to address the comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are style issue due to imports. Make sure to follow the steps here so that required automatic checks pass.
@2015aroras Went though instructions and added all the missing things. Can I get another approval so that it kicks off all the auto checks? |
This is expose dtype in the data config so that we can support reading memmap files with different dtypes