@dirkgr dirkgr commented Mar 26, 2024

This works, but it is not fast, because each batch issues many small HTTP range requests.
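To illustrate why many small range requests are slow, here is a minimal sketch of reading one slice of tokens from a remote file with an HTTP `Range` header. It is not the code from this PR. The function names (`range_header`, `decode_tokens`, `read_tokens`) are hypothetical, and the sketch assumes the file is a flat array of little-endian 16-bit token IDs with no header, which may not match the actual data layout.

```python
import struct
import urllib.request

ITEMSIZE = 2  # assumption: tokens are stored as unsigned 16-bit integers


def range_header(start_token: int, num_tokens: int, itemsize: int = ITEMSIZE) -> dict:
    """Build a Range header covering tokens [start_token, start_token + num_tokens)."""
    first = start_token * itemsize
    last = first + num_tokens * itemsize - 1  # the end of an HTTP byte range is inclusive
    return {"Range": f"bytes={first}-{last}"}


def decode_tokens(data: bytes) -> list:
    """Decode raw bytes as little-endian uint16 token IDs (assumed layout)."""
    count = len(data) // ITEMSIZE
    return list(struct.unpack(f"<{count}H", data[: count * ITEMSIZE]))


def read_tokens(url: str, start_token: int, num_tokens: int) -> list:
    """Fetch one slice of tokens with a single HTTP range request."""
    request = urllib.request.Request(url, headers=range_header(start_token, num_tokens))
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            # A 200 response means the server ignored the Range header.
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return decode_tokens(response.read())
```

Under these assumptions, a batch of N training instances costs N separate round trips, one `read_tokens` call each, so latency rather than bandwidth dominates the load time.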

@dirkgr dirkgr requested review from epwalsh and 2015aroras March 26, 2024 18:09

@2015aroras 2015aroras left a comment

LGTM! It's nice to see OLMo even easier to use.

nit: Can you remove/update the "Once you've updated the data paths in the config..." part of the README?

@@ -93,59 +93,59 @@ evaluators:
     drop_last: true
     datasets:
       v3-small-c4_en-validation:
-        - r2://olmo-data/eval-data/perplexity/v3_small_gptneox20b/c4_en/val/part-0-00000.npy
+        - http://olmo-data.org/eval-data/perplexity/v3_small_gptneox20b/c4_en/val/part-0-00000.npy

Just curious, any reason for http instead of https?

@epwalsh epwalsh left a comment

LGTM, also just curious why HTTP and not HTTPS.

@dirkgr dirkgr merged commit 9a0a84a into main Apr 3, 2024
@dirkgr dirkgr deleted the PublicTrainingData branch April 3, 2024 17:31