Make R2 intermediate checkpoints of official runs easy to access #500
This is great! Exactly what I thought we were missing.
Can you add a quick description somewhere that explains what exactly people will find in those directories, i.e., model.pt, optim.pt, and whatever else? Unfortunately those R2 URLs cannot be listed, so we have to describe it elsewhere. Or can we upload an index.html file to all of those directories?
Can we use http:// URLs in the OLMo code to load checkpoints? I thought this would not work if you can't list the directory, but now that I think about it, maybe it works?
I'll check the http scenario. Regarding the description of directory contents, I did add a small bit to the README regarding the 4 files in a checkpoint dir. Did you have anything else in mind?
No, that's all! Thanks!
Running the OLMo-1B config with the load path set to an https URL worked fine (on a somewhat outdated branch). I've updated the README with a bit more detail about resuming from a checkpoint.
Hi @epwalsh, it seems the URLs cannot be accessed now. For example, when I tried to download from the last link in OLMo-7B.csv, I got "error 404". Have the files expired, or are they set to private now? Thank you!
The checkpoint directory cannot be directly accessed, but files within the directory can be. This is discussed briefly in the README.
Oh I see. Thank you so much!
We have all the intermediate checkpoints of our official runs in R2, but nobody can access them since they don't know the correct URLs. This PR adds csv files that map run steps to the corresponding 'directory' URLs. A user can download the relevant checkpoint files from these URLs.
I was considering using Cloudflare redirects as an alternative solution. Specifically, we would have nice URLs like https://olmo-checkpoints.org/OLMo-7B/step5000/config.yaml. This solution requires the user to know which step numbers are valid, so something like the csv files I added would be necessary anyway. We can add redirects or other improvements later. For now, we should make the checkpoints available in some form.
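As a rough illustration of how the csv files could be used, here is a minimal sketch that looks up a step and builds direct file URLs, since the directory URL itself cannot be opened or listed. The row layout (step, then directory URL), the helper name checkpoint_file_urls, and the example.com URLs are assumptions made for illustration; the file names are only the ones mentioned in this thread, and the README has the full list.

```python
import csv
import io

# File names mentioned in this thread; the README describes the full set.
CHECKPOINT_FILES = ("config.yaml", "model.pt", "optim.pt")


def checkpoint_file_urls(csv_text, step, files=CHECKPOINT_FILES):
    """Return direct file URLs for one training step.

    Assumption: each csv row is `step,directory_url`. A header row, if
    present, is skipped naturally because it never matches a step number.
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        row_step, directory_url = row[0].strip(), row[1].strip()
        if row_step == str(step):
            # The directory URL is not fetchable, so append each file name.
            base = directory_url.rstrip("/")
            return [f"{base}/{name}" for name in files]
    raise KeyError(f"no checkpoint listed for step {step}")


# Hypothetical csv content with placeholder URLs.
sample = (
    "1000,https://example.com/ckpt/step1000/\n"
    "5000,https://example.com/ckpt/step5000\n"
)
for url in checkpoint_file_urls(sample, 5000):
    print(url)
```

Each resulting URL can then be downloaded individually, or the directory URL can be passed as the load path as described in the comments above.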