-
Notifications
You must be signed in to change notification settings - Fork 441
Make ilab
able to download safetensors
#1033
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
349e033
to
b7140b1
Compare
b7140b1
to
19a6b3e
Compare
when downloading from HF, check check if the repo has .safetensor files. If it does, and `--filename` is default, download the entire repo and put it into a <repo_name> folder inside of models Signed-off-by: Charlie Doern <cdoern@redhat.com>
19a6b3e
to
6af9db5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! I know @alimaredia is testing this, so I requested his review. It should block merging until he approves as well.
@@ -712,15 +712,36 @@ def chat( | |||
def download(ctx, repository, release, filename, model_dir): | |||
"""Download the model(s) to train""" | |||
click.echo(f"Downloading model from {repository}@{release} to {model_dir}...") | |||
if ( | |||
"HF_TOKEN" not in os.environ | |||
and repository != "instructlab/merlinite-7b-lab-GGUF" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this check? I think it might be best not to hardcode merlinite for such checks - it would also be good practice to always have an HF token exposed and would make the code easier and cleaner to maintain (even if technically you dont need a token for the instructlab org)
just my 0.02
especially considering this would fail if we try to pull the granite model even though it exists in the same instructlab HF org
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this was so at least the default config didn't require it, which is nice.
An even better UX would determine whether the token is required for a given model before then erroring out because it's not set, but that could be a future improvement ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right - then for now should we at least only check if "instructlab" is part of the repo string, and we allow the download without token if true? since we would want to at least have uniform behavior for all the models in the instructlab
HF org
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about you use clicks' feature to read values from env vars?
@click.option('--hf-token', envvar='HF_TOKEN')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merl is the default. This is my way of checking If NO HF TOKEN and THE USER SET REPO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was surprised by it asking me for a HF_TOKEN when I did ilab download --repository instructlab/merlinite-7b-lab
as that repository does not need one. In my case, I specifically wanted it to download the non-gguf version of this model, but perhaps the logic around HF_TOKEN usage should be a bit broader?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was improved further in #1035
filename=filename, | ||
local_dir=model_dir, | ||
) | ||
files = list_repo_files(repo_id=repository, token=os.getenv("HF_TOKEN")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cdoern why not download the entire repository via snapshot_download()
instead of going file by file? https://huggingface.co/docs/huggingface_hub/en/guides/download#download-an-entire-repository
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good idea to me +1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies @alimaredia -- once you posted this review, that made mergify happy that you looked at it. If you had done "request changes" it would have blocked it, just FYI.
I probably should just set a hold
label and let you remove it
uh oh, seems mergify has gone rogue!!!! I will open another PR to address folks' comments |
when downloading from HF, check check if the repo has .safetensor files. If it does, and
--filename
is default, download the entire repo and put it into a <repo_name> folder inside of modelsSigned-off-by: Charlie Doern cdoern@redhat.com