This repository was archived by the owner on Jul 30, 2025. It is now read-only.
Hi, I see the mention of running this model on llama.cpp in the README. Did you manage to get it to run and quantize with good output? I'm trying to evaluate whether this model can be used for speculative decoding for Llama 2 7B.
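For context on what I mean by speculative decoding: the small model drafts a run of tokens cheaply, and the large target model verifies them, keeping the longest agreeing prefix plus one correction. A minimal toy sketch of that accept/verify loop (both "models" below are invented stand-ins, not TinyLlama or the llama.cpp API):

```python
def draft_propose(prefix, k):
    # Hypothetical cheap draft model: usually agrees with the target,
    # but (for illustration) gets the last proposed token wrong.
    out = [(prefix[-1] + i + 1) % 10 for i in range(k)]
    out[-1] = (out[-1] + 5) % 10
    return out

def target_next(prefix):
    # Hypothetical expensive target model: its ground-truth next token.
    return (prefix[-1] + 1) % 10

def speculative_step(prefix, k=4):
    # Draft k tokens, then verify them against the target in order,
    # keeping the agreeing prefix and substituting the target's token
    # at the first disagreement.
    accepted = []
    for tok in draft_propose(prefix, k):
        expected = target_next(prefix + accepted)
        if tok == expected:
            accepted.append(tok)      # target agrees: keep the draft token
        else:
            accepted.append(expected) # correct the mismatch and stop
            break
    return accepted
```

The point of the quality question above is that the more often the draft agrees with the target, the more tokens get accepted per target pass.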
```shell
python convert.py ../TinyLlama-1.1B-step-50K-105b/
./main -m ../TinyLlama-1.1B-step-50K-105b/ggml-model-f32.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -ngl 0 --temp 0
```
This results in the following output. Both the f16 and f32 conversions produce it, and adding an `<s>` token at the beginning didn't help either:
```
(...)
Building a website can be done in 10 simple steps:\nStep 1:12000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
(...)
```
Running with huggingface/torch gives a more reasonable result, although the output quickly becomes repetitive:
```
<s> Building a website can be done in 10 simple steps:
Step 1: Create a website.
Step 2: Add a logo.
Step 3: Add a contact form.
Step 4: Add a blog.
Step 5: Add a social media links.
Step 6: Add a contact page.
Step 7: Add a contact form.
Step 8: Add a contact form.
Step 9: Add a contact form.
```
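The repetition here looks like the usual failure mode of pure greedy decoding (`--temp 0`): once the argmax successors form a cycle, the same phrase repeats forever. A toy illustration (the successor table below is invented, not the model's actual logits):

```python
# Toy "argmax successor" table standing in for a model's greedy choices.
# Once decoding enters the cycle, greedy decoding can never leave it.
NEXT = {
    "Step": "N:",
    "N:": "Add",
    "Add": "a",
    "a": "contact",
    "contact": "form.",
    "form.": "Step",   # cycles back -> "Step N: Add a contact form." repeats
}

def greedy_decode(start, n):
    # Always pick the single highest-probability successor (temp 0).
    out = [start]
    for _ in range(n - 1):
        out.append(NEXT[out[-1]])
    return out
```

Sampling with a nonzero temperature or a repetition penalty is the usual way out of such loops, so the huggingface/torch result may just reflect greedy settings rather than a conversion bug.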