This repository was archived by the owner on Jul 30, 2025. It is now read-only.

Getting gibberish output when running on llama.cpp #24

@luungoc2005

Description

Hi, I see the mention of running this model on llama.cpp in the README. Did you manage to get it to run and quantize with good output? I'm trying to evaluate whether this model can be used as a speculative-decoding draft model for Llama 2 7B.
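
For context, the setup I want to evaluate is llama.cpp's speculative example, roughly along these lines (flags and paths from memory, so treat this as a sketch rather than the exact command):

./speculative -m ../llama-2-7b/ggml-model-q4_0.gguf -md ../TinyLlama-1.1B-step-50K-105b/ggml-model-f16.gguf -p "Building a website can be done in 10 simple steps:" --draft 16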

With the first checkpoint (https://huggingface.co/PY007/TinyLlama-1.1B-step-50K-105b), there seems to be an issue converting to GGUF:

python convert.py ../TinyLlama-1.1B-step-50K-105b/

./main -m ../TinyLlama-1.1B-step-50K-105b/ggml-model-f32.gguf -p "Building a website can be done in 10 simple steps:\nStep 1:" -n 400 -ngl 0 --temp 0

This results in the following output. Both the f16 and f32 conversions behave this way, and prepending a <s> token at the beginning didn't help either:

(...)
Building a website can be done in 10 simple steps:\nStep 1:12000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
(...)
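
For completeness, the quantization step I intended to run after conversion is the standard llama.cpp one (assuming the f16 conversion is usable in the first place):

./quantize ../TinyLlama-1.1B-step-50K-105b/ggml-model-f16.gguf ../TinyLlama-1.1B-step-50K-105b/ggml-model-q4_0.gguf q4_0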

Running with huggingface/torch gives a more reasonable result, although it quickly becomes repetitive:

<s> Building a website can be done in 10 simple steps:
Step 1: Create a website.
Step 2: Add a logo.
Step 3: Add a contact form.
Step 4: Add a blog.
Step 5: Add a social media links.
Step 6: Add a contact page.
Step 7: Add a contact form.
Step 8: Add a contact form.
Step 9: Add a contact form.
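
For reference, the huggingface/torch output above came from a minimal script along these lines (the model path and greedy settings are a sketch of what I ran, not the exact script):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "PY007/TinyLlama-1.1B-step-50K-105b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32)

prompt = "Building a website can be done in 10 simple steps:\nStep 1:"
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding, to match --temp 0 on the llama.cpp side.
output = model.generate(**inputs, max_new_tokens=400, do_sample=False)
print(tokenizer.decode(output[0]))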

I'm not sure where this mismatch is coming from.
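
One thing I can think of checking (just a guess on my part, not a confirmed cause) is whether the two runtimes tokenize the prompt the same way, e.g. by printing the transformers token IDs and comparing them against what ./main reports with --verbose-prompt:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PY007/TinyLlama-1.1B-step-50K-105b")
prompt = "Building a website can be done in 10 simple steps:\nStep 1:"
# These IDs should match the prompt tokens llama.cpp prints with --verbose-prompt.
print(tokenizer(prompt).input_ids)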

Thanks
