llama.cr

Crystal bindings for llama.cpp, a C/C++ implementation of LLaMA, Falcon, GPT-2, and other large language models.

Please check the LLAMA_VERSION file for the current compatible version of llama.cpp.

This project is under active development and may change rapidly.

Features

  • Low-level bindings to the llama.cpp C API
  • High-level Crystal wrapper classes for easy usage
  • Memory management for C resources
  • Simple text generation interface
  • Advanced sampling methods (Min-P, Typical, Mirostat, etc.)
  • Batch processing for efficient token handling
  • KV cache management for optimized inference
  • State saving and loading

Installation

Prerequisites

You need the llama.cpp shared library (libllama) available on your system.

Download a Prebuilt Binary (Recommended)

LLAMA_VERSION=$(cat LLAMA_VERSION)
curl -L "https://github.com/ggml-org/llama.cpp/releases/download/${LLAMA_VERSION}/llama-${LLAMA_VERSION}-bin-ubuntu-x64.zip" -o llama.zip
unzip llama.zip
sudo cp build/bin/*.so /usr/local/lib/
sudo ldconfig

For macOS, replace ubuntu-x64 with macos-arm64 and *.so with *.dylib.

Alternative: Using LLAMA_CPP_DIR

If you prefer not to install system-wide, you can set the LLAMA_CPP_DIR environment variable:

export LLAMA_CPP_DIR=/path/to/llama.cpp
crystal build examples/simple.cr -o simple_example
LLAMA_CPP_DIR=/path/to/llama.cpp ./simple_example --model models/tiny_model.gguf

Build from source (advanced users)

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
git checkout $(cat ../LLAMA_VERSION)
mkdir build && cd build
cmake .. && cmake --build . --config Release
sudo cmake --install . && sudo ldconfig

Obtaining GGUF Model Files

You'll need a model file in GGUF format. For testing, smaller quantized models (1-3B parameters) with Q4_K_M quantization are recommended.

Popular options can be found on Hugging Face by searching for models published in GGUF format.

Adding to Your Project

Add the dependency to your shard.yml:

dependencies:
  llama:
    github: kojix2/llama.cr

Then run shards install.
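
To verify that the shard is installed and the native library can be found, a one-line program is enough (Llama.system_info is described under Utilities below):

require "llama"

puts Llama.system_info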

Usage

Basic Text Generation

require "llama"

# Load a model
model = Llama::Model.new("/path/to/model.gguf")

# Create a context
context = model.context

# Generate text
response = context.generate("Once upon a time", max_tokens: 100, temperature: 0.8)
puts response

# Or use the convenience method
response = Llama.generate("/path/to/model.gguf", "Once upon a time")
puts response
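
Because Llama.generate takes a model path rather than an existing Model, each call stands alone; for repeated generations, create the Model and Context once and reuse them as shown above.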

Advanced Sampling

require "llama"

model = Llama::Model.new("/path/to/model.gguf")
context = model.context

# Create a sampler chain with multiple sampling methods
chain = Llama::SamplerChain.new
chain.add(Llama::Sampler::TopK.new(40))
chain.add(Llama::Sampler::MinP.new(0.05, 1))
chain.add(Llama::Sampler::Temp.new(0.8))
chain.add(Llama::Sampler::Dist.new(42))

# Generate text with the custom sampler chain
result = context.generate_with_sampler("Write a short poem about AI:", chain, 150)
puts result
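
The order of the chain matters: TopK and MinP first prune the candidate tokens, Temp rescales the surviving logits, and Dist draws the final token from the resulting distribution (seeded with 42 here for reproducible runs).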

Chat Conversations

require "llama"
require "llama/chat"

model = Llama::Model.new("/path/to/model.gguf")
context = model.context

# Create a chat conversation
messages = [
  Llama::ChatMessage.new("system", "You are a helpful assistant."),
  Llama::ChatMessage.new("user", "Hello, who are you?")
]

# Generate a response
response = context.chat(messages)
puts "Assistant: #{response}"

# Continue the conversation
messages << Llama::ChatMessage.new("assistant", response)
messages << Llama::ChatMessage.new("user", "Tell me a joke")
response = context.chat(messages)
puts "Assistant: #{response}"

Embeddings

require "llama"

model = Llama::Model.new("/path/to/model.gguf")

# Create a context with embeddings enabled
context = model.context(embeddings: true)

# Get embeddings for text
text = "Hello, world!"
tokens = model.vocab.tokenize(text)
batch = Llama::Batch.get_one(tokens)
context.decode(batch)
embeddings = context.get_embeddings_seq(0)

puts "Embedding dimension: #{embeddings.size}"

Utilities

System Info

puts Llama.system_info

Tokenization Utility

model = Llama::Model.new("/path/to/model.gguf")
puts Llama.tokenize_and_format(model.vocab, "Hello, world!", ids_only: true)
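
The tokenizer is also useful on its own, for example to check how much of the context window a prompt consumes. A small sketch using only Vocab#tokenize from above:

tokens = model.vocab.tokenize("Hello, world!")
puts "Token count: #{tokens.size}"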

Examples

The examples directory contains sample code demonstrating various features:

  • simple.cr - Basic text generation
  • chat.cr - Chat conversations with models
  • tokenize.cr - Tokenization and vocabulary features

API Documentation

See kojix2.github.io/llama.cr for full API docs.

Core Classes

  • Llama::Model - Represents a loaded LLaMA model
  • Llama::Context - Handles inference state for a model
  • Llama::Vocab - Provides access to the model's vocabulary
  • Llama::Batch - Manages batches of tokens for efficient processing
  • Llama::KvCache - Controls the key-value cache for optimized inference
  • Llama::State - Handles saving and loading model state
  • Llama::SamplerChain - Combines multiple sampling methods

Samplers

  • Llama::Sampler::TopK - Keeps only the top K most likely tokens
  • Llama::Sampler::TopP - Nucleus sampling (keeps tokens until cumulative probability exceeds P)
  • Llama::Sampler::Temp - Applies temperature to logits
  • Llama::Sampler::Dist - Samples from the final probability distribution
  • Llama::Sampler::MinP - Keeps tokens with probability >= P * max_probability
  • Llama::Sampler::Typical - Selects tokens based on their "typicality" (entropy)
  • Llama::Sampler::Mirostat - Dynamically adjusts sampling to maintain target entropy
  • Llama::Sampler::Penalties - Applies penalties to reduce repetition (see the sketch below)
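
For instance, a repetition-averse chain could apply Penalties before the filtering samplers. This is only a sketch: the Penalties constructor arguments (last_n, repeat, frequency, presence) are assumptions based on the underlying llama.cpp sampler, so consult the API docs for the exact signature:

chain = Llama::SamplerChain.new
chain.add(Llama::Sampler::Penalties.new(64, 1.1, 0.0, 0.0)) # assumed order: last_n, repeat, freq, present
chain.add(Llama::Sampler::TopK.new(40))
chain.add(Llama::Sampler::Temp.new(0.8))
chain.add(Llama::Sampler::Dist.new(42))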

Development

See DEVELOPMENT.md for development guidelines.

Do you need commit rights?

  • If you need commit rights to this repository, or would like admin rights to take over the project, please feel free to contact @kojix2.
  • Many OSS projects become abandoned because only the founder holds commit rights to the original repository.

Contributing

  1. Fork it (https://github.com/kojix2/llama.cr/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

License

This project is available under the MIT License. See the LICENSE file for more info.
