✨ `gpt-oss` is here! ✨

Read about the release in the blog post

3.12.4 (2025-08-28)

Bug Fixes

gpt-oss prompt preloading (#496) (db4a243)

Shipped with llama.cpp release b6301

3.12.3 (2025-08-26)

Bug Fixes

Vulkan: context creation edge cases (#492) (12749c0)
prebuilt binaries CUDA 13 support (#494) (b10999d)
don't share loaded shared libraries between backends (#492) (12749c0)
split prebuilt CUDA binaries into 2 npm modules (#495) (6e59160)

Shipped with llama.cpp release b6294

3.12.1 (2025-08-11)

Features

comment segment budget (#489) (30eaa23) (documentation: API: LLamaChatPromptOptions["budgets"]["commentTokens"])
Electron template: comment segments
Electron template: improve completions speed when using functions

Bug Fixes

gpt-oss segment budgets (#489) (30eaa23)
add support for more gpt-oss variations (#489) (30eaa23)
default to using a model message for prompt completion on unsupported models (#489) (30eaa23)
prompt completion config (#490) (f849cd9)

Shipped with llama.cpp release b6133

3.12.0 (2025-08-09)

Features

gpt-oss support (#487) (722e29d) (documentation: gpt-oss)

Bug Fixes

Llama: expose the numa option (#485) (ea0d815)
add --numa flag to cli commands (#485) (ea0d815)

Shipped with llama.cpp release b6122

3.11.0 (2025-07-29)

Features

NUMA policy (#482) (a2ddaa2) (documentation: API: LlamaOptions["numa"])
inspect gpu command: log prebuilt binaries and cloned source releases (#482) (a2ddaa2)

Bug Fixes

add missing GGUF metadata types (#482) (a2ddaa2)
level of some internal logs (#482) (a2ddaa2)
JSON schema grammar edge case (#482) (a2ddaa2)

Shipped with llama.cpp release b6018

3.10.0 (2025-06-12)

Features

JSON Schema Grammar: $defs and $ref support with full inferred types (#472) (9cdbce9)
inspect gguf command: format and print the Jinja chat template with --key .chatTemplate (#472) (9cdbce9)
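
For readers unfamiliar with `$defs` and `$ref`, the sketch below shows the kind of schema this feature refers to: a definition declared once under `$defs` and reused from two places through `$ref`. It is a plain TypeScript illustration only; the `JsonSchema` type and the `resolveLocalRef` helper are hypothetical names introduced here and are not part of the node-llama-cpp API, and the sketch does not claim to show how the library resolves references internally.

```typescript
// Illustrative only: a minimal JSON Schema shape, just enough for this example.
type JsonSchema = {
    type?: string;
    properties?: Record<string, JsonSchema>;
    items?: JsonSchema;
    required?: string[];
    $ref?: string;
    $defs?: Record<string, JsonSchema>;
};

// A schema that declares "person" once and references it twice.
const schema: JsonSchema = {
    type: "object",
    properties: {
        author: {$ref: "#/$defs/person"},
        reviewers: {type: "array", items: {$ref: "#/$defs/person"}}
    },
    required: ["author"],
    $defs: {
        person: {
            type: "object",
            properties: {
                name: {type: "string"},
                age: {type: "number"}
            },
            required: ["name"]
        }
    }
};

// Hypothetical helper (an assumption for illustration, not a library function):
// resolves a local "#/$defs/<name>" reference against the root schema.
function resolveLocalRef(root: JsonSchema, node: JsonSchema): JsonSchema {
    if (node.$ref == null)
        return node;

    const prefix = "#/$defs/";
    if (!node.$ref.startsWith(prefix))
        throw new Error("Unsupported $ref: " + node.$ref);

    const target = root.$defs?.[node.$ref.slice(prefix.length)];
    if (target == null)
        throw new Error("Unresolved $ref: " + node.$ref);

    return target;
}

// Both references point at the same "person" definition.
const authorSchema = resolveLocalRef(schema, schema.properties!.author!);
const reviewerSchema = resolveLocalRef(schema, schema.properties!.reviewers!.items!);
```

Declaring the shared shape once keeps the schema short and means a change to `person` applies everywhere it is referenced.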

Bug Fixes

JinjaTemplateChatWrapper: first function call prefix detection (#472) (9cdbce9)
QwenChatWrapper: improve Qwen chat template detection (#472) (9cdbce9)
apply maxTokens on function calling parameters (#472) (9cdbce9)
adjust default prompt completion length based on SWA size when relevant (#472) (9cdbce9)
improve thought segmentation syntax extraction (#472) (9cdbce9)
adapt to llama.cpp changes (#472) (9cdbce9)

Shipped with llama.cpp release b5640

3.9.0 (2025-06-04)

Features

reasoning budget (#468) (ea8d904) (documentation: Set Reasoning Budget)
SWA (Sliding Window Attention) support - greatly reduced context memory consumption on supported models (#468) (ea8d904)
documentation: LLMs friendly llms.md and llms-full.md files (#468) (ea8d904)
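
As a rough mental model of what a reasoning budget constrains, the conceptual sketch below counts the tokens spent in a segment and reports when the allowance is used up. Everything in it (`SegmentBudgetTracker`, `consume`, `isExhausted`, the segment names) is a hypothetical illustration and not the node-llama-cpp API or its implementation; see the linked "Set Reasoning Budget" documentation for the actual option.

```typescript
// Conceptual sketch only: per-segment token accounting against an optional budget.
type SegmentType = "thought" | "comment";
type SegmentBudgets = Partial<Record<SegmentType, number>>;

class SegmentBudgetTracker {
    private readonly budgets: SegmentBudgets;
    private readonly used = new Map<SegmentType, number>();

    constructor(budgets: SegmentBudgets) {
        this.budgets = budgets;
    }

    // Records tokens for a segment and returns how many of them fit in the remaining budget.
    // A segment without a configured budget is treated as unlimited.
    consume(segment: SegmentType, tokenCount: number): number {
        const budget = this.budgets[segment] ?? Infinity;
        const usedSoFar = this.used.get(segment) ?? 0;
        const allowed = Math.max(0, Math.min(tokenCount, budget - usedSoFar));
        this.used.set(segment, usedSoFar + allowed);
        return allowed;
    }

    // True once the segment has used its whole budget.
    isExhausted(segment: SegmentType): boolean {
        const budget = this.budgets[segment] ?? Infinity;
        return (this.used.get(segment) ?? 0) >= budget;
    }
}

// Usage: a 100-token budget for thought segments, no limit on comment segments.
const tracker = new SegmentBudgetTracker({thought: 100});
const firstChunk = tracker.consume("thought", 60);
const secondChunk = tracker.consume("thought", 60);
const commentChunk = tracker.consume("comment", 500);
```

In this sketch the second chunk is only partially accepted, at which point a generator built on it would stop producing that segment and move on to the response.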

Bug Fixes

prompt completion edge cases (#468) (ea8d904)
adapt to llama.cpp changes (#468) (ea8d904)

Shipped with llama.cpp release b5590

3.8.1 (2025-05-19)

Bug Fixes

getLlamaGpuTypes: edge case (#463) (1799127)
remove prompt completion from the cached context window (#463) (1799127)

Shipped with llama.cpp release b5415

3.8.0 (2025-05-17)

Features

save and restore a context sequence state (#460) (f2cb873) (documentation: Saving and restoring a context sequence evaluation state)
stream function call parameters (#460) (f2cb873) (documentation: API: LLamaChatPromptOptions["onFunctionCallParamsChunk"])
configure Hugging Face remote endpoint for resolving URIs (#460) (f2cb873) (documentation: API: ResolveModelFileOptions["endpoints"])
Qwen 3 support (#460) (f2cb873)
QwenChatWrapper: support discouraging the generation of thoughts (#460) (f2cb873) (documentation: API: QwenChatWrapper constructor > thoughts option)
getLlama: dryRun option (#460) (f2cb873) (documentation: API: LlamaOptions["dryRun"])
getLlamaGpuTypes function (#460) (f2cb873) (documentation: API: getLlamaGpuTypes)

Bug Fixes

adapt to breaking llama.cpp changes (#460) (f2cb873)
capture multi-token segment separators (#460) (f2cb873)
race condition when reading extremely long gguf metadata (#460) (f2cb873)
adapt memory estimation to newly added model architectures (#460) (f2cb873)
skip binary testing on certain problematic conditions (#460) (f2cb873)
improve GPU backend loading error description (#460) (f2cb873)

Shipped with llama.cpp release b5414

3.7.0 (2025-03-28)

Features

extract function calling syntax from a Jinja template (#444) (c070e81)
Full support for Qwen and QwQ via QwenChatWrapper (#444) (c070e81)
export a llama instance getter on a model instance (#444) (c070e81)

Bug Fixes

better handling for function calling with empty parameters (#444) (c070e81)
reranking edge case crash (#444) (c070e81)
limit the context size by default in the node-typescript template (#444) (c070e81)
adapt to breaking llama.cpp changes (#444) (c070e81)
bump min nodejs version to 20 due to dependencies' requirements (#444) (c070e81)
defineChatSessionFunction type (#444) (c070e81)

Shipped with llama.cpp release b4980

Releases: withcatai/node-llama-cpp
