Releases: withcatai/node-llama-cpp

v3.12.4



3.12.4 (2025-08-28)


Shipped with llama.cpp release b6301

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.12.3



3.12.3 (2025-08-26)

Bug Fixes

  • Vulkan: context creation edge cases (#492) (12749c0)
  • prebuilt binaries CUDA 13 support (#494) (b10999d)
  • don't share loaded shared libraries between backends (#492) (12749c0)
  • split prebuilt CUDA binaries into 2 npm modules (#495) (6e59160)

Shipped with llama.cpp release b6294

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.12.1



3.12.1 (2025-08-11)

Bug Fixes

  • gpt-oss segment budgets (#489) (30eaa23)
  • add support for more gpt-oss variations (#489) (30eaa23)
  • default to using a model message for prompt completion on unsupported models (#489) (30eaa23)
  • prompt completion config (#490) (f849cd9); see the sketch after this list
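
The prompt-completion fixes above relate to LlamaChatSession's completePrompt API, which suggests a continuation for partially typed user input. A minimal usage sketch, assuming a local GGUF model (the model path is a placeholder):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf" // placeholder path
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

// Suggest a continuation for partially typed user input.
// On models without dedicated prompt-completion support, the library
// now falls back to using a model message (per the fix above).
const completion = await session.completePrompt("Hi there! How", {
    maxTokens: 24
});
console.log(`Suggested continuation: ${completion}`);
```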

Shipped with llama.cpp release b6133

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.12.0


gpt-oss is here!

Read about the release in the blog post


3.12.0 (2025-08-09)


Shipped with llama.cpp release b6122

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.11.0


3.11.0 (2025-07-29)


Shipped with llama.cpp release b6018

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.10.0


3.10.0 (2025-06-12)

Features

  • JSON Schema Grammar: $defs and $ref support with full inferred types (#472) (9cdbce9); see the sketch after this list
  • inspect gguf command: format and print the Jinja chat template with --key .chatTemplate (#472) (9cdbce9)
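
The $defs/$ref support lands in llama.createGrammarForJsonSchema. A minimal sketch of a schema that defines a type once under $defs and reuses it via $ref (the model path is a placeholder):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // placeholder path

// `character` is defined once under $defs and referenced twice;
// the resulting grammar constrains generation to match the schema.
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        mainCharacter: {$ref: "#/$defs/character"},
        sidekick: {$ref: "#/$defs/character"}
    },
    $defs: {
        character: {
            type: "object",
            properties: {
                name: {type: "string"},
                age: {type: "number"}
            }
        }
    }
});

const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const response = await session.prompt("Invent two fantasy characters", {grammar});

// `parse` validates the response and returns it with types
// inferred from the schema, including the $ref-ed definitions.
const result = grammar.parse(response);
console.log(result.mainCharacter, result.sidekick);
```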

Bug Fixes

  • JinjaTemplateChatWrapper: first function call prefix detection (#472) (9cdbce9)
  • QwenChatWrapper: improve Qwen chat template detection (#472) (9cdbce9)
  • apply maxTokens on function calling parameters (#472) (9cdbce9)
  • adjust default prompt completion length based on SWA size when relevant (#472) (9cdbce9)
  • improve thought segmentation syntax extraction (#472) (9cdbce9)
  • adapt to llama.cpp changes (#472) (9cdbce9)

Shipped with llama.cpp release b5640

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.9.0


3.9.0 (2025-06-04)


Shipped with llama.cpp release b5590

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.8.1


3.8.1 (2025-05-19)

Bug Fixes

  • getLlamaGpuTypes: edge case (#463) (1799127)
  • remove prompt completion from the cached context window (#463) (1799127)

Shipped with llama.cpp release b5415

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.8.0


3.8.0 (2025-05-17)

Bug Fixes

  • adapt to breaking llama.cpp changes (#460) (f2cb873)
  • capture multi-token segment separators (#460) (f2cb873)
  • race condition when reading extremely long gguf metadata (#460) (f2cb873)
  • adapt memory estimation to newly added model architectures (#460) (f2cb873)
  • skip binary testing on certain problematic conditions (#460) (f2cb873)
  • improve GPU backend loading error description (#460) (f2cb873)

Shipped with llama.cpp release b5414

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)

v3.7.0


3.7.0 (2025-03-28)

Features

  • extract function calling syntax from a Jinja template (#444) (c070e81)
  • full support for Qwen and QwQ via QwenChatWrapper (#444) (c070e81); see the sketch after this list
  • export a llama instance getter on a model instance (#444) (c070e81)
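
A sketch of using the new wrapper explicitly. This assumes QwenChatWrapper is exported from the package root and that the new getter is exposed as model.llama; both names are inferred from the notes above, and the model path is a placeholder:

```typescript
import {getLlama, LlamaChatSession, QwenChatWrapper} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/qwq.gguf"}); // placeholder path

// The chat wrapper is normally resolved automatically from the model's
// chat template, but it can be set explicitly when needed.
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new QwenChatWrapper() // assumed export name, per the note above
});

// The llama instance getter on the model (assumed to be `model.llama`)
// gives access back to the Llama instance that loaded it.
console.log(model.llama === llama); // expected: true

console.log(await session.prompt("What is 1 + 1?"));
```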

Bug Fixes

  • better handling for function calling with empty parameters (#444) (c070e81)
  • reranking edge case crash (#444) (c070e81)
  • limit the context size by default in the node-typescript template (#444) (c070e81)
  • adapt to breaking llama.cpp changes (#444) (c070e81)
  • bump the minimum Node.js version to 20 due to dependencies' requirements (#444) (c070e81)
  • defineChatSessionFunction type (#444) (c070e81); see the sketch after this list
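
Two of the fixes above concern function calling via defineChatSessionFunction. A minimal sketch of a function that takes no parameters, the empty-parameters case mentioned above (the model path is a placeholder):

```typescript
import {getLlama, LlamaChatSession, defineChatSessionFunction} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "path/to/model.gguf"}); // placeholder path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

const functions = {
    getCurrentTime: defineChatSessionFunction({
        description: "Get the current time",
        // No `params` schema here: a function with empty parameters,
        // the case the fix above improves handling for.
        handler() {
            return new Date().toISOString();
        }
    })
};

// The model can call `getCurrentTime` while generating its answer.
const response = await session.prompt("What time is it right now?", {functions});
console.log(response);
```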

Shipped with llama.cpp release b4980

To use the latest llama.cpp release available, run npx -n node-llama-cpp source download --release latest. (learn more)