Releases: taketwo/llm-ollama

0.13.0

28 Jul 10:45
0a750b3
  • Add an option to disable thinking (for models that have thinking capability).
    Example usage: llm -m qwen3:30b-a3b-q4_K_M "Why is the sky blue?" -o think false
  • Set the connection timeout to 1 second so that requests fail quickly when the Ollama host is unreachable.

0.12.0

05 Jul 20:55
73dfa76
  • Switch to using the official Ollama API to query model capabilities instead of ad hoc heuristics; a quick way to query this API directly is sketched after the example below.
    Warning: this API requires Ollama version 0.6.4 or newer.
  • Add/remove options to align with current Ollama modelfile parameters.
  • Rename the list-models plugin subcommand to models and include capabilities in the output.
    Example:
$ llm ollama models
model                             digest          capabilities
qwen3:4b                          2bfd38a7daaf    completion, tools, thinking
snowflake-arctic-embed2:latest    5de93a84837d    embedding
gemma3:1b                         2d27a774bc62    completion
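    The capability data comes from Ollama's /api/show endpoint. A rough way to inspect it directly (assuming Ollama is listening on the default local port and jq is available):
$ curl -s http://localhost:11434/api/show -d '{"model": "qwen3:4b"}' | jq .capabilities
    The response should contain a capabilities array matching the column shown above.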

0.11.0

29 May 08:09

0.11a0

14 May 05:44
df288b8
Pre-release

0.10.0

06 May 16:42
4c91d45
  • Add support for Basic Authentication when connecting to the Ollama server.
    Example usage: export OLLAMA_HOST=https://username:password@192.168.1.13:11434
  • Add caching of model capability detection results. This prevents calling Ollama's /api/show endpoint for each model on each llm invocation.

0.9.1

10 Mar 02:24
720ad00
  • Fix an error when using the --extract option.
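    For reference, the --extract option returns just the first fenced code block from the model's response. Example usage (model name and prompt are illustrative):
    llm -m llama3.2 --extract "Write a Python function that reverses a string"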

0.9.0

02 Mar 15:52
5d1c574
  • Add support for JSON schemas.
    Example usage: llm -m llama3.2 --schema "name, age int, one_sentence_bio" "invent a cool dog"

0.8.2

22 Jan 11:05
4baaff6
  • Fix primary model name selection logic to prefer names with longer tags.
  • Propagate input/output token usage information to llm.
    To see token usage, specify the -u option, e.g.:
    $ llm -u -m llama3.2 "How much is 2+2?"
    The answer to 2 + 2 is 4.
    Token usage: 33 input, 13 output
    

0.8.1

20 Dec 12:20
39f73a3
  • Fix a bug in building conversation messages when async models are used.

0.8.0

11 Dec 12:33
dd616e7
  • Add support for async LLM models.
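    Async models can be used through llm's Python API. A minimal sketch, following llm's documented async interface (the model name is illustrative and must already be pulled in Ollama):

    import asyncio
    import llm

    async def main():
        # Fetch the async variant of an Ollama-backed model registered by this plugin
        model = llm.get_async_model("llama3.2")
        # Run the prompt and print the complete response text
        response = await model.prompt("Why is the sky blue? Answer in one sentence.")
        print(await response.text())

    asyncio.run(main())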