Releases: intentee/paddler
v2.1.0
License switch
The license was switched from MIT to Apache-2.0, which is more permissive (has explicit patent grants), so it should be easier to adopt in organizations.
Features
- OpenAI compatibility endpoint:
- Support for
max_completion_tokens
parameter in /v1/chat/completions endpoint - Support for
messages
parameter in /v1/chat/completions endpoint - Support for
stream
parameter in /v1/chat/completions endpoint
- Support for
Documentation: https://paddler.intentee.com/docs/migrating-to-paddler/openai-compatibility/
Full Changelog: v2.0.0...v2.1.0
v2.0.0
What's Changed
Long story short, we rewrote most of the llama-server
, made it scalable, and bundled that with Paddler. This means you do not have to deploy llama-server
alongside Paddler anymore (the fewer moving parts, the better). :)
We also have a new, vastly improved admin panel, and a lot of other fixes, improvements, and changes.
Full Changelog: v1.2.1-rc1...v2.0.0
v2.0.0-rc1
What's Changed
Long story short, we rewrote most of the llama-server
, made it scalable, and bundled that with Paddler. This means you do not have to deploy llama-server
alongside Paddler anymore (the fewer moving parts, the better). :)
We also have a new, vastly improved admin panel, and a lot of other fixes, improvements, and changes.
Full Changelog: v1.2.1-rc1...v2.0.0-rc1
v1.2.1-rc1
Fixes
- Fix overflow/underflow issues when managing slots
v1.2.0
Features
- Add TUI dashboard (
paddler dashboard --management-addr [HOST]:[PORT]
) to be able to easily observe balancer instances from the terminal level (thank you @Propfend for the contribution!)
v1.1.0
- More meaningful error messages when the agent can't connect to the llama.cpp slot endpoint, or when slot endpoint is not enabled in llama.cpp
- Set default logging level to
info
for agents and balancer to increase the amount of information in the logs (it wasn't clean if the agent was running or not) - Enable LTO optimization for the release builds (see #28) (thank you @zamazan4ik)
v1.0.0
Paddler is now rewritten in Rust and uses the Pingora framework for the networking stack. A few minor API changes and reporting improvements are introduced (documented in the README).
This is a stability/quality release that solves some memory related issues, and makes the balancer more resilient to agents randomly going up/down.
v1.0.0-rc1
chore(gh-actions): consistent asset names
v0.10.0
chore: update README