Comparing changes
base repository: ollama/ollama
base: v0.6.3
head repository: ollama/ollama
compare: v0.6.4
- 18 commits
- 56 files changed
- 12 contributors
Commits on Mar 27, 2025
- b816ff8
- ead27aa
- ml: Remove Output from Context interface (01aa788)
  Model implementations should use Input for all of their tensors supplied to the model. This includes tensors that relate to the outputs, which is confusing since there is also an Output function. Since Output is only used internally in GGML and not used by any model implementations, we can remove it from the interface to reduce confusion.
Commits on Mar 28, 2025
- server: organize error types (#9465) (0bd0454)
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
Commits on Mar 31, 2025
- 071a987
- ollamarunner: Ensure batch size limits are not exceeded (5d09727)
  With the llama runner, we can generate up to NUM_PARALLEL batches at once, which then get broken up into individual batches to be executed by llama.cpp (i.e. we add up to 2048 tokens and this gets split into 4 batches of 512 tokens at default settings). This splitting can improve parallelism on multi-GPU systems because the individual batches can move through the pipeline without blocking on the first one to fully complete. However, we don't yet support this in the Ollama runner, partly because it makes it hard to enforce model-specified batch constraints, which didn't exist previously. The result is that we try to execute the full, unsplit batch, which can lead to out-of-memory or insufficient-KV-cache-space errors. This change breaks up batches when the total inputs from all sequences exceed the batch size, rather than per sequence. To ensure fairness, it also reintroduces round-robin scheduling across sequences so that one busy sequence does not starve the others.
- runner: Release semaphore and improve error messages on failures (b2a4652)
  If we hit an error after creating a new sequence but before finding a slot for it, we return without releasing the semaphore. This reduces our parallel sequences and eventually leads to deadlock. In practice this should never happen, because once we have acquired the semaphore we should always be able to find a slot, but the code is clearly not correct.
- server/internal/client/ollama: cache completed chunks (#9933) (ef27d52)
  This change tracks download chunks during the pull process so that subsequent pulls can skip chunks that have already been downloaded. This works across restarts of ollama. Currently, download state is lost if a prune is triggered during a pull (e.g. by a restart or remove); that issue should be addressed in a follow-up PR.
- runner: clear cache when shift is not possible (#9433) (66b2539)
  Clear the KV cache when the shift operation is not supported by the model. Adds a KvCacheCanShift() check to handle models that can't perform cache shifts, falling back to a full cache clear while preserving the logical token history to maintain expected behavior when the context window fills up.
Commits on Apr 1, 2025
- discover: /proc/cpuinfo file open and close. (#9950) (4059a29)
  Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
- docs: add DeepShell to community projects (#9955) (23fc8e9)
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
- c001b98
- api: return model capabilities from the show endpoint (#10066) (e172f09)
  With support for multimodal models becoming more varied and common, it is important for clients to be able to easily see what capabilities a model has. Returning these from the show endpoint allows clients to easily see what a model can do.
Commits on Apr 2, 2025
- 4e41502
- chore(all): replace instances of interface with any (#10067) (9876c9f)
  Both interface{} and any (an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
- ollamarunner: Don't truncate a SameBatch (493385e)
  When truncating inputs to the context window at the beginning of a sequence, we remove the minimum amount possible. However, this may cause us to truncate into the middle of a set of inputs that the model specified should not be split up. To avoid this, we need to remove the rest of the partial batch as well.
- kvcache: Add check for values that fall out of sliding window cache (b429700)
  The sliding window cache trims entries that are outside the window for the latest token. This works when we are extending the cache, such as when the conversation continues. However, if there is only a partial overlap with a past conversation (including the BOS tokens), we resume from an earlier point in the conversation and the needed tokens are no longer stored in memory. This change verifies that the new window overlaps with the old one before reusing the cache.
  Co-authored-by: Jesse Gross <jesse@ollama.com>
- b51e0f3
You can try running this command locally to see the comparison on your machine:
git diff v0.6.3...v0.6.4