
Comparing changes

Repository: ollama/ollama
Base: v0.6.3
Compare: v0.6.4
  • 18 commits
  • 56 files changed
  • 12 contributors

Commits on Mar 27, 2025

  1. b816ff8
  2. ead27aa
  3. ml: Remove Output from Context interface

    Model implementations should use Input for all of their tensors
    supplied to the model. This includes tensors that relate to the
    outputs, which is confusing since there is also an Output function.
    
    Since Output is only used internally in GGML and not used by any
    model implementations, we can remove it from the interface to
    reduce confusion.
    jessegross committed Mar 27, 2025 · 01aa788
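The change above can be pictured with a minimal sketch. All names here (Tensor, Context, ggmlContext) are illustrative stand-ins, not Ollama's actual API: the point is that a method can stay on the concrete GGML type while disappearing from the interface that model implementations program against.

```go
package main

import "fmt"

// Tensor is a placeholder for whatever the tensor abstraction exposes.
type Tensor interface{ Name() string }

// Context is the narrowed interface models see: Input covers every
// tensor a model needs, including output-related ones.
type Context interface {
	Input() Tensor
}

type namedTensor string

func (t namedTensor) Name() string { return string(t) }

// ggmlContext still has Output as a concrete method, but it is no
// longer part of the contract, so models cannot reach it (or be
// confused by it) through the Context interface.
type ggmlContext struct{}

func (c ggmlContext) Input() Tensor  { return namedTensor("input") }
func (c ggmlContext) Output() Tensor { return namedTensor("output") }

func main() {
	var ctx Context = ggmlContext{}
	fmt.Println(ctx.Input().Name()) // models only see Input
	// ctx.Output() would not compile: Output is off the interface.
}
```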

Commits on Mar 28, 2025

  1. server: organize error types (#9465)

    Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
    googs1025 and BruceMacD authored Mar 28, 2025 · 0bd0454

Commits on Mar 31, 2025

  1. 071a987
  2. ollamarunner: Ensure batch size limits are not exceeded

    With the llama runner, we can generate up to NUM_PARALLEL batches
    at once, which then get broken up into individual batches to be
    executed by llama.cpp (i.e. we add up to 2048 tokens and this
    gets split into 4 batches of 512 tokens at default settings).
    
    This splitting can improve parallelism on multi-GPU systems because
    the individual batches can move through the pipeline without blocking
    on the first one to fully complete. However, we don't yet support
    this in the Ollama runner, partially because it makes it hard to
    enforce model-specified batch constraints, which didn't exist
    previously.
    
    The result is that we will try to execute the full, unsplit batch.
    This could result in out of memory or insufficient KV cache space
    errors.
    
    This change triggers batch breaking when the total inputs from all
    sequences exceed the batch size, rather than per-sequence. To ensure
    fairness, it also reintroduces round-robin scheduling across
    sequences so that one busy sequence cannot starve the others.
    jessegross committed Mar 31, 2025 · 5d09727
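The batch-breaking plus round-robin idea can be sketched as follows. This is an invented illustration, not the runner's actual code: `seq`, `fillBatch`, and the fixed `batchSize` are all hypothetical. A batch is filled with tokens from all sequences until the shared limit is hit, and the starting sequence rotates so no sequence monopolizes the batch.

```go
package main

import "fmt"

const batchSize = 4 // illustrative shared token limit per batch

type seq struct {
	id      int
	pending []int // token inputs waiting to be processed
}

// fillBatch takes up to batchSize tokens total across all sequences,
// starting from `start` so one busy sequence cannot starve the rest.
// It returns the batch and the index to start from next time.
func fillBatch(seqs []*seq, start int) (batch []int, next int) {
	next = start
	for i := 0; i < len(seqs) && len(batch) < batchSize; i++ {
		s := seqs[(start+i)%len(seqs)]
		n := batchSize - len(batch)
		if n > len(s.pending) {
			n = len(s.pending)
		}
		batch = append(batch, s.pending[:n]...)
		s.pending = s.pending[n:]
		next = (start + i + 1) % len(seqs)
	}
	return batch, next
}

func main() {
	seqs := []*seq{
		{id: 0, pending: []int{1, 2, 3, 4, 5, 6}},
		{id: 1, pending: []int{7, 8}},
	}
	start := 0
	for len(seqs[0].pending)+len(seqs[1].pending) > 0 {
		var batch []int
		batch, start = fillBatch(seqs, start)
		fmt.Println(batch) // each batch stays within batchSize total
	}
}
```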
  3. runner: Release semaphore and improve error messages on failures

    If we have an error after creating a new sequence but before
    finding a slot for it, we return without releasing the semaphore.
    This reduces our parallel sequences and eventually leads to deadlock.
    
    In practice this should never happen because once we have acquired
    the semaphore, we should always be able to find a slot. However, the
    code is clearly not correct.
    jessegross committed Mar 31, 2025 · b2a4652
  4. server/internal/client/ollama: cache completed chunks (#9933)

    This change adds tracking of download chunks during the pull process so
    that subsequent pulls can skip downloading already completed chunks.
    This works across restarts of ollama.
    
    Currently, download state will be lost if a prune is triggered during a
    pull (e.g. restart or remove). This issue should be addressed in a
    follow-up PR.
    bmizerany authored Mar 31, 2025 · ef27d52
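The resume behavior can be illustrated with a minimal sketch. This is not the client's real implementation: the `download` type and in-memory `completed` map are stand-ins for state that would actually be persisted on disk so it survives restarts of ollama.

```go
package main

import "fmt"

type download struct {
	chunks    int
	completed map[int]bool // chunk index -> already downloaded
}

// pull fetches only the chunks not finished in a previous run and
// reports how many it was able to skip.
func (d *download) pull(fetch func(i int)) (skipped int) {
	for i := 0; i < d.chunks; i++ {
		if d.completed[i] {
			skipped++ // chunk completed earlier; don't refetch it
			continue
		}
		fetch(i)
		d.completed[i] = true // record completion before moving on
	}
	return skipped
}

func main() {
	// Simulate a restarted pull: chunks 0 and 1 finished last time.
	d := &download{chunks: 4, completed: map[int]bool{0: true, 1: true}}
	n := d.pull(func(i int) { fmt.Println("fetching chunk", i) })
	fmt.Println("skipped", n, "chunks")
}
```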
  5. runner: clear cache when shift is not possible (#9433)

    Clear KV cache when shift operation is not supported by model.
    Added a KvCacheCanShift() check to handle models that can't perform
    cache shifts, falling back to a full cache clear while preserving the
    logical token history to maintain expected behavior when the context
    window fills up.
    BruceMacD authored Mar 31, 2025 · 66b2539
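The fallback logic can be sketched like this. The `kvCache` type, `CanShift` method, and `freeSpace` helper are invented for illustration: the real point is that when shifting is unsupported, the cache is cleared entirely while the logical token history is still trimmed the same way, so downstream behavior stays consistent.

```go
package main

import "fmt"

type kvCache struct {
	canShift bool
	entries  []int // cached KV entries, one per token position
}

func (c *kvCache) CanShift() bool { return c.canShift }

// freeSpace makes room for new tokens by discarding the oldest ones,
// returning the trimmed logical history either way.
func freeSpace(c *kvCache, history []int, discard int) []int {
	if c.CanShift() {
		c.entries = c.entries[discard:] // shift: drop only the oldest entries
	} else {
		c.entries = c.entries[:0] // no shift support: clear the whole cache
	}
	return history[discard:] // logical history is trimmed in both cases
}

func main() {
	c := &kvCache{canShift: false, entries: []int{1, 2, 3, 4}}
	history := freeSpace(c, []int{1, 2, 3, 4}, 2)
	fmt.Println("cache entries:", len(c.entries), "history:", history)
}
```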

Commits on Apr 1, 2025

  1. discover: /proc/cpuinfo file open and close. (#9950)

    Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
    zhanluxianshen authored Apr 1, 2025 · 4059a29
  2. docs: add DeepShell to community projects (#9955)

    Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
    Abyss-c0re and BruceMacD authored Apr 1, 2025 · 23fc8e9
  3. c001b98
  4. api: return model capabilities from the show endpoint (#10066)

    With support for multimodal models becoming more varied and common, it is important for clients to be able to easily see what capabilities a model has. Returning these from the show endpoint lets clients discover what a model supports.
    BruceMacD authored Apr 1, 2025 · e172f09

Commits on Apr 2, 2025

  1. 4e41502
  2. chore(all): replace instances of interface with any (#10067)

    Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
    BruceMacD authored Apr 2, 2025 · 9876c9f
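Because `any` is a predeclared alias for `interface{}`, the swap is purely cosmetic: the two spellings name the identical type and are assignable in both directions. A small demonstration:

```go
package main

import "fmt"

// describe accepts any value; this signature is identical to
// func describe(v interface{}) string.
func describe(v any) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	var a interface{} = 42
	var b any = a // alias types: assignable both ways, no conversion needed
	fmt.Println(describe(b), describe("hello"))
}
```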
  3. ollamarunner: Don't truncate a SameBatch

    When truncating inputs to the context window at the beginning of
    a sequence, we remove the minimum amount possible. However, this
    may cause us to truncate to the middle of a set of inputs that
    the model specified should not be split up. To avoid this, we
    need to remove the rest of the partial batch.
    jessegross committed Apr 2, 2025 · 493385e
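The truncation rule can be sketched as follows. The `input` type and its `sameBatch` field are invented for this example: the idea is that when the cut point lands inside a group of inputs the model says must stay together, the whole group is dropped rather than split.

```go
package main

import "fmt"

type input struct {
	token     int
	sameBatch int // >0 means this input opens a group of that many inputs
}

// truncate removes at least `discard` inputs from the front, extending
// the cut so it never ends in the middle of a same-batch group.
func truncate(inputs []input, discard int) []input {
	end := 0
	for end < len(inputs) && end < discard {
		if n := inputs[end].sameBatch; n > 1 {
			end += n // the group must be removed as a unit
		} else {
			end++
		}
	}
	return inputs[end:]
}

func main() {
	// Token 2 opens a group of 3 (tokens 2, 3, 4) that must not split.
	inputs := []input{{token: 1}, {token: 2, sameBatch: 3}, {token: 3}, {token: 4}, {token: 5}}
	// Asking to discard 2 lands inside the group, so the group goes too.
	fmt.Println(truncate(inputs, 2))
}
```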
  4. kvcache: Add check for values that fall out of sliding window cache

    The sliding window cache trims entries that are outside the window for
    the latest token. This works when we are extending the cache, such as
    when the conversation continues. However, if we have a partial overlap
    in conversation (including the BOS tokens), then we resume from a past
    point in the conversation and the needed tokens are no longer stored
    in memory. This verifies that the new window overlaps with the old one
    before reusing the cache.
    
    Co-authored-by: Jesse Gross <jesse@ollama.com>
    jmorganca and jessegross committed Apr 2, 2025 · b429700
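The overlap condition being checked can be reduced to a one-line predicate. This is a simplified illustration with invented names, not the kvcache package's code: a sliding-window cache of size `windowSize` only retains the last `windowSize` positions, so a resume point earlier than that would need entries that were already trimmed.

```go
package main

import "fmt"

// canReuseCache reports whether a cached prefix is still usable when
// resuming from resumePos: the cache holds only positions
// [cachedLen-windowSize, cachedLen), so anything earlier is gone.
func canReuseCache(cachedLen, resumePos, windowSize int) bool {
	return resumePos >= cachedLen-windowSize
}

func main() {
	// 100 tokens cached with a window of 32: positions 68..99 survive.
	fmt.Println(canReuseCache(100, 80, 32)) // inside the window: reusable
	fmt.Println(canReuseCache(100, 50, 32)) // before the window: must recompute
}
```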
  5. b51e0f3