Comparing changes

base repository: ollama/ollama
base: v0.10.1
head repository: ollama/ollama
compare: v0.11.0
  • 19 commits
  • 58 files changed
  • 4 contributors

Commits on Jul 31, 2025

  1. kvcache: Enable SWA to retain additional entries

    Models that use sliding window attention can only resume a sequence
    from the cache if it falls within the saved windows. This works well
    if the next message picks up where the old one left off. However, it
    generally prevents a partial prefix match unless the entire conversation
    falls within the sliding window.
    
    This can be a problem with reasoning models where the traces are
    supposed to be removed from future messages, forcing the entire
    history to be re-evaluated.
    
    This change allows models to specify that a larger amount of the
    history be retained in memory, to allow more partial resumption.
    It still respects the window that the model was trained on for
    token generation.
    jessegross committed Jul 31, 2025
    4183bb0
  2. bf16

    mxyng committed Jul 31, 2025
    4a8fc3f
  3. tests

    mxyng committed Jul 31, 2025
    f1c7384
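
The kvcache commit (4183bb0) reasons about when a sliding-window cache can resume from a partial prefix. The following Python sketch models that condition only; it is not the Go code from the repository, and `can_resume`, `window`, and `retained` are hypothetical names chosen for illustration. It assumes the cache keeps the most recent `retained` positions and that generating at position `p` needs the `window` positions before it.

```python
def can_resume(cached_len: int, prefix_len: int, window: int, retained: int) -> bool:
    """Return True if a prompt sharing `prefix_len` tokens with a cached
    sequence of `cached_len` tokens can reuse the cache.

    window   -- attention window the model was trained with (assumed name)
    retained -- number of most recent positions kept in memory (assumed name)
    """
    if not 0 <= prefix_len <= cached_len:
        raise ValueError("prefix_len must lie within the cached sequence")
    if retained < window:
        raise ValueError("retained must be at least the attention window")

    # Oldest position still held in the cache.
    oldest_cached = max(0, cached_len - retained)
    # Oldest position the next token after the prefix would attend to.
    oldest_needed = max(0, prefix_len - window)
    return oldest_needed >= oldest_cached


if __name__ == "__main__":
    # Retaining only the window: resuming works only at the very end.
    print(can_resume(1000, 1000, window=128, retained=128))
    print(can_resume(1000, 900, window=128, retained=128))
    # Retaining extra entries allows a partial prefix match.
    print(can_resume(1000, 900, window=128, retained=256))
```

Under this model, setting `retained` equal to `window` reproduces the limitation the commit message describes: a shorter prefix can only be reused when the whole conversation still fits in the window.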

Commits on Aug 4, 2025

  1. gpt-oss

    mxyng committed Aug 4, 2025
    9950f6e
  2. enable gptoss for engine

    mxyng committed Aug 4, 2025
    26ade3a
  3. rough estimate

    mxyng committed Aug 4, 2025
    6ca094a
  4. convert to mxfp4

    mxyng committed Aug 4, 2025
    c8ac4cc
  5. handle safetensors U8

    mxyng committed Aug 4, 2025
    9679520
  6. clamp glu/linear

    mxyng committed Aug 4, 2025
    9d1de41
  7. update tokenizer

    mxyng committed Aug 4, 2025
    9194874
  8. MXFP4 support

    This implements the Open Compute Microscaling (MX) FP4 format
    as a tensor type with backend implementations focusing
    on mulmat and mulmatid on CPU, CUDA, and Metal.
    dhiltgen authored and mxyng committed Aug 4, 2025
    4fb47ed
  9. Unit tests for MXFP4 support

    This exercises various operations and shapes on both CPU and GPU (if
    one is detected on the system).
    dhiltgen authored and mxyng committed Aug 4, 2025
    0263ad9
  10. cuda graph

    mxyng committed Aug 4, 2025
    e6f39bc
  11. unit test adjustments

    dhiltgen authored and mxyng committed Aug 4, 2025
    0ac1c0d
  12. cuda: optimize memory access

    Read 4 bytes at a time (8 elements) when performing mul_mat_vec_mxfp4.
    dhiltgen authored and mxyng committed Aug 4, 2025
    aa43da4
  13. mac: fix crash on old macos versions

    cblas_sgemm is only supported on macOS v13.3 and up, but bf16 is
    only supported on v14+, so we were falling back to ggml-blas and
    crashing on bf16 tensors. Checking whether the function is null
    seems to be the simplest way to conditionally avoid registering the
    backend.
    dhiltgen authored and mxyng committed Aug 4, 2025
    6a68a17
  14. server: Minimum context length for gptoss

    This model requires a minimum context length of 8192 to function
    effectively. Users can set higher values through all normal mechanisms
    but lower values will be silently reset.
    jessegross authored and mxyng committed Aug 4, 2025
    f5fd7cc
  15. ggml: Multiply by numParallel for gptoss sliding window

    When computing the graph size estimate, the context size is already
    multiplied by numParallel so estimates reflect that. However, since
    sliding window models use a smaller, fixed context size, they need
    to manually take numParallel into account.
    jessegross authored and mxyng committed Aug 4, 2025
    8306248
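
The MXFP4 commits (4fb47ed, aa43da4) refer to the Open Compute Microscaling FP4 format and to reading 4 bytes (8 elements) at a time. As a rough illustration, the Python sketch below dequantizes one block, assuming the OCP MX layout of 32 E2M1 elements sharing a single E8M0 power-of-two scale. The function names and the low-nibble-first packing order are assumptions for this example and may not match the memory layout used by the ggml backends.

```python
import struct

# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # elements that share one scale in the OCP MX format


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code to a float."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_e8m0(scale_byte: int) -> float:
    """Decode the shared 8-bit exponent (bias 127); 0xFF encodes NaN."""
    if scale_byte == 0xFF:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def unpack_word(word: int) -> list[int]:
    """Split one 32-bit word into eight 4-bit codes.

    Low nibble first is an assumed ordering for illustration.
    """
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize_block(scale_byte: int, packed: bytes) -> list[float]:
    """Dequantize one block: 16 packed bytes -> 32 floats."""
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError("expected 16 packed bytes per block")
    scale = decode_e8m0(scale_byte)
    values: list[float] = []
    # Read 4 bytes at a time, yielding 8 elements per read.
    for (word,) in struct.iter_unpack("<I", packed):
        values.extend(scale * decode_e2m1(code) for code in unpack_word(word))
    return values
```

Reading a whole 32-bit word per step is the same idea the CUDA commit describes: fewer, wider loads, each carrying eight 4-bit elements.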
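
The server commit (f5fd7cc) says that context lengths below 8192 are silently reset for gptoss while higher values are left alone. A minimal Python sketch of that rule follows; the names `GPTOSS_MIN_CONTEXT` and `effective_context_length` are hypothetical and do not come from the Ollama source.

```python
GPTOSS_MIN_CONTEXT = 8192  # minimum stated in the commit message


def effective_context_length(requested: int, is_gptoss: bool) -> int:
    """Return the context length that would actually be used.

    Values below the minimum are raised for gptoss; everything else
    passes through unchanged.
    """
    if is_gptoss and requested < GPTOSS_MIN_CONTEXT:
        return GPTOSS_MIN_CONTEXT
    return requested
```

For example, a request for 4096 tokens on a gptoss model would yield 8192 under this rule, while a request for 16384 would be kept as is.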
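
The ggml commit (8306248) notes that the full context size already includes the numParallel factor, whereas the fixed sliding-window size does not. The Python sketch below shows that accounting in isolation. It is an assumption-laden illustration: `tokens_for_estimate` and its parameters are hypothetical names, and the real estimate involves more terms than the token count.

```python
from typing import Optional


def tokens_for_estimate(total_context: int, num_parallel: int,
                        sliding_window: Optional[int] = None) -> int:
    """Token count to use when sizing a layer's cache for the estimate.

    total_context  -- per-sequence context already multiplied by num_parallel
    sliding_window -- fixed per-sequence window, or None for full attention
    """
    if sliding_window is None:
        # Full-attention layers: the parallel factor is already included.
        return total_context
    # Sliding-window layers: the window is per sequence, so scale it here.
    return sliding_window * num_parallel
```

With a per-sequence context of 8192, four parallel sequences, and a window of 128, full-attention layers would be sized for 32768 tokens and sliding-window layers for 512 rather than 128.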

Commits on Aug 5, 2025

  1. gpt-oss integration

    includes harmony parser and thinking levels, etc.
    drifkin committed Aug 5, 2025
    d552068