
server : ability to disable context shift #9390

@ngxson

Feature Description

We can add a pair of arguments (for example, --context-shift and --no-context-shift) to enable or disable context shift.

If disabled:

  • Requests larger than the context window will result in an error.
  • n_predict for each sequence will be capped at n_ctx - n_tokens_prompt.

Note: the behavior above is the same as the official OpenAI (OAI) API.
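
To make the two rules above concrete, here is a minimal sketch of the proposed behavior. It is written in Python for illustration only and is not the server's actual code. The names `plan_request` and `ContextOverflowError` are hypothetical, and treating a negative n_predict as "no limit" is an assumption of this sketch.

```python
class ContextOverflowError(ValueError):
    """Hypothetical error type: the prompt does not fit in the context window."""


def plan_request(n_ctx: int, n_tokens_prompt: int, n_predict: int,
                 context_shift: bool) -> int:
    """Return the effective n_predict for one sequence.

    Sketch of the proposal only; names and structure are illustrative.
    """
    if context_shift:
        # With context shift enabled, the request is left unchanged.
        return n_predict

    # Rule 1: a request larger than the context window is rejected.
    if n_tokens_prompt > n_ctx:
        raise ContextOverflowError(
            f"prompt has {n_tokens_prompt} tokens, context window is {n_ctx}"
        )

    # Rule 2: cap generation at the space left after the prompt.
    remaining = n_ctx - n_tokens_prompt
    if n_predict < 0:
        # Assumption: a negative n_predict means "no explicit limit".
        return remaining
    return min(n_predict, remaining)
```

For example, with n_ctx = 4096 and a 1000-token prompt, a request for 8192 tokens would be capped at 3096, while a 5000-token prompt would be rejected outright.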

Motivation

We may want to disable it because:

  • For users who don't know about this feature, it may degrade generation quality.
  • Currently, a quantized KV cache doesn't work with context shift.

Possible Implementation

No response
