Closed
Description
Feature Description
We can add an argument (for example, --context-shift / --no-context-shift) to enable or disable context shift.
If disabled:
- Requests bigger than context window will result in an error.
- n_predict for each sequence will be capped to n_ctx - n_tokens_prompt.
Note: the behavior above is the same as that of the official OpenAI (OAI) API.
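To make the proposed "disabled" behavior concrete, here is a minimal sketch in Python. It is not the server's actual code: the function name resolve_n_predict, the exception ContextOverflowError, and the convention that a negative n_predict means "no explicit limit" are all assumptions made for illustration. Only n_ctx, n_tokens_prompt and n_predict come from the description above.

```python
class ContextOverflowError(ValueError):
    """Hypothetical error for a prompt that does not fit in the context window."""


def resolve_n_predict(n_ctx: int, n_tokens_prompt: int, n_predict: int = -1) -> int:
    """Return the generation budget for one sequence when context shift is disabled.

    Assumption: a negative n_predict means the caller set no explicit limit.
    """
    # A request bigger than the context window is rejected instead of shifted.
    if n_tokens_prompt > n_ctx:
        raise ContextOverflowError(
            f"prompt has {n_tokens_prompt} tokens but the context window is {n_ctx}"
        )

    # Tokens left in the window once the prompt has been placed.
    remaining = n_ctx - n_tokens_prompt

    # No explicit limit: generate until the window is full.
    if n_predict < 0:
        return remaining

    # Explicit limit: never allow more than what still fits.
    return min(n_predict, remaining)
```

For example, with n_ctx = 4096 and a 1000-token prompt, a request asking for 5000 tokens would be capped to 3096, while a request asking for 128 tokens would be left unchanged.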
Motivation
We may want to disable it because:
- For users who don't know about this feature, it may silently degrade generation quality
- Currently, quantized KV cache doesn't work with context shift
Possible Implementation
No response