BruceMacD
Contributor

Clear the KV cache when the model does not support the shift operation. This adds a KvCacheCanShift() check for models that cannot shift their cache; for those models the runner falls back to a full cache clear while preserving the logical token history, so behavior stays as expected when the context window fills up.
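
The fallback described above can be illustrated with a minimal Go sketch. This is not the code from this PR: only the KvCacheCanShift() name comes from the description, while the `kvCache` interface, its `Shift` and `Clear` methods, `fakeCache`, and `makeRoom` are hypothetical names introduced for illustration. It assumes `numKeep` and `discard` are non-negative.

```go
package main

// kvCache is a hypothetical stand-in for a runner's cache handle.
// Only KvCacheCanShift is named in the PR description; Shift and Clear
// are assumptions made for this sketch.
type kvCache interface {
	KvCacheCanShift() bool
	Shift(numKeep, discard int) // drop `discard` entries after the first numKeep
	Clear()                     // drop every cached entry
}

// fakeCache only counts resident tokens so the sketch can run standalone.
type fakeCache struct {
	canShift bool
	resident int
}

func (f *fakeCache) KvCacheCanShift() bool      { return f.canShift }
func (f *fakeCache) Shift(numKeep, discard int) { f.resident -= discard }
func (f *fakeCache) Clear()                     { f.resident = 0 }

// makeRoom frees context space when the window is full. It always returns
// the trimmed logical history (first numKeep tokens plus everything after
// the discarded span). If the cache cannot shift, it is cleared and the
// whole trimmed history is also returned as tokens to re-evaluate.
func makeRoom(c kvCache, history []int, numKeep, discard int) (trimmed, reprocess []int) {
	// Clamp so the slice expressions below stay in range.
	if numKeep > len(history) {
		numKeep = len(history)
	}
	if discard > len(history)-numKeep {
		discard = len(history) - numKeep
	}

	trimmed = make([]int, 0, len(history)-discard)
	trimmed = append(trimmed, history[:numKeep]...)
	trimmed = append(trimmed, history[numKeep+discard:]...)

	if c.KvCacheCanShift() {
		// Fast path: the cache drops the span in place, nothing to redo.
		c.Shift(numKeep, discard)
		return trimmed, nil
	}

	// Fallback: full clear, logical history preserved for reprocessing.
	c.Clear()
	return trimmed, trimmed
}
```

In this sketch the caller sees the same trimmed history on both paths; the only difference is that the fallback path hands back tokens that must be evaluated again, trading extra compute for correct behavior on models that cannot shift.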

Fixes: #5975
Fixes: #8074
Fixes: #8571
Fixes: #8599
Fixes: #8602
Fixes: #8614
Fixes: #8924
Fixes: #9010
Fixes: #9047
Fixes: #9064
Fixes: #9105
Fixes: #9171
Fixes: #9248
Fixes: #9410

@BruceMacD BruceMacD marked this pull request as draft March 3, 2025 23:46
@BruceMacD BruceMacD force-pushed the brucemacd/ctx-shift-err branch from 40faf3c to 68776d9 Compare March 4, 2025 00:10
@BruceMacD BruceMacD marked this pull request as ready for review March 4, 2025 00:10
@BruceMacD BruceMacD requested a review from jessegross March 4, 2025 00:11
@BruceMacD BruceMacD changed the title from "llamarunner: clear cache when shift is not possible" to "runner: clear cache when shift is not possible" Mar 4, 2025
@BruceMacD BruceMacD force-pushed the brucemacd/ctx-shift-err branch from 682ea85 to 9c23f11 Compare March 11, 2025 04:24
@BruceMacD BruceMacD requested a review from jessegross March 11, 2025 04:26
@BruceMacD BruceMacD force-pushed the brucemacd/ctx-shift-err branch from 9c23f11 to 8ac3b75 Compare March 28, 2025 23:53
@BruceMacD BruceMacD requested a review from jessegross March 28, 2025 23:56
@BruceMacD BruceMacD merged commit 66b2539 into main Mar 31, 2025
8 checks passed
@BruceMacD BruceMacD deleted the brucemacd/ctx-shift-err branch March 31, 2025 19:54
halfcrazy pushed a commit to halfcrazy/ollama that referenced this pull request Jun 19, 2025