cache: keep cached suggestions #18


Merged (5 commits, Jan 4, 2025)

Conversation

@VJHack (Collaborator) commented Jan 3, 2025

This PR aims to optimize cache performance: when the user types the same letter as the current cached suggestion, we keep the suggestion displayed instead of going to the server to fetch a new FIM completion.

Here's how it works:
The initial completion shown below is cached.
[Screenshot: the initial completion being shown and cached]

As the user continues typing out the current suggestion, we scan back up to 10 characters to see if there is a cached suggestion nearby. If the cached suggestion matches what the user has typed, it is kept. This approach works better than simply checking whether the previous character's position is cached, because if the user types fast enough, llama#fim() is not called on every keystroke, which would otherwise result in a cache miss.
[Screenshot: the suggestion kept on screen as the user types it out]
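A rough sketch of the scan-back lookup, in Python for illustration (the plugin itself is Vim script; the names and the exact key format below are assumptions, not the actual implementation). Here `cache` maps the text before the cursor at generation time to the completion produced there:

```python
MAX_SCAN_BACK = 10  # how far behind the cursor to look for a cached entry

def find_nearby_suggestion(cache, text, cursor):
    """Return the remainder of a nearby cached suggestion that is still valid."""
    for back in range(MAX_SCAN_BACK + 1):
        if back > cursor:
            break
        key = text[:cursor - back]          # context at the earlier position
        suggestion = cache.get(key)
        if suggestion is None:
            continue
        typed = text[cursor - back:cursor]  # characters typed since that position
        # keep the suggestion only if the user typed its leading characters
        if suggestion.startswith(typed) and len(suggestion) > len(typed):
            return suggestion[len(typed):]  # the part still left to display
    return None
```

So if the completion `world` was cached right after `hello `, and the user has since typed `wor`, the lookup returns the remaining `ld` with zero latency instead of issuing a new server request.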

Changes in this PR:

  • Modified the format of the cache key from l:prefix . "|" . l:suffix . "|" . l:prompt to l:prefix . l:prompt . l:suffix. It seems more intuitive to keep the prompt in the middle with the prefix and suffix on either side.
  • Created a separate function to insert items into the cache.
  • Search for cached values nearby for zero-latency suggestions.

Fixes #16

@ggerganov (Member) commented

Very nice!

One small improvement - when triggering manual FIM using Ctrl+F, I think we should not hit the cache and instead always send a request to the server.

@VJHack (Collaborator, Author) commented Jan 3, 2025

@ggerganov I disabled the cache when FIM is triggered manually with Ctrl+F. However, I noticed that when a suggestion is already displayed, the user has to press it twice to generate a new one. I don't entirely agree with this logic and I'm not sure what the intention was, so I left it as is.

if s:hint_shown && !a:is_auto

It seems to work as expected. Thanks for the review 👍

@ggerganov (Member) commented

However, I noticed that when a suggestion is already displayed, the user has to press it twice to generate a new one. I don't entirely agree with this logic and I'm not sure what the intention was, so I left it as is.

The logic is to make Ctrl+F act as a toggle, so that you can turn off the current llama.vim suggestion. This is useful if you have another auto-completion plugin that you can toggle at the same position, to compare the results for example.
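The toggle behavior behind the `s:hint_shown && !a:is_auto` check quoted above can be sketched as follows (Python for illustration; the plugin is Vim script, and the class and return values are assumptions):

```python
class FimState:
    """Sketch of the manual-trigger toggle: a manual trigger while a
    suggestion is visible hides it; otherwise it requests a completion.
    Pressing the manual trigger twice therefore forces a fresh request."""

    def __init__(self):
        self.hint_shown = False

    def on_trigger(self, is_auto):
        if self.hint_shown and not is_auto:
            self.hint_shown = False   # toggle off the current suggestion
            return "hide"
        self.hint_shown = True        # request and show a completion
        return "request"

s = FimState()
s.on_trigger(is_auto=False)  # first press: shows a suggestion
s.on_trigger(is_auto=False)  # second press while shown: hides it
```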

@ggerganov (Member) left a comment


It would be nice to update the info message to provide cache information, such as how many entries are currently cached. For example, in this case, after hitting the cache there is no need to print all the token generation stats again:

[Screenshot: info message repeating the full token generation stats on a cache hit]

Instead the info could look like:

... world\n");     | C: 3 / 250 | t: 0.66 ms
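The proposed line could be produced by something like the following sketch (Python for illustration; the function and parameter names are assumptions, and `C: 3 / 250` is read here as cached entries over cache capacity):

```python
def format_info(suggestion_head, n_cache_entries, cache_capacity, t_ms):
    # e.g. '... world\n");     | C: 3 / 250 | t: 0.66 ms'
    return (f'{suggestion_head}     '
            f'| C: {n_cache_entries} / {cache_capacity} '
            f'| t: {t_ms:.2f} ms')

print(format_info('... world\\n");', 3, 250, 0.66))
```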

@ggerganov ggerganov merged commit d13d932 into ggml-org:master Jan 4, 2025
@VJHack (Collaborator, Author) commented Jan 5, 2025

The logic is to make Ctrl+F act as a toggle, so that you can turn off the current llama.vim suggestion. This is useful if you have another auto-completion plugin that you can toggle at the same position, to compare the results for example.

Oh right. That makes sense. Thank you!

@VJHack VJHack mentioned this pull request Jan 5, 2025
Successfully merging this pull request may close these issues.

cache: Keep cached suggestion when user types same letters