cache: keep cached suggestions #18


Merged (5 commits, Jan 4, 2025)

Conversation

@VJHack (Collaborator) commented Jan 3, 2025

This PR aims to optimize cache performance: when the user types the same letter as the current cached suggestion, we keep the suggestion displayed instead of going to the server to fetch a new FIM completion.

Here's how it works:
The initial completion shown below is cached.
[Screenshot: the initial completion being shown and cached]

As the user continues typing out the current suggestion, we scan back up to 10 characters to see if there is a cached suggestion nearby. If the cached suggestion matches what the user has typed, it is kept. This approach works better than simply checking whether the previous character's position is cached, because if the user types fast enough, llama#fim() is not called on every keystroke, which would otherwise result in a cache miss.
[Screenshot: the suggestion kept on screen as the user types it out]
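A rough sketch of the scan-back lookup, in Python for illustration (the plugin itself is Vim script; the names and the exact key format below are assumptions, not the actual implementation). Here `cache` maps the text before the cursor at generation time to the completion produced there:

```python
MAX_SCAN_BACK = 10  # how far behind the cursor to look for a cached entry

def find_nearby_suggestion(cache, text, cursor):
    """Return the remainder of a nearby cached suggestion that is still valid."""
    for back in range(MAX_SCAN_BACK + 1):
        if back > cursor:
            break
        key = text[:cursor - back]          # context at the earlier position
        suggestion = cache.get(key)
        if suggestion is None:
            continue
        typed = text[cursor - back:cursor]  # characters typed since that position
        # keep the suggestion only if the user typed its leading characters
        if suggestion.startswith(typed) and len(suggestion) > len(typed):
            return suggestion[len(typed):]  # the part still left to display
    return None
```

So if the completion `world` was cached right after `hello `, and the user has since typed `wor`, the lookup returns the remaining `ld` with zero latency instead of issuing a new server request.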

Changes in this PR:

  • Modified the format of the cache key from l:prefix . "|" . l:suffix . "|" . l:prompt to l:prefix . l:prompt . l:suffix. It seems more intuitive to keep the prompt in the middle with the prefix and suffix on either side.
  • Created a separate function to insert items into the cache.
  • Search for cached values nearby for zero-latency suggestions.

Fixes #16

@ggerganov (Member) commented

Very nice!

One small improvement - when triggering manual FIM using Ctrl+F, I think we should not hit the cache and instead always send a request to the server.

@VJHack (Collaborator, Author) commented Jan 3, 2025

@ggerganov I disabled the cache when FIM is triggered manually with Ctrl+F. However, I noticed that when a suggestion is already displayed, the user has to press it twice to generate a new one. I don't entirely agree with this logic and I'm not sure what the intention was, so I left it as is.

if s:hint_shown && !a:is_auto

It seems to work as expected. Thanks for the review 👍

@ggerganov (Member) commented

However, I noticed that when a suggestion is already displayed, the user has to press it twice to generate a new one. I don't entirely agree with this logic and I'm not sure what the intention was, so I left it as is.

The logic is to make Ctrl+F act as a toggle, so that you can turn off the current llama.vim suggestion. This is useful if you have another auto-completion plugin that you can toggle at the same position, to compare the results for example.
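The toggle behavior behind the `s:hint_shown && !a:is_auto` check quoted above can be sketched as follows (Python for illustration; the plugin is Vim script, and the class and return values are assumptions):

```python
class FimState:
    """Sketch of the manual-trigger toggle: a manual trigger while a
    suggestion is visible hides it; otherwise it requests a completion.
    Pressing the manual trigger twice therefore forces a fresh request."""

    def __init__(self):
        self.hint_shown = False

    def on_trigger(self, is_auto):
        if self.hint_shown and not is_auto:
            self.hint_shown = False   # toggle off the current suggestion
            return "hide"
        self.hint_shown = True        # request and show a completion
        return "request"

s = FimState()
s.on_trigger(is_auto=False)  # first press: shows a suggestion
s.on_trigger(is_auto=False)  # second press while shown: hides it
```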

@ggerganov (Member) left a comment


It would be nice to update the info message to provide cache information, such as how many entries are currently cached. For example, in this case, after hitting the cache there is no need to print all the token generation stats again:

[Screenshot: info message repeating the full token generation stats on a cache hit]

Instead the info could look like:

... world\n");     | C: 3 / 250 | t: 0.66 ms
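The proposed line could be produced by something like the following sketch (Python for illustration; the function and parameter names are assumptions, and `C: 3 / 250` is read here as cached entries over cache capacity):

```python
def format_info(suggestion_head, n_cache_entries, cache_capacity, t_ms):
    # e.g. '... world\n");     | C: 3 / 250 | t: 0.66 ms'
    return (f'{suggestion_head}     '
            f'| C: {n_cache_entries} / {cache_capacity} '
            f'| t: {t_ms:.2f} ms')

print(format_info('... world\\n");', 3, 250, 0.66))
```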

@ggerganov ggerganov merged commit d13d932 into ggml-org:master Jan 4, 2025
@VJHack (Collaborator, Author) commented Jan 5, 2025

The logic is to make Ctrl+F act as a toggle, so that you can turn off the current llama.vim suggestion. This is useful if you have another auto-completion plugin that you can toggle at the same position, to compare the results for example.

Oh right. That makes sense. Thank you!

@VJHack VJHack mentioned this pull request Jan 5, 2025
Successfully merging this pull request may close these issues.

cache: Keep cached suggestion when user types same letters