
Conversation

@VJHack (Collaborator) commented Dec 27, 2024

This PR implements a client-side cache of completions for a given local state, to reduce the number of requests made to the server. The FIM code completions are cached in g:result_cache, where the key is the sha256 hash of the concatenated string l:prefix . "|" . l:suffix . "|" . l:prompt and the value is the raw completion.

  • The cache is a dictionary that holds at most 250 entries by default; the limit can be configured via llama_config (see the sketch after this list).
  • The generated suggestion is cached along with its additional info.
  • For simplicity, the cache currently uses a random eviction policy. A more sophisticated strategy such as LRU could improve the cache hit rate.
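
For reference, here is a minimal Vimscript sketch of the lookup/insert path described above. Only g:result_cache, the sha256 key format, and the 250-entry default come from this description; the option name max_cache_keys, the helper function names, and the v:null miss convention are illustrative assumptions rather than the actual plugin code.

```vim
" Minimal sketch of the caching scheme described in the PR.
let g:result_cache = {}
let g:llama_config = get(g:, 'llama_config', {})
" 'max_cache_keys' is an assumed option name for the 250-entry default
let g:llama_config.max_cache_keys = get(g:llama_config, 'max_cache_keys', 250)

function! s:cache_key(prefix, suffix, prompt) abort
    " Key = sha256 of "<prefix>|<suffix>|<prompt>"
    return sha256(a:prefix . '|' . a:suffix . '|' . a:prompt)
endfunction

function! s:cache_get(prefix, suffix, prompt) abort
    " v:null signals a cache miss (illustrative convention)
    return get(g:result_cache, s:cache_key(a:prefix, a:suffix, a:prompt), v:null)
endfunction

function! s:cache_insert(prefix, suffix, prompt, raw) abort
    " Random eviction when the cache is full (requires rand(), Vim 8.2+/Neovim)
    if len(g:result_cache) >= g:llama_config.max_cache_keys
        let l:keys = keys(g:result_cache)
        call remove(g:result_cache, l:keys[rand() % len(l:keys)])
    endif
    let g:result_cache[s:cache_key(a:prefix, a:suffix, a:prompt)] = a:raw
endfunction
```

On a FIM request, the plugin would first call s:cache_get() and only contact the server on a miss, inserting the response with s:cache_insert().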

Fixes #3

@ggerganov (Member) commented

Does not work correctly with Neovim:

[screenshot of the issue in Neovim]

@VJHack (Collaborator, Author) commented Dec 30, 2024

> Does not work correctly with Neovim:

Fixed and tested the bug with Neovim. It should be working now.
[screenshot of the fix working in Neovim, Dec 30, 2024]

@ggerganov (Member) left a comment


Seems to work ok now 👍

@ggerganov (Member) commented

Maybe we can use this cache mechanism to create a second set of hashes just for the current completion. Then, when the user is typing the same letters as the suggestion, we would not send requests to the server and would simply reuse the results from the cache.
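
A rough sketch of that idea, reusing the hypothetical s:cache_insert helper from the earlier example; it assumes the cached value is the remaining suggestion text and that the suffix and prompt are unchanged while the user types through the suggestion, which may not hold in the actual plugin:

```vim
" Illustrative only: after receiving a completion, pre-seed the cache with the
" states the buffer would pass through if the user types the suggestion verbatim.
function! s:cache_prefill(prefix, suffix, prompt, completion) abort
    " Cap how far ahead we pre-hash (arbitrary illustrative limit)
    let l:max = min([len(a:completion), 32])
    for l:n in range(1, l:max)
        let l:typed     = strpart(a:completion, 0, l:n)
        let l:remaining = strpart(a:completion, l:n)
        " After typing l:typed, the rest of the suggestion is what we would show
        call s:cache_insert(a:prefix . l:typed, a:suffix, a:prompt, l:remaining)
    endfor
endfunction
```

Whether to insert one entry per keystroke like this, or instead compare the typed text against the displayed suggestion before hashing at all, is a design choice this sketch does not settle.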

@ggerganov ggerganov merged commit 3a08e7d into ggml-org:master Dec 30, 2024
Closes: llama.vim : cache completions client-side