cache: keep cached suggestions #18
Very nice! One small improvement: when triggering manual FIM using Ctrl+F, I think we should not hit the cache and should instead always send a request to the server.
@ggerganov I disabled the cache when FIM is triggered manually with Ctrl+F. However, I noticed that when a suggestion is already displayed, the user has to press Ctrl+F twice to generate a new one. I don't entirely agree with this logic and I'm not sure what the intention was, so I left it as is (see line 344 in 3a08e7d).
It seems to work as expected. Thanks for the review 👍
The logic is to make Ctrl+F act as a toggle so that you can turn off the current
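Here is a minimal Python sketch of the Ctrl+F behavior discussed above. It assumes the toggle semantics described in this thread: the first press hides a displayed suggestion, and a press with nothing displayed always queries the server. `FimState`, `auto_trigger`, `ctrl_f`, and the `request` callback are hypothetical stand-ins, not llama.vim functions:

```python
from typing import Callable, Dict, Optional


class FimState:
    """Hypothetical model of automatic vs. manual (Ctrl+F) FIM, not llama.vim code."""

    def __init__(self, request: Callable[[str], str]) -> None:
        self.request = request              # stands in for the server round trip
        self.cache: Dict[str, str] = {}     # context key -> suggestion
        self.shown: Optional[str] = None    # suggestion currently displayed

    def auto_trigger(self, key: str) -> None:
        # Automatic FIM while typing: reuse a cached suggestion when available.
        if key not in self.cache:
            self.cache[key] = self.request(key)
        self.shown = self.cache[key]

    def ctrl_f(self, key: str) -> None:
        # Manual FIM acts as a toggle: a press while a suggestion is shown hides it...
        if self.shown is not None:
            self.shown = None
            return
        # ...and a press with nothing shown always asks the server, bypassing the cache.
        self.cache[key] = self.request(key)
        self.shown = self.cache[key]
```

Under this model, a second press is needed when a suggestion is already displayed, because the first press only hides it.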
It would be nice to update the info message to include cache information, such as how many entries are currently cached. For example, in this case there is no need to print all the token generation stats again after hitting the cache.
Instead, the info could look like:
`... world\n"); | C: 3 / 250 | t: 0.66 ms`
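A minimal Python sketch of how such an info line could be assembled. It assumes that `C: 3 / 250` means "entries cached / maximum cache entries" and that the full generation stats are appended only after a real server round trip. `info_line` and its parameters are hypothetical:

```python
from typing import Optional


def info_line(n_cached: int, max_keys: int, elapsed_ms: float,
              gen_stats: Optional[str] = None) -> str:
    """Hypothetical formatter for the info shown after a suggestion."""
    parts = [f"C: {n_cached} / {max_keys}"]  # cache usage: entries / capacity (assumed)
    if gen_stats is not None:
        # Only present when the suggestion came from the server, not the cache.
        parts.append(gen_stats)
    parts.append(f"t: {elapsed_ms:.2f} ms")
    return " | ".join(parts)
```

On a cache hit, `info_line(3, 250, 0.66)` yields the short form proposed above.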
Oh right. That makes sense. Thank you!
This PR optimizes the cache so that when the user types the same characters as the currently displayed cached suggestion, the suggestion stays displayed instead of a new FIM completion being fetched from the server.
Here's how it works:

The initial completion shown below is cached.
As the user continues typing out the current suggestion, we scan back up to 10 characters to see if there is a cached suggestion nearby. If the cached suggestion matches what the user typed, it is kept. This works better than only checking whether the previous character is cached, because if the user types fast enough, `llama#fim()` will not get called on every keystroke, which results in a cache miss.

Changes in this PR:
- Changed `l:prefix . "|" . l:suffix . "|" . l:prompt` to `l:prefix . l:prompt . l:suffix`. It seems more intuitive to keep the prompt in the middle, with the prefix and suffix on either side.

Fixes #16
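The lookback described under "Here's how it works" can be sketched in Python as follows. This is a minimal illustration, not llama.vim's Vim script. `cache_key`, `lookup`, `LOOKBACK`, and the plain dict used as the cache are hypothetical. The sketch assumes that `prompt` is the text before the cursor on the current line and that the key concatenates prefix, prompt, and suffix in that order:

```python
from typing import Dict, Optional

LOOKBACK = 10  # how many typed characters to scan back (assumed from the description)


def cache_key(prefix: str, prompt: str, suffix: str) -> str:
    # Prompt in the middle, with the prefix and suffix on either side.
    return prefix + prompt + suffix


def lookup(cache: Dict[str, str], prefix: str, prompt: str,
           suffix: str) -> Optional[str]:
    """Return the suggestion to keep displaying, or None on a cache miss."""
    # Exact hit: the context has not changed since the suggestion was cached.
    hit = cache.get(cache_key(prefix, prompt, suffix))
    if hit is not None:
        return hit
    # Scan back: the user may have typed the first few characters of a cached
    # suggestion, possibly faster than the lookup ran on every keystroke.
    for i in range(1, min(LOOKBACK, len(prompt)) + 1):
        typed = prompt[-i:]
        earlier = cache.get(cache_key(prefix, prompt[:-i], suffix))
        if earlier is not None and earlier.startswith(typed):
            # Keep the untyped remainder instead of asking the server.
            return earlier[len(typed):]
    return None
```

For example, if `lo world")` was cached after `print("hel`, typing `lo` leaves ` world")` on screen without a new request, while typing `p` instead is a miss. Because this sketch joins the parts without separators, different prefix/prompt/suffix splits of the same text map to the same key.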