
Conversation

MollySophia
Collaborator

The token-shifting part was not done correctly in the previous implementation: the token-shift state wasn't copied back to k_cache after a decode. As a result, the model was always lerping towards zero when decoding. Prefill (and, as a result, PPL evaluation) wasn't affected.

Somehow this mistake didn't affect text generation much either, lol (maybe the large 32B model already gets enough context information into the wkv state?). That's why the bug wasn't found earlier.
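For readers unfamiliar with RWKV's token shift, here is a minimal sketch of the idea (hypothetical names and simplified shapes, not the actual llama.cpp code): each layer keeps the previous token's embedding and lerps it with the current one, so if that state is never written back after a decode step, it stays at its initial zeros and every step mixes toward zero.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration of RWKV token shifting (not the llama.cpp API).
struct TokenShiftState {
    std::vector<float> prev;  // previous token's embedding for this layer (starts as zeros)
};

// Per-channel lerp between the current and previous embedding, as used by RWKV mixing.
static std::vector<float> token_shift_mix(const std::vector<float> & cur,
                                          const std::vector<float> & prev,
                                          const std::vector<float> & mu) {
    std::vector<float> out(cur.size());
    for (std::size_t i = 0; i < cur.size(); ++i) {
        out[i] = cur[i] * mu[i] + prev[i] * (1.0f - mu[i]);
    }
    return out;
}

// One decode step for a single layer. The important part is the write-back at the
// end: without it, `state.prev` stays zero and every step lerps toward zero,
// which is the kind of bug this PR fixes.
static std::vector<float> decode_step(TokenShiftState & state,
                                      const std::vector<float> & cur,
                                      const std::vector<float> & mu) {
    std::vector<float> mixed = token_shift_mix(cur, state.prev, mu);

    // ... time-mix / channel-mix computation using `mixed` would go here ...

    state.prev = cur;  // copy the shifted state back (the step that was missing)
    return mixed;
}
```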

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Animaxx added a commit to Animaxx/llama.cpp that referenced this pull request Jan 28, 2025
@MollySophia MollySophia merged commit 325afb3 into ggml-org:master Jan 29, 2025
45 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025