Generate endpoint intermittently misses final token before done #6707

@tarbard

Description

What is the issue?

When using the generate endpoint, it intermittently misses the last token right before the "done" message:

{"model":"adrienbrault/nous-hermes2theta-llama3-8b:q8_0","created_at":"2024-09-09T08:04:47.463348938Z","response":" Bear","done":false}
{"model":"adrienbrault/nous-hermes2theta-llama3-8b:q8_0","created_at":"2024-09-09T08:04:47.475993178Z","response":",","done":false}
{"model":"adrienbrault/nous-hermes2theta-llama3-8b:q8_0","created_at":"2024-09-09T08:04:47.488651949Z","response":" Elephant","done":false}
{"model":"adrienbrault/nous-hermes2theta-llama3-8b:q8_0","created_at":"2024-09-09T08:04:47.50131158Z","response":",","done":false}
{"model":"adrienbrault/nous-hermes2theta-llama3-8b:q8_0","created_at":"2024-09-09T08:04:47.51400078Z","response":" Gor","done":false}
{"model":"adrienbrault/nous-hermes2theta-llama3-8b:q8_0","created_at":"2024-09-09T08:04:47.539481043Z","response":"","done":true,"done_reason":"stop","total_duration":8790953777,"load_duration":8080650494,"

In the above example, the token that should complete "Gorilla" is not emitted before the done response, so we just get "Gor".
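
If it helps, here is a minimal Python sketch of how I'm checking for the truncation programmatically. It assumes a local Ollama instance on the default port and uses the requests library; the payload mirrors the curl command below. It just accumulates the streamed tokens and prints the final text so a cut-off last word is easy to spot:

import json
import requests

# Same payload as the curl reproduction below (local Ollama on the default port assumed).
payload = {
    "model": "adrienbrault/nous-hermes2theta-llama3-8b:q8_0",
    "prompt": "\n<|im_start|>user\nYou will think of a number. Then you will list that many animals. "
              "Do not write any other words only the animal. Be terse in your response.<|im_end|>\n<|im_start|>assistant",
    "raw": True,
    "stream": True,
    "keep_alive": -1,
    "options": {"seed": 99, "num_predict": 1024, "num_ctx": 4096,
                "stop": ["<end>", "user:", "assistant:"],
                "num_batch": 1, "temperature": 0.5, "top_k": 40, "top_p": 0.9},
}

with requests.post("http://127.0.0.1:11434/api/generate", json=payload, stream=True) as r:
    text = ""
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each streamed line is a JSON object; "response" carries the next token.
        text += chunk.get("response", "")
        if chunk.get("done"):
            break

# When the bug hits, the accumulated text ends mid-word, e.g. "... Gor" instead of "... Gorilla".
print(repr(text))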

Here's the curl command to reproduce this:

curl -H 'Host: 127.0.0.1:11434' -H 'Content-Type: application/json' -H 'Connection: Keep-Alive' --compressed \
  -H 'Accept-Language: en-GB,*' -H 'User-Agent: Mozilla/5.0' \
  -X POST http://127.0.0.1:11434/api/generate \
  -d '{"model": "adrienbrault/nous-hermes2theta-llama3-8b:q8_0", "prompt": "\n<|im_start|>user\nYou will think of a number. Then you will list that many animals. Do not write any other words only the animal. Be terse in your response.<|im_end|>\n<|im_start|>assistant", "raw": true, "stream": true, "keep_alive": -1, "options": {"seed": 99, "num_predict": 1024, "num_ctx": 4096, "stop": ["<end>", "user:", "assistant:"], "num_batch": 1, "temperature": 0.5, "top_k": 40, "top_p": 0.9}}'

I have only seen this with one model so far (adrienbrault/nous-hermes2theta-llama3-8b:q8_0), so the model may well be a factor. However, I don't get this problem with the chat endpoint for that model; it only happens with the generate endpoint. I'm using raw mode and stream=true. For comparison, a rough chat-endpoint equivalent is sketched below.
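
This is the chat-endpoint comparison I've been running. The messages payload is my own translation of the raw prompt above (so treat the exact framing as an approximation), but the model and sampling options are the same:

import json
import requests

# Rough /api/chat equivalent of the generate reproduction above,
# with the raw prompt rewritten as a plain user message (my approximation).
payload = {
    "model": "adrienbrault/nous-hermes2theta-llama3-8b:q8_0",
    "messages": [{
        "role": "user",
        "content": "You will think of a number. Then you will list that many animals. "
                   "Do not write any other words only the animal. Be terse in your response.",
    }],
    "stream": True,
    "keep_alive": -1,
    "options": {"seed": 99, "num_predict": 1024, "num_ctx": 4096,
                "stop": ["<end>", "user:", "assistant:"],
                "num_batch": 1, "temperature": 0.5, "top_k": 40, "top_p": 0.9},
}

with requests.post("http://127.0.0.1:11434/api/chat", json=payload, stream=True) as r:
    text = ""
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Chat responses stream the token inside message.content rather than "response".
        text += chunk.get("message", {}).get("content", "")
        if chunk.get("done"):
            break

print(repr(text))  # in my runs the last animal name arrives intact here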

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.3.9

Labels

bug (Something isn't working), nvidia (Issues relating to Nvidia GPUs and CUDA)
