This seems to be a bug or limitation in the llama-cpp-python backend; see the upstream discussion: https://github.com/abetlen/llama-cpp-python/discussions/257