Add support for vision models to HF LLM pipeline

Add support for the `image-text-to-text` task in the HF LLM pipeline. This will enable image models such as the following call.

```python
model = LLM("...")
result = model([{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image", "image": "books.jpg"}
    ]}
])
```