[Bugfix] Fix Mistral ChatCompletionRequest Body Exception #16769

JasmondL · 2025-04-17T09:00:36Z

Bugfix for Mistral tool validation in vLLM

Issue

This pull request aims to resolve a validation error in the ChatCompletionRequest's Body for Mistral Models. The assumption that the mistral-common library throws an AssertionError is incorrect, here, as it actually contains a series of exception classes, as documented in the mistral-common repository, here.

The issue arises when a user sends an inference call with a message containing both content and tool_calls. The mistral tokenizer throws an exception, which is not handled, resulting in a 500 Internal Server Error being returned. Instead, a 400 Bad Request status code should be returned.

The following is an invalid message structure example.

"messages": [
    {
      "role": "assistant",
      "content": "What is the Weather today?",
      "tool_calls": [
        {
          "id": "call-243240c8",
          "type": "function",
          "function": {
            "name": "current_weather",
            "arguments": "{\"county\": \"somewhere\"}"
          }
        }
      ]
    }
]

Current Behaviour of vLLM

Upon receiving the above invalid message structure inference call, the following exception is asserted. As a result, an http status code of 500 is returned.

  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 347, in create_chat_completion
    generator = await handler.create_chat_completion(request, raw_request)
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 155, in create_chat_completion
    ) = await self._preprocess_chat(
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_engine.py", line 458, in _preprocess_chat
    request_prompt = apply_mistral_chat_template(
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/chat_utils.py", line 767, in apply_mistral_chat_template
    return tokenizer.apply_chat_template(
  File "/usr/local/lib/python3.10/site-packages/vllm/transformers_utils/tokenizers/mistral.py", line 265, in apply_chat_template
    encoded = self.mistral.encode_chat_completion(request)
  File "/usr/local/lib/python3.10/site-packages/mistral_common/tokens/tokenizers/mistral.py", line 230, in encode_chat_completion
    validated_request = self._chat_completion_request_validator.validate_request(request)
  File "/usr/local/lib/python3.10/site-packages/mistral_common/protocol/instruct/validator.py", line 63, in validate_request
    self.validate_messages(request.messages)
  File "/usr/local/lib/python3.10/site-packages/mistral_common/protocol/instruct/validator.py", line 51, in validate_messages
    self._validate_message_list_content(messages)
  File "/usr/local/lib/python3.10/site-packages/mistral_common/protocol/instruct/validator.py", line 273, in _validate_message_list_content
    self._validate_assistant_message(message, is_last_message=idx == len(messages) - 1)
  File "/usr/local/lib/python3.10/site-packages/mistral_common/protocol/instruct/validator.py", line 147, in _validate_assistant_message
    raise InvalidAssistantMessageException(
mistral_common.exceptions.InvalidAssistantMessageException: Assistant message must have either content or tool_calls, but not both.
/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py:147: RuntimeWarning: coroutine 'AsyncMultiModalItemTracker.all_mm_data' was never awaited
  async with send_stream:
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
WARNING 04-11 08:53:17 chat_utils.py:601] Skipping multimodal part (type: 'text')with empty / unparsable content.
INFO:     127.0.0.6:42325 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error

Expected Behaviour

It should not throw any exceptions and handle them as a 400 Bad Request.

github-actions · 2025-04-17T09:00:48Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

vllm/third_party/pynvml.py

vllm/transformers_utils/tokenizers/mistral.py

mgoin · 2025-04-17T12:56:26Z

cc @patrickvonplaten

JasmondL · 2025-04-24T03:45:44Z

Hi @gcalmettes,

Could you review the changes in this PR?

JasmondL · 2025-04-24T07:04:14Z

The following exception is also being observed during inferencing. Hence, the exception handling of apply_hf_chat_template is the issue. The problem is caused by the user sending consecutive messages with the role set to "user".

The latest commit added exception handling to apply_hf_chat_template, similar to that implemented in apply_mistral_chat_template.

  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in add_request_id
    response = await call_next(request)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 163, in call_next
    raise app_exc
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/base.py", line 149, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 347, in create_chat_completion
    generator = await handler.create_chat_completion(request, raw_request)
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 155, in create_chat_completion
    ) = await self._preprocess_chat(
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_engine.py", line 464, in _preprocess_chat
    request_prompt = apply_hf_chat_template(
  File "/usr/local/lib/python3.10/site-packages/vllm/entrypoints/chat_utils.py", line 741, in apply_hf_chat_template
    return tokenizer.apply_chat_template(
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1683, in apply_chat_template
    rendered_chat = compiled_template.render(
  File "/usr/local/lib/python3.10/site-packages/jinja2/environment.py", line 1304, in render
    self.environment.handle_exception()
  File "/usr/local/lib/python3.10/site-packages/jinja2/environment.py", line 939, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 11, in top-level template code
  File "/usr/local/lib/python3.10/site-packages/jinja2/sandbox.py", line 394, in call
    return __context.call(__obj, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/chat_template_utils.py", line 412, in raise_exception
    raise jinja2.exceptions.TemplateError(message)
jinja2.exceptions.TemplateError: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
WARNING 04-24 00:00:10 chat_utils.py:601] Skipping multimodal part (type: 'text')with empty / unparsable content.
INFO:     127.0.0.6:50633 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application