Closed
Labels
feature request (New feature or request)
Description
🚀 The feature, motivation and pitch
vLLM has announced support for running llama3.1-405b-fp8 on 8xA100 (see the blog post).
Does vLLM also support running DeepSeek-Coder-V2-Instruct-FP8 on 8xA100?
However, I notice that vLLM uses a Triton kernel for FusedMoE, and that kernel doesn't support FP8 Marlin mixed-precision. A100 has no native FP8 tensor cores, so FP8 weights have to go through a weight-only mixed-precision GEMM such as Marlin, which leaves the MoE layers without a usable FP8 path. See sgl-project/sglang#989 (comment)
Is there any workaround?
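
For context, a minimal sketch of why 8xA100 hits this limitation, assuming PyTorch and vLLM are installed. The checkpoint id and tensor-parallel size below are taken from the question and are placeholders, not a confirmed working configuration:

```python
import torch
from vllm import LLM

# Native FP8 tensor cores require compute capability >= 8.9 (Ada/Hopper).
# A100 is SM 8.0, so FP8 weights can only run through a weight-only
# mixed-precision GEMM such as Marlin, which the Triton FusedMoE path lacks.
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability {major}.{minor}; native FP8: {(major, minor) >= (8, 9)}")

# Assumed launch on 8xA100 with an FP8 DeepSeek-Coder-V2 checkpoint
# (hypothetical repo id). Given the FusedMoE limitation above, this is
# expected to fail or fall back rather than use an FP8 MoE kernel.
llm = LLM(
    model="DeepSeek-Coder-V2-Instruct-FP8",  # placeholder checkpoint id
    tensor_parallel_size=8,
    trust_remote_code=True,
)
```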
Alternatives
No response
Additional context
No response