Description
Checklist
- 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose instead. Otherwise, it will be closed.
- 2. Please use English, otherwise it will be closed.
Description
Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. For more information about Ascend, see Ascend Community.
PyTorch has officially announced support for Ascend NPU through the PrivateUse1 dispatch key. Please see the PrivateUse1 tutorial here.
Motivation
The number of developers using Ascend NPUs for AI training and inference has been increasing significantly, and many popular open-source projects already support Ascend, such as LLaMA-Factory, llama.cpp, and DeepSpeed. Some sglang users want to run it on Ascend (see #3609). Therefore, I would like to add an Ascend NPU backend to sglang.
Status
PyTorch already supports NPU, but OpenAI Triton does not yet; NPU support for Triton is under development. Until then, sglang should work on Ascend with the torch_native attention backend. Once Triton is ready, it should also work with the triton backend.
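The backend choice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not sglang code: `detect_device` and `pick_attention_backend` are hypothetical helper names, and the sketch assumes that importing the `torch_npu` package is what makes `torch.npu` available. The backend names `torch_native` and `triton` are the ones used in this issue.

```python
import importlib
import importlib.util


def detect_device() -> str:
    """Return "npu" if an Ascend NPU looks usable from PyTorch, else "cpu".

    Hypothetical helper: assumes importing torch_npu exposes torch.npu.
    """
    for name in ("torch", "torch_npu"):
        if importlib.util.find_spec(name) is None:
            return "cpu"  # package not installed
    try:
        torch = importlib.import_module("torch")
        importlib.import_module("torch_npu")  # side effect: registers the npu device
        npu = getattr(torch, "npu", None)
        if npu is not None and npu.is_available():
            return "npu"
    except Exception:
        # A broken driver or toolkit install should not crash the probe.
        pass
    return "cpu"


def pick_attention_backend(triton_supports_device: bool) -> str:
    """Hypothetical helper: use triton only once it supports the device."""
    return "triton" if triton_supports_device else "torch_native"


if __name__ == "__main__":
    device = detect_device()
    # Triton does not support NPU yet, so an NPU device falls back to torch_native.
    backend = pick_attention_backend(triton_supports_device=(device != "npu"))
    print(f"device={device} attention_backend={backend}")
```

The fallback is deliberately a single boolean so that switching to the triton backend later only requires changing how `triton_supports_device` is computed.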
We have successfully run sglang on an x86 Ascend platform with the torch_native backend. Here is the running log:
Related PR
sglang