Add xpu_cmake_macros.h to xpu build #132847
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/132847
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 88f473d with merge base c184ac0:
BROKEN TRUNK - The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Overall, it looks good to me. But I think the PR description does not state the motivation; we need to describe what these changes are for. If this is meant to support the extension mechanism, I'd prefer to file an issue and link to its number.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot cherry-pick --onto release/2.4 -c critical --fixes #132971 |
# Motivation
fix #132971

Pull Request resolved: #132847
Approved by: https://github.com/EikanWang
(cherry picked from commit 9c5e0d4)
Cherry picking #132847
The cherry-pick PR is at #133649 and it is linked with issue #132971. The following tracker issues are updated.
Details for Dev Infra team: raised by workflow job.
This commit implements an XPU extension with unpack kernels written in SYCL. The PyTorch XPU backend provides hardware acceleration on Intel GPUs. At the moment Meteor Lake (MTL) and Data Center GPU Max (PVC) are supported. The provided SYCL kernel was converted from the existing CUDA kernel.

$ python bench/kernels/benchmark.py --it 1000
unpack_2bit[xpu]: python = 0.177 ms, ext = 0.033 ms, ratio = 5.4x
unpack_4bit[xpu]: python = 0.085 ms, ext = 0.026 ms, ratio = 3.3x

Note: without the extension the ratio is 0.8x.

At the moment a few features are not implemented for the XPU backend, which affects this implementation:
* pytorch/pytorch#127929
  * Some memory ops are not supported by the XPU backend
  * WA applied: calls to these ops are commented out
* pytorch/pytorch#131840
  * elapsed_time is not supported by XPUEvent
  * WA applied: calls to these ops are commented out (CPU e2e time is measured)
* pytorch/pytorch#132947
  * Some aten ops are not implemented for the XPU backend and fall back to CPU
  * WA required: set PYTORCH_ENABLE_XPU_FALLBACK=1 on the command line

Requires: pytorch/pytorch#132847
Requires: pytorch/pytorch#132945

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
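For clarity, the "ratio" column in the benchmark output above is simply the plain-Python time divided by the extension time. The sketch below is a hypothetical helper (not part of the benchmark script) that reproduces those figures from the quoted timings:

```python
def speedup(python_ms: float, ext_ms: float) -> float:
    """Speedup ratio as printed by the benchmark: python time / ext time,
    rounded to one decimal place."""
    return round(python_ms / ext_ms, 1)

# Timings quoted in the commit message above.
print(speedup(0.177, 0.033))  # unpack_2bit[xpu] -> 5.4
print(speedup(0.085, 0.026))  # unpack_4bit[xpu] -> 3.3
```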
Add xpu_cmake_macros.h to xpu build (#132847)

# Motivation
fix #132971

Pull Request resolved: #132847
Approved by: https://github.com/EikanWang
(cherry picked from commit 9c5e0d4)
Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
Fixes: pytorch#132944

This patch adds support for building SYCL kernels via the torch.utils.cpp_extension.load API. Files with the .sycl extension are considered to contain SYCL kernels and are compiled with icpx (Intel's DPC++ SYCL compiler); files with other extensions (.cpp, .cu) are handled as before. The API supports building SYCL sources together with other file types into a single extension.

By default, SYCL kernels are compiled for all Intel GPU devices for which PyTorch's native ATen SYCL kernels are compiled (at the moment, "pvc,xe-lpg"). This behavior can be overridden by setting the TORCH_XPU_ARCH_LIST environment variable to a comma-separated list of the devices to compile for.

Requires: pytorch#132847

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
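The arch-list selection described above can be sketched as follows. `DEFAULT_XPU_ARCHS` and `get_xpu_arch_list` are hypothetical names for illustration, not the actual cpp_extension internals; in PyTorch the default is derived from how the native ATen SYCL kernels were built.

```python
import os

# Default device list quoted in the patch description above (assumption:
# hard-coded here only for illustration).
DEFAULT_XPU_ARCHS = ["pvc", "xe-lpg"]

def get_xpu_arch_list() -> list[str]:
    """Return the target devices: TORCH_XPU_ARCH_LIST if set, else the default."""
    raw = os.environ.get("TORCH_XPU_ARCH_LIST")
    if not raw:
        return DEFAULT_XPU_ARCHS
    # Comma-separated list, e.g. "pvc,xe-lpg"; tolerate stray whitespace.
    return [arch.strip() for arch in raw.split(",") if arch.strip()]
```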
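The extension-based dispatch the patch describes can be sketched as a small helper. `classify_sources` is a hypothetical name for illustration and not the actual cpp_extension code:

```python
from pathlib import Path

def classify_sources(sources: list[str]) -> dict[str, list[str]]:
    """Split source files the way the patch above describes compiling them:
    .sycl files go to icpx (Intel's DPC++ SYCL compiler), .cu files to the
    CUDA compiler, and everything else to the regular C++ compiler."""
    buckets: dict[str, list[str]] = {"sycl": [], "cuda": [], "cpp": []}
    for src in sources:
        ext = Path(src).suffix
        if ext == ".sycl":
            buckets["sycl"].append(src)
        elif ext == ".cu":
            buckets["cuda"].append(src)
        else:
            buckets["cpp"].append(src)
    return buckets
```

All three buckets are then linked into a single extension, matching the patch's claim that SYCL sources can be built together with other file types.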
Stack from ghstack (oldest at bottom):
Motivation
fix #132971
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10