The last patch apparently introduced some regressions for iGPU users. According to pytorch/pytorch#152317, `torch.xpu.get_device_capability(device)['has_bfloat16_conversions']` only checks whether a device supports generating SPIR-V BF16 code, which is false on Lunar Lake iGPUs, so those devices slowed down. The correct check is `torch.xpu.is_bf16_supported()`, which is undocumented but mirrors the Nvidia counterpart, so this patch uses it for BF16 type checking instead. According to a user who raised the issue and tested the patch, this restores most of the speed.

Also disable `non-blocking` by default on XPU, because at this time it runs slower overall on operations that are not generation related. Because it does speed up dGPUs, introduce a flag to force it on if needed. Together with the change above, this restores the speed completely for iGPU users.

I also redid the IPEX check to mirror the other alternative backends, which makes it much simpler and avoids version juggling. It first tries to import IPEX and then checks XPU availability with the standard availability call, since that call now handles some of the checks the older code did.

Documentation was updated to mostly match the #7767 changes while still allowing users to install nightly builds for better performance. I removed the discussion thread, since things now mostly work out of the box with the standard installation procedure, without any real need to finagle with settings.
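
For reviewers, the three checks described above can be sketched roughly as follows. This is a minimal illustration and not the code in this patch: the helper names (`xpu_available`, `xpu_supports_bf16`, `use_non_blocking`) and the `force` parameter are hypothetical, and treating every non-XPU backend as keeping non-blocking enabled is a simplifying assumption.

```python
def xpu_available() -> bool:
    """Try the IPEX import first, then ask PyTorch whether an XPU is usable."""
    try:
        import torch
    except ImportError:
        return False
    try:
        # Optional: newer PyTorch builds expose torch.xpu without IPEX.
        import intel_extension_for_pytorch  # noqa: F401
    except Exception:
        pass
    return hasattr(torch, "xpu") and torch.xpu.is_available()


def xpu_supports_bf16() -> bool:
    """Use is_bf16_supported() rather than the 'has_bfloat16_conversions' capability."""
    if not xpu_available():
        return False
    import torch

    # Guarded lookup, since the function is not documented.
    check = getattr(torch.xpu, "is_bf16_supported", None)
    return bool(check()) if callable(check) else False


def use_non_blocking(device_type: str, force: bool = False) -> bool:
    """Non-blocking transfers are off by default on XPU unless forced on.

    Assumption for this sketch: all other backends keep non-blocking enabled.
    """
    if device_type == "xpu":
        return force
    return True
```

With this shape, an iGPU user gets the slower-but-safe default automatically, while a dGPU user can opt back in by passing `force=True` (standing in for the new flag).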