[ROCm] Check supported archs before setting preferred blas backend to hipblasLT #128753
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128753
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures — as of commit ce34f44 with merge base a6ac644. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…alizer. This prevents a hang with the previous approach when using env var TORCH_BLAS_PREFER_HIPBLASLT=1
…hecking logic is only executed once (until setter is used to set backend to cublaslt again)
aten/src/ATen/Context.cpp
Outdated
```cpp
at::BlasBackend Context::blasPreferredBackend() {
#ifdef USE_ROCM
  if (blas_preferred_backend == at::BlasBackend::Cublaslt) {
    static const std::vector<std::string> archs = {"gfx90a", "gfx940", "gfx941", "gfx942"};
```
Having yet another place where an arbitrary list of llvm targets is placed seems like a bad idea, since it will have to be remembered whenever the supported targets of hipblaslt expand or contract. Further, this list is already wrong right now, as hipblaslt has support for some gfx11 targets and the current code does work there, at least to some degree.
At the very least this needs to be a define set via a cmake option, but you could also query the architectures from the hipblaslt fatbinary, which is not that hard to implement directly; ideally the runtime would of course provide this information.
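As a rough sketch of the cmake-define route suggested above (the macro name `HIPBLASLT_SUPPORTED_ARCHS` and its semicolon-separated format are hypothetical, not an existing build option):

```cpp
// Hypothetical: the build would define HIPBLASLT_SUPPORTED_ARCHS, e.g. via
//   add_compile_definitions(HIPBLASLT_SUPPORTED_ARCHS="gfx90a;gfx940;gfx941;gfx942")
#include <sstream>
#include <string>
#include <vector>

static std::vector<std::string> hipblasltSupportedArchs() {
#ifdef HIPBLASLT_SUPPORTED_ARCHS
  // Split the semicolon-separated list baked in at configure time.
  std::vector<std::string> archs;
  std::istringstream list(HIPBLASLT_SUPPORTED_ARCHS);
  for (std::string arch; std::getline(list, arch, ';');) {
    archs.push_back(arch);
  }
  return archs;
#else
  // Fall back to the hardcoded list from the diff above.
  return {"gfx90a", "gfx940", "gfx941", "gfx942"};
#endif
}
```

This would keep the list out of the source, though it still has to be maintained somewhere; querying the hipblaslt fatbinary or the runtime would avoid that entirely.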
You make a good point about the maintenance headache this introduces. I'm not sure about this being a cmake option though, since this is not exactly user-configurable information? I'm looking into whether the hipblasLT library provides us a way to query the list of supported archs.
So hipblasLT doesn't currently have an API to report supported gfx archs, but we will request that. Until then, I believe this solution is appropriate.
I think that's fair.
I'd like to also mention that this code currently does not work at all, since having a GPU in the system that is not supported by hipblaslt causes the runtime to assert when a hipblaslt code object is loaded by ldd here: https://github.com/ROCm/clr/blob/204d35d16ef5c2c1ea1a4bb25442908a306c857a/hipamd/src/hip_code_object.cpp#L762 from https://github.com/ROCm/clr/blob/204d35d16ef5c2c1ea1a4bb25442908a306c857a/hipamd/src/hip_code_object.cpp#L752C22-L752C30, which ultimately calls ExtractFatBinaryUsingCOMGR.
In the tests on CI this appears to work because you have disabled runtime assertions in clr there (which is imo not great in and of itself), but it doesn't really work with disabled assertions either. When you do have a supported and an unsupported GPU in the system, depending on the GPU order ExtractFatBinaryUsingCOMGR can fail and return before it gets to the supported GPU; this causes the GPU code objects to subsequently be missing even for the supported GPU when torch tries to use them.
I presume a solution for this is in the pipe, because at the moment the way this PR attempts to select which GPUs to use hipblaslt on at runtime simply does not work with how the ROCm runtime is designed: by the time the above code runs, the runtime has already entered a failed state.
@IMbackK I agree that there's an issue in the way the HIP runtime handles code object loading for multiple GPUs in a heterogeneous system. However, this PR actually intends to set the blas_backend to `at::BlasBackend::Cublas` if any of the GPUs in the system are unsupported. This means that if you have a system with a gfx90a and a gfx908 GPU, trying to set the preferred backend to `at::BlasBackend::Cublaslt` will end up overriding it to `at::BlasBackend::Cublas`. IIUC, that should not break functionality. In other words, this PR is not attempting to "select which gpus to use hipblaslt on at runtime"; it is either using hipblasLT on all GPUs (if they're all supported), or on none of them.
If you do have a heterogeneous system, please try this PR on it and confirm whether you observe the above behaviour.
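For concreteness, here is a minimal sketch of the "all or none" behaviour described above (not the PR's exact code), using the HIP runtime's per-device `gcnArchName`:

```cpp
#include <hip/hip_runtime.h>
#include <string>
#include <vector>

// Returns true only if every visible GPU's arch matches the allow-list;
// a single unsupported GPU means hipBLASLt is disabled for all of them.
static bool allGpusSupportHipblaslt(const std::vector<std::string>& archs) {
  int count = 0;
  if (hipGetDeviceCount(&count) != hipSuccess) {
    return false;
  }
  for (int i = 0; i < count; ++i) {
    hipDeviceProp_t props;
    if (hipGetDeviceProperties(&props, i) != hipSuccess) {
      return false;
    }
    // gcnArchName can carry feature suffixes, e.g. "gfx90a:sramecc+:xnack-",
    // so match on the prefix rather than comparing for equality.
    const std::string name(props.gcnArchName);
    bool supported = false;
    for (const auto& arch : archs) {
      if (name.rfind(arch, 0) == 0) {  // prefix match
        supported = true;
        break;
      }
    }
    if (!supported) {
      return false;  // one unsupported GPU disables hipBLASLt everywhere
    }
  }
  return true;
}
```

If this returns false, the preferred backend is overridden to `at::BlasBackend::Cublas` (i.e. plain hipBLAS on ROCm).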
@IMbackK Please correct me if I'm wrong, but this is how I understand the concerns you are raising:
- There's a HIP runtime issue which causes a functional issue on unsupported gfx archs (on heterogeneous or homogeneous systems) only when runtime assertions are enabled in clr
- This PR is to prevent users from setting the wrong/unsupported cublaslt backend if any of their GPUs do not support it (regardless of the clr assert issue)
- Setting the default value of the preferred linalg backend to cublas will still run into the clr assertions, since PyTorch will still try to load the hipblaslt library via libtorch_hip.so at the start
Can you please confirm the above matches your understanding?
Yes, all those points are correct.
The only slight nitpick I have is that I don't know if the problem in the HIP runtime is an issue per se; the runtime simply doesn't support loading objects that contain hip code but lack code objects for all available gpus, and the clr code is pretty explicit that attempting this is in fact an error. I guess it is more a missing feature.
If in the future you do attempt (unlike this PR) to use hipblaslt on the supported GPUs in a heterogeneous system, this will cause the runtime to read uninitialized memory and ultimately crash.
Okay, in that case, I do not consider the issue you're raising to be a blocker for this PR, as this PR doesn't make things any worse for that scenario.
Do you have a link to an issue that has been filed for the assertion-enabled scenario? I think we should follow up on that to see how we can resolve it properly. I guess #119081 is that issue in a way, since it is on Fedora, but will it get closed according to #119081 (comment) if #120551 merges?
#119081 is the issue in a way, and I am currently using #120551. However, #120551 can really only be considered a solution if pytorch disables hipblaslt at compile time using that PR for all official builds that are supposed to support GPUs besides CDNA2/3 and RDNA3, until one of the following happens:
- hipblaslt changes to not have gpu code in the main .so but instead loads all gpu code as hipmodules
- hipblaslt gains support for all the usual rocm targets
- the runtime gains support for loading code objects that lack support for a given gpu, and gains an api clients can use to determine when this has occurred, so that clients can avoid calling into these code objects.
I agree this PR doesn't make anything worse; I was mainly noting that it does not address this issue, since the decision here to use hipblaslt or not comes too late.
Force-pushed from cf781a0 to 2a69042 (compare)
```cpp
at::BlasBackend Context::blasPreferredBackend() {
#ifdef USE_ROCM
  if (blas_preferred_backend == at::BlasBackend::Cublaslt) {
    static const bool hipblaslt_unsupported = []() {
```
Using `static` to ensure this variable is only initialized once, and `const` since the value is assumed to remain the same for every invocation, because the machine configuration will be the same.
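As a standalone illustration of this idiom (independent of PyTorch internals), an immediately-invoked lambda initializing a function-local `static const` runs its body exactly once, on first call, and caches the result:

```cpp
#include <cstdio>

static bool expensiveCheck() {
  std::puts("running check...");  // printed only once
  return true;
}

static bool cachedCheck() {
  // The lambda body executes only on the first call; C++ guarantees
  // thread-safe initialization of function-local statics.
  static const bool result = []() {
    return expensiveCheck();
  }();
  return result;
}

int main() {
  cachedCheck();  // prints "running check..."
  cachedCheck();  // cached; prints nothing
  return 0;
}
```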
```cpp
    }
    return false;
  }();
  if (hipblaslt_unsupported) blas_preferred_backend = at::BlasBackend::Cublas;
```
Override the value of `blas_preferred_backend`, which means this getter function can no longer be `const`.
@xw285cornell Please review this PR, as it is trying to address a fallout of your PR #127944.
@malfet Can you please review this PR?
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see the wiki.
Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "unrelated CI failures"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use `-f` as a last resort. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes meta-internal errors after importing #128753 (see [D59498679](https://www.internalfb.com/diff/D59498679))

```
fbcode/caffe2/aten/src/ATen/Context.cpp:286:34: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
  for (auto index = 0; index < at::getNumGPUs(); index++) {
                       ~~~~~ ^ ~~~~~~~~~~~~~~~~
1 error generated.
```

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: #130388
Approved by: https://github.com/Skylion007, https://github.com/malfet
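One way to resolve that `-Wsign-compare` error is to give the induction variable the loop bound's unsigned type instead of letting `auto` deduce `int` from the literal `0`. A sketch with a stand-in for `at::getNumGPUs()` (the actual fix that landed may differ):

```cpp
#include <cstddef>

// Stand-in for at::getNumGPUs(), which returns an unsigned size_t.
static std::size_t getNumGPUs() { return 2; }

int main() {
  // `auto index = 0` deduces int; comparing it against a size_t bound
  // triggers -Wsign-compare under -Werror. Declaring it size_t fixes that.
  for (std::size_t index = 0; index < getNumGPUs(); index++) {
    // ... per-GPU arch check ...
  }
  return 0;
}
```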
@pytorchbot cherry-pick --onto release/2.4 -c critical
… hipblasLT (#128753)

This PR is needed to resolve usability issues with PyTorch ROCm nightly wheels on non-gfx90a/gfx94x architectures as a result of #127944. Addresses #119081 (comment)

### With this PR's changes, I get the following on a gfx908 (unsupported by hipblasLT) architecture:

_Using setter function:_
```
>>> torch.backends.cuda.preferred_blas_library(backend="cublaslt")
[W617 19:58:58.286088851 Context.cpp:280] Warning: torch.backends.cuda.preferred_blas_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator())
[W617 19:59:02.125161985 Context.cpp:291] Warning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (function operator())
<_BlasBackend.Cublas: 0>
```

_Using `TORCH_BLAS_PREFER_HIPBLASLT` env var:_
```
root@9d47bf40d4d4:/tmp/pytorch# TORCH_BLAS_PREFER_CUBLASLT=1 python
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
[W619 06:14:11.627715807 Context.cpp:274] Warning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (function operator())
<_BlasBackend.Cublas: 0>
```

### and the following on a gfx90a (supported by hipblasLT) architecture:

_Using setter function:_
```
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
<_BlasBackend.Cublaslt: 1>
>>> torch.backends.cuda.preferred_blas_library(backend="cublas")
<_BlasBackend.Cublas: 0>
>>> torch.backends.cuda.preferred_blas_library(backend="cublaslt")
[W620 18:38:29.404265518 Context.cpp:293] Warning: torch.backends.cuda.preferred_blas_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator())
<_BlasBackend.Cublaslt: 1>
```

_Using `TORCH_BLAS_PREFER_HIPBLASLT` env var:_
```
root@9d47bf40d4d4:/tmp/pytorch# TORCH_BLAS_PREFER_HIPBLASLT=1 python
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
<_BlasBackend.Cublaslt: 1>
```
(Same result for using the `TORCH_BLAS_PREFER_CUBLASLT` env var.)

Pull Request resolved: #128753
Approved by: https://github.com/malfet
(cherry picked from commit e16276b)
Cherry picking #128753: The cherry pick PR is at #133359, and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:
Details for Dev Infra team: raised by workflow job.
… hipblasLT (#133359)

[ROCm] Check supported archs before setting preferred blas backend to hipblasLT (#128753)
Pull Request resolved: #128753
Approved by: https://github.com/malfet
(cherry picked from commit e16276b)
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Confirmed fixed in final 2.4.1 RC:

API BEHAVIOUR:

UNIT TESTS:
With PyTorch 2.4.0 wheels on MI100:
With PyTorch 2.4.1 wheels on MI100:
Since MI100 is now supported by hipblaslt (ROCm/hipBLASLt@938900a) if built from git, I think it would be useful to have some way to override this check. The same is also true of gfx11, which has hipblaslt support but is not allowed by the list in this PR.
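One possible shape for such an override is an environment variable checked before the arch allow-list; the name `TORCH_BLAS_FORCE_HIPBLASLT` below is purely hypothetical, not an existing PyTorch flag:

```cpp
#include <cstdlib>

// Hypothetical escape hatch: lets users running a self-built hipBLASLt with
// extra targets (e.g. gfx908/MI100 or gfx11) bypass the hardcoded arch check.
static bool userForcesHipblaslt() {
  const char* v = std::getenv("TORCH_BLAS_FORCE_HIPBLASLT");  // hypothetical name
  return v != nullptr && v[0] == '1';
}
// The getter could then skip the allow-list check entirely when this
// returns true, leaving the preferred backend at Cublaslt.
```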
This PR is needed to resolve usability issues with PyTorch ROCm nightly wheels on non-gfx90a/gfx94x architectures as a result of #127944.
Addresses #119081 (comment)
With this PR's changes, I get the following on a gfx908 (unsupported by hipblasLT) architecture:

Using setter function:
Using `TORCH_BLAS_PREFER_HIPBLASLT` env var:

and the following on a gfx90a (supported by hipblasLT) architecture:

Using setter function:
Using `TORCH_BLAS_PREFER_HIPBLASLT` env var:
(Same result for using the `TORCH_BLAS_PREFER_CUBLASLT` env var.)

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang