
Conversation

jithunnair-amd
Collaborator

@jithunnair-amd jithunnair-amd commented Jun 14, 2024

This PR is needed to resolve usability issues with PyTorch ROCm nightly wheels on non-gfx90a/gfx94x architectures as a result of #127944.

Addresses #119081 (comment)

With this PR's changes, I get the following on a gfx908 (unsupported by hipblasLT) architecture:

Using setter function:

```
>>> torch.backends.cuda.preferred_blas_library(backend="cublaslt")
[W617 19:58:58.286088851 Context.cpp:280] Warning: torch.backends.cuda.preferred_blas_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator())
[W617 19:59:02.125161985 Context.cpp:291] Warning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (function operator())
<_BlasBackend.Cublas: 0>
```

Using TORCH_BLAS_PREFER_CUBLASLT env var:

```
root@9d47bf40d4d4:/tmp/pytorch# TORCH_BLAS_PREFER_CUBLASLT=1 python
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
[W619 06:14:11.627715807 Context.cpp:274] Warning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (function operator())
<_BlasBackend.Cublas: 0>
```

and the following on a gfx90a (supported by hipblasLT) architecture:

Using setter function:

```
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
<_BlasBackend.Cublaslt: 1>
>>> torch.backends.cuda.preferred_blas_library(backend="cublas")
<_BlasBackend.Cublas: 0>
>>> torch.backends.cuda.preferred_blas_library(backend="cublaslt")
[W620 18:38:29.404265518 Context.cpp:293] Warning: torch.backends.cuda.preferred_blas_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator())
<_BlasBackend.Cublaslt: 1>
```

Using TORCH_BLAS_PREFER_HIPBLASLT env var:

```
root@9d47bf40d4d4:/tmp/pytorch# TORCH_BLAS_PREFER_HIPBLASLT=1 python
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
<_BlasBackend.Cublaslt: 1>
```

(Same result for the TORCH_BLAS_PREFER_CUBLASLT env var.)
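The gate demonstrated above is all-or-none across visible devices. A minimal Python sketch of that logic (illustrative only: `resolve_blas_backend` and the arch handling below are hypothetical names; the real check is the C++ in `Context::blasPreferredBackend`):

```python
# Hypothetical sketch of the arch gate added in this PR. The real check is
# C++ in aten/src/ATen/Context.cpp; names below are illustrative.

HIPBLASLT_SUPPORTED_ARCHS = {"gfx90a", "gfx940", "gfx941", "gfx942"}

def resolve_blas_backend(preferred: str, device_archs: list[str]) -> str:
    """Fall back to 'cublas' when any visible GPU lacks hipBLASLt support.

    The check is all-or-none: hipBLASLt is kept only if every device in the
    system is on the supported list.
    """
    if preferred != "cublaslt":
        return preferred
    for arch in device_archs:
        # gcnArchName may carry feature suffixes, e.g. "gfx90a:sramecc+:xnack-"
        base = arch.split(":")[0]
        if base not in HIPBLASLT_SUPPORTED_ARCHS:
            return "cublas"  # override, mirroring the PR's warning message
    return "cublaslt"

# gfx90a-only box keeps hipBLASLt; mixed gfx90a + gfx908 box falls back
print(resolve_blas_backend("cublaslt", ["gfx90a:sramecc+:xnack-"]))  # cublaslt
print(resolve_blas_backend("cublaslt", ["gfx90a", "gfx908"]))        # cublas
```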

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

@jithunnair-amd jithunnair-amd requested a review from jeffdaily June 14, 2024 22:45

pytorch-bot bot commented Jun 14, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128753

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit ce34f44 with merge base a6ac644:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/rocm Trigger "default" config CI on ROCm module: rocm AMD GPU support for Pytorch labels Jun 14, 2024
…alizer. This prevents a hang with the previous approach when using env var TORCH_BLAS_PREFER_HIPBLASLT=1
…hecking logic is only executed once (until setter is used to set backend to cublaslt again)
```cpp
at::BlasBackend Context::blasPreferredBackend() {
#ifdef USE_ROCM
  if (blas_preferred_backend == at::BlasBackend::Cublaslt) {
    static const std::vector<std::string> archs = {"gfx90a", "gfx940", "gfx941", "gfx942"};
```
Contributor

@IMbackK IMbackK Jun 20, 2024

Having yet another place where an arbitrary list of LLVM targets is hard-coded seems like a bad idea, since someone will have to remember to update it whenever the set of targets supported by hipblaslt expands or contracts. Moreover, this list is already wrong: hipblaslt has support for some gfx11 targets, and the current code does work there, at least to some degree.

At the very least this should be a define set via a CMake option, but you could also query the architectures from the hipblaslt fatbinary, which is not that hard to implement directly; ideally, of course, the runtime would provide this information.
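Pending a proper hipBLASLt API, one way to approximate querying the fatbinary is a best-effort scan for amdgcn target names in the library bytes. This is only a heuristic sketch (the function name and regex are mine, not anything ROCm or hipBLASLt provides):

```python
import re

def gfx_targets_in_blob(blob: bytes) -> set[str]:
    """Heuristically list amdgcn targets embedded in a binary blob
    (e.g. the bytes of libhipblaslt.so). Not a real ROCm/hipBLASLt API;
    a string scan like this can over-match and is no substitute for a
    proper runtime query."""
    return {m.decode() for m in re.findall(rb"gfx[0-9a-f]{3,4}", blob)}

# Example against a synthetic blob resembling fatbinary metadata:
blob = b"\x7fELF...amdgcn-amd-amdhsa--gfx90a\x00...amdgcn-amd-amdhsa--gfx1100\x00"
print(sorted(gfx_targets_in_blob(blob)))  # ['gfx1100', 'gfx90a']
```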

Collaborator Author

You make a good point about the maintenance headache this introduces. I'm not sure about this being a cmake option though, since this is not exactly user-configurable information? I'm looking into whether the hipblasLT library provides us a way to query the list of supported archs.

Collaborator Author

So hipblasLT doesn't currently have an API to report supported gfx archs, but we will request that. Until then, I believe this solution is appropriate.

Contributor

@IMbackK IMbackK Jun 22, 2024

I think that's fair.

I'd also like to mention that this code currently does not work at all: having a GPU in the system that is not supported by hipblaslt causes the runtime to assert when a hipblaslt code object is loaded by ldd here: https://github.com/ROCm/clr/blob/204d35d16ef5c2c1ea1a4bb25442908a306c857a/hipamd/src/hip_code_object.cpp#L762 from https://github.com/ROCm/clr/blob/204d35d16ef5c2c1ea1a4bb25442908a306c857a/hipamd/src/hip_code_object.cpp#L752C22-L752C30, which ultimately calls ExtractFatBinaryUsingCOMGR.

In the CI tests this appears to work because you have disabled runtime assertions in clr there (which is, in my opinion, not great in and of itself), but it doesn't really work with assertions disabled either. When you do have both a supported and an unsupported GPU in the system, then depending on the GPU order, ExtractFatBinaryUsingCOMGR can fail and return before it gets to the supported GPU; this causes the GPU code objects to subsequently be missing even for the supported GPU when torch tries to use them.

I presume a solution for this is in the pipe, because at the moment the way this PR attempts to select which GPUs to use hipblaslt on at runtime simply does not work with how the ROCm runtime is designed: by the time the above code runs, the runtime has already entered a failed state.

Collaborator Author

@IMbackK I agree that there's an issue in the way the HIP runtime handles code object loading for multiple GPUs in a heterogeneous system. However, this PR actually intends to set the blas_backend to at::BlasBackend::Cublas if any of the GPUs in the system are unsupported. This means that if you have a system with a gfx90a and a gfx908 GPU, trying to set the preferred backend to at::BlasBackend::Cublaslt will end up overriding it to at::BlasBackend::Cublas. IIUC, that should not break functionality. In other words, this PR is not attempting to "select which gpus to use hipblaslt on at runtime"; it either uses hipblasLT on all GPUs (if they're all supported) or on none of them.
If you do have a heterogeneous system, please try this PR on it and confirm whether you observe the above behaviour.

Collaborator Author

@IMbackK Please correct me if I'm wrong, but this is how I understand the concerns you are raising:

  • There's a HIP runtime issue which causes a functional issue on unsupported gfx archs (on heterogeneous or homogeneous systems) only when runtime assertions are enabled in clr
  • This PR is to prevent users from setting the wrong/unsupported cublaslt backend if any of their GPUs do not support it (regardless of the clr assert issue)
  • Setting the default value of the preferred linalg backend to cublas will still run into the clr assertions, since PyTorch will still try to load the hipblaslt library via libtorch_hip.so at start-up

Can you please confirm the above matches your understanding?

Contributor

@IMbackK IMbackK Jul 1, 2024

Yes, all those points are correct.

The only slight nitpick I have is that I don't know if the problem in the HIP runtime is an issue per se; the runtime simply doesn't support loading objects that contain HIP code but lack code objects for all available GPUs, and the clr code is pretty explicit that attempting this is in fact an error. I guess it is more a missing feature.
If in the future you do attempt (unlike this PR) to use hipblaslt only on the supported GPUs in a heterogeneous system, this will cause the runtime to read uninitialized memory and ultimately crash.

Collaborator Author

Okay, in that case, I do not consider the issue you're raising as being a blocker for this PR, as this PR doesn't make things any worse for that scenario.

Collaborator Author

@jithunnair-amd jithunnair-amd Jul 1, 2024

Do you have a link to an issue that has been filed for the assertion-enabled scenario? I think we should follow up on that to see how we can resolve it properly. I guess #119081 is that issue in a way, since it is on Fedora, but will it get closed according to #119081 (comment) if #120551 merges?

Contributor

@IMbackK IMbackK Jul 1, 2024

#119081 is this issue in a way, and I am currently using #120551. However, #120551 can really only be considered a solution if PyTorch uses that PR to disable hipblaslt at compile time for all official builds that are supposed to support GPUs besides CDNA2/3 and RDNA3, until one of the following happens:

  1. hipblaslt changes to not carry GPU code in the main .so, but instead loads all GPU code as hip modules
  2. hipblaslt gains support for all the usual ROCm targets
  3. the runtime gains support for loading code objects that lack support for a given GPU, along with an API clients can use to determine when this has occurred, so that they can avoid calling into these code objects

I agree this PR doesn't make anything worse; I was mainly noting that it does not address this issue, since the decision here to use hipblaslt or not comes too late.

@jithunnair-amd jithunnair-amd force-pushed the restrict_hipblaslt_archs branch from cf781a0 to 2a69042 Compare June 20, 2024 20:07
@jithunnair-amd jithunnair-amd marked this pull request as ready for review June 22, 2024 02:32
@jithunnair-amd jithunnair-amd requested a review from eqy as a code owner June 22, 2024 02:32
```cpp
at::BlasBackend Context::blasPreferredBackend() {
#ifdef USE_ROCM
  if (blas_preferred_backend == at::BlasBackend::Cublaslt) {
    static const bool hipblaslt_unsupported = []() {
```
Collaborator Author

Using static to ensure this variable is only initialized once, and const since its value is assumed to remain the same for every invocation, given that the machine configuration does not change.

```cpp
      }
      return false;
    }();
    if (hipblaslt_unsupported) blas_preferred_backend = at::BlasBackend::Cublas;
```
Collaborator Author

@jithunnair-amd jithunnair-amd Jun 22, 2024

This overrides the value of blas_preferred_backend, which means this getter function can no longer be const.

@jithunnair-amd jithunnair-amd requested a review from malfet June 22, 2024 22:06
@jithunnair-amd jithunnair-amd added rocm This tag is for PRs from ROCm team rocm priority high priority ROCm PRs from performance or other aspects labels Jun 22, 2024
@jithunnair-amd
Collaborator Author

@xw285cornell Please review this PR, as it is trying to address fallout from your PR #127944.

@jithunnair-amd
Collaborator Author

@malfet Can you please review this PR?

@jithunnair-amd
Collaborator Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 4, 2024
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@jithunnair-amd
Collaborator Author

@pytorchbot merge -f "unrelated CI failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status

pytorchmergebot pushed a commit that referenced this pull request Jul 10, 2024
Fixes meta-internal errors after importing #128753

(see [D59498679](https://www.internalfb.com/diff/D59498679))
```
fbcode/caffe2/aten/src/ATen/Context.cpp:286:34: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
      for (auto index = 0; index < at::getNumGPUs(); index++) {
                           ~~~~~ ^ ~~~~~~~~~~~~~~~~
1 error generated.
```
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: #130388
Approved by: https://github.com/Skylion007, https://github.com/malfet
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
@jithunnair-amd jithunnair-amd added this to the 2.4.1 milestone Jul 29, 2024
@pruthvistony
Collaborator

@pytorchbot cherry-pick --onto release/2.4 -c critical

pytorchbot pushed a commit that referenced this pull request Aug 13, 2024
… hipblasLT (#128753)

Pull Request resolved: #128753
Approved by: https://github.com/malfet

(cherry picked from commit e16276b)
@pytorchbot
Collaborator

Cherry picking #128753

The cherry pick PR is at #133359 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:


atalman pushed a commit that referenced this pull request Aug 15, 2024
… hipblasLT (#133359)

[ROCm] Check supported archs before setting preferred blas backend to hipblasLT (#128753)

Pull Request resolved: #128753
Approved by: https://github.com/malfet

(cherry picked from commit e16276b)

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
@jithunnair-amd
Collaborator Author

jithunnair-amd commented Aug 30, 2024

Confirmed fixed in final 2.4.1 RC:

API BEHAVIOUR

```
$ TORCH_BLAS_PREFER_CUBLASLT=1 python
Python 3.12.4 (main, Jun  8 2024, 18:29:57) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> for i in range(0,torch.cuda.device_count()):
...   print(torch.cuda.get_device_properties(i))
...
_CudaDeviceProperties(name='AMD Instinct MI210', major=9, minor=0, gcnArchName='gfx90a:sramecc+:xnack-', total_memory=65520MB, multi_processor_count=104)
_CudaDeviceProperties(name='AMD Instinct MI210', major=9, minor=0, gcnArchName='gfx90a:sramecc+:xnack-', total_memory=65520MB, multi_processor_count=104)
>>> torch.backends.cuda.preferred_blas_library()
<_BlasBackend.Cublaslt: 1>
>>> exit()
```

```
$ TORCH_BLAS_PREFER_CUBLASLT=1 python
Python 3.12.4 (main, Jun  8 2024, 18:29:57) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> for i in range(0,torch.cuda.device_count()):
...   print(torch.cuda.get_device_properties(i))
...
_CudaDeviceProperties(name='AMD Instinct MI100', major=9, minor=0, gcnArchName='gfx908:sramecc+:xnack-', total_memory=32752MB, multi_processor_count=120)
_CudaDeviceProperties(name='AMD Instinct MI100', major=9, minor=0, gcnArchName='gfx908:sramecc+:xnack-', total_memory=32752MB, multi_processor_count=120)
_CudaDeviceProperties(name='AMD Instinct MI210', major=9, minor=0, gcnArchName='gfx90a:sramecc+:xnack-', total_memory=65520MB, multi_processor_count=104)
_CudaDeviceProperties(name='AMD Instinct MI210', major=9, minor=0, gcnArchName='gfx90a:sramecc+:xnack-', total_memory=65520MB, multi_processor_count=104)
>>> torch.backends.cuda.preferred_blas_library()
[W830 04:44:17.589682263 Context.cpp:273] Warning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (function operator())
<_BlasBackend.Cublas: 0>
```

```
$ python
Python 3.12.4 (main, Jun  8 2024, 18:29:57) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
<_BlasBackend.Cublas: 0>
>>> torch.backends.cuda.preferred_blas_library(backend="cublaslt")
[W830 04:45:50.727508218 Context.cpp:297] Warning: torch.backends.cuda.preferred_blas_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator())
[W830 04:45:54.189127512 Context.cpp:273] Warning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (function operator())
<_BlasBackend.Cublas: 0>
```

UNIT TESTS

With PyTorch 2.4.0 wheels on MI100:

```
$ PYTORCH_TEST_WITH_ROCM=1 pytest test_linalg.py -k test_matmul_small_brute_force_1d_Nd_cuda_float32 --verbose
...
rocblaslt warning: No paths matched /usr/local/lib/python3.12/dist-packages/torch/lib/hipblaslt/library/*gfx908*co. Make sure that HIPBLASLT_TENSILE_LIBPATH is set correctly.
FAILED [49.7454s]                                               [100%]
___________________________________________ TestLinalgCUDA.test_matmul_small_brute_force_1d_Nd_cuda_float32 ____________________________________________
Traceback (most recent call last):
  File "/data/pytorch/test/test_linalg.py", line 4450, in test_matmul_small_brute_force_1d_Nd
    self.check_single_matmul(x, y)
  File "/data/pytorch/test/test_linalg.py", line 4401, in check_single_matmul
    ans = torch.matmul(x, y)
          ^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: HIPBLAS_STATUS_NOT_SUPPORTED when calling `HIPBLAS_STATUS_NOT_SUPPORTED`
```

With PyTorch 2.4.1 wheels on MI100:

```
$ PYTORCH_TEST_WITH_ROCM=1 pytest test_linalg.py -k test_matmul_small_brute_force_1d_Nd_cuda_float32 --verbose
...
[W830 05:11:24.062774001 Context.cpp:273] Warning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (function operator())
PASSED [2.2048s]                                                [100%]
```

@IMbackK
Contributor

IMbackK commented Aug 30, 2024

Since MI100 is now supported by hipblaslt (ROCm/hipBLASLt@938900a) if built from git, I think it would be useful to have some way to override this check.

The same is also true of gfx11, which has hipblaslt support but is not allowed by the list in this PR.
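One hypothetical shape for such an override (nothing like this exists in the PR; the variable name below is invented for illustration) would be an environment switch consulted before the hard-coded arch list:

```python
import os

def arch_check_enabled() -> bool:
    """Hypothetical escape hatch: users with a self-built hipBLASLt that
    supports extra targets (e.g. gfx908 or gfx11) could bypass the
    hard-coded arch list. TORCH_ROCM_SKIP_HIPBLASLT_ARCH_CHECK is an
    invented name, not a real PyTorch variable."""
    return os.environ.get("TORCH_ROCM_SKIP_HIPBLASLT_ARCH_CHECK", "0") != "1"

print(arch_check_enabled())  # True unless the variable is set to "1"
```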

Labels
ciflow/rocm: Trigger "default" config CI on ROCm
ciflow/trunk: Trigger trunk jobs on your pull request
Merged
module: rocm: AMD GPU support for Pytorch
open source
release notes: rocm: mandatory label
rocm priority: high priority ROCm PRs from performance or other aspects
rocm: This tag is for PRs from ROCm team
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants