[ROCm] Check supported archs before setting preferred blas backend to hipblasLT #128753
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128753
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures — as of commit ce34f44 with merge base a6ac644. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…alizer. This prevents a hang with the previous approach when using env var TORCH_BLAS_PREFER_HIPBLASLT=1
…hecking logic is only executed once (until setter is used to set backend to cublaslt again)
aten/src/ATen/Context.cpp
Outdated
```cpp
at::BlasBackend Context::blasPreferredBackend() {
#ifdef USE_ROCM
  if (blas_preferred_backend == at::BlasBackend::Cublaslt) {
    static const std::vector<std::string> archs = {"gfx90a", "gfx940", "gfx941", "gfx942"};
```
Having yet another place where an arbitrary list of llvm targets is placed seems like a bad idea, since it will have to be remembered whenever the supported targets of hipblaslt expand or contract. Further, this list is already wrong right now, as hipblaslt has support for some gfx11 targets and the current code does work there, at least to some degree.
At the very least this needs to be a define set via a cmake option, but you could also query the architectures from the hipblaslt fatbinary, which is not that hard to implement directly; ideally the runtime would of course provide this information.
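As a rough sketch of the cmake-define route suggested above (the macro name `HIPBLASLT_SUPPORTED_ARCHS` and its semicolon-separated format are hypothetical, not an existing build option):

```cpp
// Hypothetical: the build would define HIPBLASLT_SUPPORTED_ARCHS, e.g. via
//   add_compile_definitions(HIPBLASLT_SUPPORTED_ARCHS="gfx90a;gfx940;gfx941;gfx942")
#include <sstream>
#include <string>
#include <vector>

static std::vector<std::string> hipblasltSupportedArchs() {
#ifdef HIPBLASLT_SUPPORTED_ARCHS
  // Split the semicolon-separated list baked in at configure time.
  std::vector<std::string> archs;
  std::istringstream list(HIPBLASLT_SUPPORTED_ARCHS);
  for (std::string arch; std::getline(list, arch, ';');) {
    archs.push_back(arch);
  }
  return archs;
#else
  // Fall back to the hardcoded list from the diff above.
  return {"gfx90a", "gfx940", "gfx941", "gfx942"};
#endif
}
```

This would keep the list out of the source, though it still has to be maintained somewhere; querying the hipblaslt fatbinary or the runtime would avoid that entirely.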
You make a good point about the maintenance headache this introduces. I'm not sure about this being a cmake option though, since this is not exactly user-configurable information? I'm looking into whether the hipblasLT library provides us a way to query the list of supported archs.
So hipblasLT doesn't currently have an API to report supported gfx archs, but we will request that. Until then, I believe this solution is appropriate.
I think that's fair.
I'd like to also mention that this code currently does not work at all, since having a GPU in the system that is not supported by hipblaslt causes the runtime to assert when a hipblaslt code object is loaded by ldd here: https://github.com/ROCm/clr/blob/204d35d16ef5c2c1ea1a4bb25442908a306c857a/hipamd/src/hip_code_object.cpp#L762 from https://github.com/ROCm/clr/blob/204d35d16ef5c2c1ea1a4bb25442908a306c857a/hipamd/src/hip_code_object.cpp#L752C22-L752C30, which ultimately calls ExtractFatBinaryUsingCOMGR.
In the tests on CI this appears to work because you have disabled runtime assertions in clr there (which is imo not great in and of itself), but it doesn't really work with disabled assertions either. When you do have a supported and an unsupported GPU in the system, depending on the GPU order ExtractFatBinaryUsingCOMGR can fail and return before it gets to the supported GPU; this causes the GPU code objects to subsequently be missing even for the supported GPU when torch tries to use them.
I presume a solution for this is in the pipe, because at the moment the way this PR attempts to select which GPUs to use hipblaslt on at runtime simply does not work with how the ROCm runtime is designed: by the time the above code runs, the runtime has already entered a failed state.
@IMbackK I agree that there's an issue in the way the HIP runtime handles code object loading for multiple GPUs in a heterogeneous system. However, this PR actually intends to set the blas_backend to `at::BlasBackend::Cublas` if any of the GPUs in the system are unsupported. This means that if you have a system with a gfx90a and a gfx908 GPU, trying to set the preferred backend to `at::BlasBackend::Cublaslt` will end up overriding it to `at::BlasBackend::Cublas`. IIUC, that should not break functionality. In other words, this PR is not attempting to "select which gpus to use hipblaslt on at runtime"; it is either using hipblasLT on all GPUs (if they're all supported), or on none of them.
If you do have a heterogeneous system, please try this PR on it and confirm whether you observe the above behaviour.
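For concreteness, here is a minimal sketch of the "all or none" behaviour described above (not the PR's exact code), using the HIP runtime's per-device `gcnArchName`:

```cpp
#include <hip/hip_runtime.h>
#include <string>
#include <vector>

// Returns true only if every visible GPU's arch matches the allow-list;
// a single unsupported GPU means hipBLASLt is disabled for all of them.
static bool allGpusSupportHipblaslt(const std::vector<std::string>& archs) {
  int count = 0;
  if (hipGetDeviceCount(&count) != hipSuccess) {
    return false;
  }
  for (int i = 0; i < count; ++i) {
    hipDeviceProp_t props;
    if (hipGetDeviceProperties(&props, i) != hipSuccess) {
      return false;
    }
    // gcnArchName can carry feature suffixes, e.g. "gfx90a:sramecc+:xnack-",
    // so match on the prefix rather than comparing for equality.
    const std::string name(props.gcnArchName);
    bool supported = false;
    for (const auto& arch : archs) {
      if (name.rfind(arch, 0) == 0) {  // prefix match
        supported = true;
        break;
      }
    }
    if (!supported) {
      return false;  // one unsupported GPU disables hipBLASLt everywhere
    }
  }
  return true;
}
```

If this returns false, the preferred backend is overridden to `at::BlasBackend::Cublas` (i.e. plain hipBLAS on ROCm).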
@IMbackK Please correct me if I'm wrong, but this is how I understand the concerns you are raising:
- There's a HIP runtime issue which causes a functional issue on unsupported gfx archs (on heterogeneous or homogeneous systems) only when runtime assertions are enabled in clr
- This PR is to prevent users from setting the wrong/unsupported cublaslt backend if any of their GPUs do not support it (regardless of the clr assert issue)
- Setting the default value of the preferred linalg backend to cublas will still run into the clr assertions, since PyTorch will still try to load the hipblaslt library via libtorch_hip.so at the start
Can you please confirm the above matches your understanding?
Yes, all those points are correct.
The only slight nitpick I have is that I don't know if the problem in the HIP runtime is an issue per se; the runtime simply doesn't support loading objects that contain hip code but lack code objects for all available gpus, and the clr code is pretty explicit that attempting this is in fact an error. I guess it is more a missing feature.
If in the future you do attempt (unlike this PR) to use hipblaslt on the supported GPUs in a heterogeneous system, this will cause the runtime to read uninitialized memory and ultimately crash.
Okay, in that case, I do not consider the issue you're raising to be a blocker for this PR, as this PR doesn't make things any worse for that scenario.
Do you have a link to an issue that has been filed for the assertion-enabled scenario? I think we should follow up on that to see how we can resolve it properly. I guess #119081 is that issue in a way, since it is on Fedora, but will it get closed according to #119081 (comment) if #120551 merges?
#119081 is the issue in a way, and I am currently using #120551. However, #120551 can really only be considered a solution if pytorch disables hipblaslt at compile time using that PR for all official builds that are supposed to support GPUs besides CDNA2/3 and RDNA3, until one of the following happens:
- hipblaslt changes to not have gpu code in the main .so but instead loads all gpu code as hipmodules
- hipblaslt gains support for all the usual rocm targets
- the runtime gains support for loading code objects that lack support for a given gpu, and gains an api clients can use to determine when this has occurred, so that clients can avoid calling into these code objects.
I agree this PR doesn't make anything worse; I was mainly noting that it does not address this issue, since the decision here to use hipblaslt or not comes too late.
Force-pushed from cf781a0 to 2a69042 (compare)
```cpp
at::BlasBackend Context::blasPreferredBackend() {
#ifdef USE_ROCM
  if (blas_preferred_backend == at::BlasBackend::Cublaslt) {
    static const bool hipblaslt_unsupported = []() {
```
Using `static` to ensure this variable is only initialized once, and `const` since the value is assumed to remain the same for every invocation, because the machine configuration will be the same.
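As a standalone illustration of this idiom (independent of PyTorch internals), an immediately-invoked lambda initializing a function-local `static const` runs its body exactly once, on first call, and caches the result:

```cpp
#include <cstdio>

static bool expensiveCheck() {
  std::puts("running check...");  // printed only once
  return true;
}

static bool cachedCheck() {
  // The lambda body executes only on the first call; C++ guarantees
  // thread-safe initialization of function-local statics.
  static const bool result = []() {
    return expensiveCheck();
  }();
  return result;
}

int main() {
  cachedCheck();  // prints "running check..."
  cachedCheck();  // cached; prints nothing
  return 0;
}
```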
```cpp
    }
    return false;
  }();
  if (hipblaslt_unsupported) blas_preferred_backend = at::BlasBackend::Cublas;
```
Override the value of `blas_preferred_backend`, which means this getter function can no longer be `const`.
@xw285cornell Please review this PR, as it is trying to address a fallout of your PR #127944.
@malfet Can you please review this PR?
@pytorchbot merge
Merge failed. Reason: This PR needs a `release notes:` label. If not, please add the `topic: not user facing` label. To add a label, you can comment to pytorchbot, for example `@pytorchbot label "topic: not user facing"`. For more information, see the wiki.
Details for Dev Infra team: raised by workflow job.
@pytorchbot merge -f "unrelated CI failures"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use `-f` as a last resort. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes meta-internal errors after importing #128753 (see [D59498679](https://www.internalfb.com/diff/D59498679))

```
fbcode/caffe2/aten/src/ATen/Context.cpp:286:34: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
  for (auto index = 0; index < at::getNumGPUs(); index++) {
                       ~~~~~ ^ ~~~~~~~~~~~~~~~~
1 error generated.
```

Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: #130388
Approved by: https://github.com/Skylion007, https://github.com/malfet
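One way to resolve that `-Wsign-compare` error is to give the induction variable the loop bound's unsigned type instead of letting `auto` deduce `int` from the literal `0`. A sketch with a stand-in for `at::getNumGPUs()` (the actual fix that landed may differ):

```cpp
#include <cstddef>

// Stand-in for at::getNumGPUs(), which returns an unsigned size_t.
static std::size_t getNumGPUs() { return 2; }

int main() {
  // `auto index = 0` deduces int; comparing it against a size_t bound
  // triggers -Wsign-compare under -Werror. Declaring it size_t fixes that.
  for (std::size_t index = 0; index < getNumGPUs(); index++) {
    // ... per-GPU arch check ...
  }
  return 0;
}
```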
@pytorchbot cherry-pick --onto release/2.4 -c critical
… hipblasLT (#128753)

This PR is needed to resolve usability issues with PyTorch ROCm nightly wheels on non-gfx90a/gfx94x architectures as a result of #127944. Addresses #119081 (comment)

### With this PR's changes, I get the following on a gfx908 (unsupported by hipblasLT) architecture:

_Using setter function:_
```
>>> torch.backends.cuda.preferred_blas_library(backend="cublaslt")
[W617 19:58:58.286088851 Context.cpp:280] Warning: torch.backends.cuda.preferred_blas_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator())
[W617 19:59:02.125161985 Context.cpp:291] Warning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (function operator())
<_BlasBackend.Cublas: 0>
```

_Using `TORCH_BLAS_PREFER_HIPBLASLT` env var:_
```
root@9d47bf40d4d4:/tmp/pytorch# TORCH_BLAS_PREFER_CUBLASLT=1 python
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
[W619 06:14:11.627715807 Context.cpp:274] Warning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas (function operator())
<_BlasBackend.Cublas: 0>
```

### and the following on a gfx90a (supported by hipblasLT) architecture:

_Using setter function:_
```
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
<_BlasBackend.Cublaslt: 1>
>>> torch.backends.cuda.preferred_blas_library(backend="cublas")
<_BlasBackend.Cublas: 0>
>>> torch.backends.cuda.preferred_blas_library(backend="cublaslt")
[W620 18:38:29.404265518 Context.cpp:293] Warning: torch.backends.cuda.preferred_blas_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator())
<_BlasBackend.Cublaslt: 1>
```

_Using `TORCH_BLAS_PREFER_HIPBLASLT` env var:_
```
root@9d47bf40d4d4:/tmp/pytorch# TORCH_BLAS_PREFER_HIPBLASLT=1 python
>>> import torch
>>> torch.backends.cuda.preferred_blas_library()
<_BlasBackend.Cublaslt: 1>
```
(Same result for using the `TORCH_BLAS_PREFER_CUBLASLT` env var.)

Pull Request resolved: #128753
Approved by: https://github.com/malfet
(cherry picked from commit e16276b)
Cherry picking #128753: The cherry pick PR is at #133359, and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:
Details for Dev Infra team: raised by workflow job.
… hipblasLT (#133359)

[ROCm] Check supported archs before setting preferred blas backend to hipblasLT (#128753)
Pull Request resolved: #128753
Approved by: https://github.com/malfet
(cherry picked from commit e16276b)
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Confirmed fixed in final 2.4.1 RC:

API BEHAVIOUR:

UNIT TESTS:
With PyTorch 2.4.0 wheels on MI100:
With PyTorch 2.4.1 wheels on MI100:
Since MI100 is now supported by hipblaslt (ROCm/hipBLASLt@938900a) if built from git, I think it would be useful to have some way to override this check. The same is also true of gfx11, which has hipblaslt support but is not allowed by the list in this PR.
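One possible shape for such an override is an environment variable checked before the arch allow-list; the name `TORCH_BLAS_FORCE_HIPBLASLT` below is purely hypothetical, not an existing PyTorch flag:

```cpp
#include <cstdlib>

// Hypothetical escape hatch: lets users running a self-built hipBLASLt with
// extra targets (e.g. gfx908/MI100 or gfx11) bypass the hardcoded arch check.
static bool userForcesHipblaslt() {
  const char* v = std::getenv("TORCH_BLAS_FORCE_HIPBLASLT");  // hypothetical name
  return v != nullptr && v[0] == '1';
}
// The getter could then skip the allow-list check entirely when this
// returns true, leaving the preferred backend at Cublaslt.
```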
This PR is needed to resolve usability issues with PyTorch ROCm nightly wheels on non-gfx90a/gfx94x architectures as a result of #127944.
Addresses #119081 (comment)
With this PR's changes, I get the following on a gfx908 (unsupported by hipblasLT) architecture:

Using setter function:
Using `TORCH_BLAS_PREFER_HIPBLASLT` env var:

and the following on a gfx90a (supported by hipblasLT) architecture:

Using setter function:
Using `TORCH_BLAS_PREFER_HIPBLASLT` env var:
(Same result for using the `TORCH_BLAS_PREFER_CUBLASLT` env var.)

cc @jeffdaily @sunway513 @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang