Conversation

@Jokeren Jokeren commented Dec 7, 2023

In case CUDA 11 drivers are still in use on some systems, we shouldn't call TMA- and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.
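
For illustration, a minimal sketch of the dynamic-lookup idea, assuming a plain `dlopen`/`dlsym` resolution against `libcuda.so.1` with `cuTensorMapEncodeTiled` as a representative CUDA 12-only entry point (this is not the actual Triton driver code):

```c
#include <dlfcn.h>
#include <stdio.h>

/* Resolve a driver entry point by name from the installed libcuda,
 * returning NULL when the symbol does not exist (e.g. a CUDA 11 driver). */
static void *lookup_cuda_symbol(const char *name) {
  void *handle = dlopen("libcuda.so.1", RTLD_LAZY | RTLD_GLOBAL);
  if (!handle)
    return NULL;
  return dlsym(handle, name);
}

int main(void) {
  /* cuTensorMapEncodeTiled is a TMA-related entry point that only exists
   * in CUDA 12 drivers; resolving it lazily avoids a hard link-time or
   * load-time dependency on it. */
  void *fn = lookup_cuda_symbol("cuTensorMapEncodeTiled");
  if (!fn) {
    fprintf(stderr, "cuTensorMapEncodeTiled not available on this driver\n");
    return 1;
  }
  printf("cuTensorMapEncodeTiled resolved at %p\n", fn);
  return 0;
}
```

Because the symbol is never referenced directly, the loader never needs it to exist, and code running on an older driver can detect the NULL result and fall back or report an error.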

@Jokeren Jokeren marked this pull request as ready for review December 7, 2023 16:29
@Jokeren Jokeren requested a review from ptillet as a code owner December 7, 2023 16:29
@ThomasRaoux ThomasRaoux left a comment

LGTM

@Jokeren Jokeren merged commit 42ab415 into triton-lang:main Dec 7, 2023
@Jokeren Jokeren deleted the keren/cuda-compat branch December 7, 2023 23:52
jsh-20 commented Dec 8, 2023

LGTM, thanks!

davidberard98 pushed a commit to davidberard98/triton that referenced this pull request Dec 12, 2023
…onHandle for CUDA version compatibility (triton-lang#2771)"

This is needed for CUDA 11 support, which we'd like to have in the PyTorch 2.2 release.

Original commit message:

In case CUDA 11 drivers are still in use on some systems, we shouldn't call TMA- and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.
malfet pushed a commit that referenced this pull request Dec 13, 2023
…onHandle for CUDA version compatibility (#2771)" (#2789)

This is needed for CUDA 11 support, which we'd like to have in the
PyTorch 2.2 release.

Original commit message:

In case CUDA 11 drivers are still in use on some systems, we shouldn't call TMA- and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
malfet added a commit to pytorch/pytorch that referenced this pull request Dec 13, 2023
To include a cherry-pick of triton-lang/triton#2771 that should fix  cuda-11.8 runtime issues
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Dec 14, 2023
To include a cherry-pick of triton-lang/triton#2771 that should fix  cuda-11.8 runtime issues

Also, tweak the build wheel script to update both the ROCm and vanilla Triton build versions to 2.2 (even though on trunk it should probably be 3.3 already)

TODO: Remove `ROCM_TRITION_VERSION` once the trunk and ROCm versions are in sync again

Pull Request resolved: #115743
Approved by: https://github.com/davidberard98
guilhermeleobas pushed a commit to guilhermeleobas/pytorch that referenced this pull request Dec 18, 2023
To include a cherry-pick of triton-lang/triton#2771 that should fix  cuda-11.8 runtime issues

Also, tweak the build wheel script to update both the ROCm and vanilla Triton build versions to 2.2 (even though on trunk it should probably be 3.3 already)

TODO: Remove `ROCM_TRITION_VERSION` once the trunk and ROCm versions are in sync again

Pull Request resolved: pytorch#115743
Approved by: https://github.com/davidberard98
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
To include a cherry-pick of triton-lang/triton#2771 that should fix  cuda-11.8 runtime issues

Also, tweak the build wheel script to update both the ROCm and vanilla Triton build versions to 2.2 (even though on trunk it should probably be 3.3 already)

TODO: Remove `ROCM_TRITION_VERSION` once the trunk and ROCm versions are in sync again

Pull Request resolved: pytorch#115743
Approved by: https://github.com/davidberard98
feihugis pushed a commit to feihugis/triton that referenced this pull request Feb 13, 2024
…CUDA version compatibility (triton-lang#2771)

In case CUDA 11 drivers are still in use on some systems, we shouldn't call TMA- and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.
pingzhuu pushed a commit to siliconflow/triton that referenced this pull request Apr 2, 2024
…onHandle for CUDA version compatibility (triton-lang#2771)" (triton-lang#2789)

This is needed for CUDA 11 support, which we'd like to have in the
PyTorch 2.2 release.

Original commit message:

In case CUDA 11 drivers are still in use on some systems, we shouldn't call TMA- and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
binarman pushed a commit to binarman/triton that referenced this pull request Apr 2, 2024
…CUDA version compatibility (triton-lang#2771)

In case CUDA 11 drivers are still in use on some systems, we shouldn't call TMA- and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.
@atalman atalman mentioned this pull request May 14, 2024
malfet added a commit that referenced this pull request Jul 16, 2024
That is only present in CUDA-12 compatible drivers and is missing in CUDA-11 ones.

Spiritual follow-up to #2771
malfet added a commit that referenced this pull request Jul 16, 2024
That is only present in CUDA-12 compatible drivers and is missing in CUDA-11 ones.

Spiritual follow-up to #2771
Jokeren pushed a commit that referenced this pull request Jul 16, 2024
That is only present in CUDA-12 compatible drivers and is missing in CUDA-11 ones.

Spiritual follow-up to #2771: allows for a dynamic query of the symbol, and if run on an older driver, it will return an error. Also, fix `occupancyMaxActiveClusters` behavior when the symbol is not found (before this change it would crash with a null pointer dereference; now it should return a structured exception).
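
For illustration, a hypothetical sketch of that null-check written as a Python C-extension binding, assuming the driver wrapper raises a catchable `RuntimeError` when the CUDA 12-only symbol is absent (module and function names here are made up; the actual Triton driver code may differ):

```c
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <dlfcn.h>

static PyObject *occupancy_max_active_clusters(PyObject *self, PyObject *args) {
  /* Resolve the CUDA 12-only entry point lazily from the installed driver. */
  void *driver = dlopen("libcuda.so.1", RTLD_LAZY | RTLD_GLOBAL);
  void *fn = driver ? dlsym(driver, "cuOccupancyMaxActiveClusters") : NULL;
  if (!fn) {
    /* CUDA 11 driver: fail with a structured, catchable exception instead of
     * dereferencing a null function pointer. */
    PyErr_SetString(PyExc_RuntimeError,
                    "cuOccupancyMaxActiveClusters is not supported by this driver");
    return NULL;
  }
  /* ... call through `fn` with the real CUfunction and launch config here ... */
  Py_RETURN_NONE;
}

static PyMethodDef methods[] = {
    {"occupancy_max_active_clusters", occupancy_max_active_clusters,
     METH_VARARGS, "Query max active clusters (CUDA 12 drivers only)."},
    {NULL, NULL, 0, NULL}};

static struct PyModuleDef module = {PyModuleDef_HEAD_INIT, "cuda_compat_sketch",
                                    NULL, -1, methods};

PyMODINIT_FUNC PyInit_cuda_compat_sketch(void) { return PyModule_Create(&module); }
```

From Python, a missing symbol then surfaces as an exception that callers can handle, rather than a crash.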
atalman pushed a commit to atalman/triton that referenced this pull request Jul 17, 2024
That is only present in CUDA-12 compatible drivers and is missing in CUDA-11 ones.

Spiritual follow-up to triton-lang#2771: allows for a dynamic query of the symbol, and if run on an older driver, it will return an error. Also, fix `occupancyMaxActiveClusters` behavior when the symbol is not found (before this change it would crash with a null pointer dereference; now it should return a structured exception).
bertmaher pushed a commit to bertmaher/triton that referenced this pull request Dec 10, 2024
That is only present in CUDA-12 compatible drivers and is missing in CUDA-11 ones.

Spiritual follow-up to triton-lang#2771: allows for a dynamic query of the symbol, and if run on an older driver, it will return an error. Also, fix `occupancyMaxActiveClusters` behavior when the symbol is not found (before this change it would crash with a null pointer dereference; now it should return a structured exception).