[RUNTIME] Implement dynamic loading with defineGetFunctionHandle for CUDA version compatibility #2771
Merged
Conversation
ThomasRaoux approved these changes on Dec 7, 2023
LGTM, thanks!
davidberard98 pushed a commit to davidberard98/triton that referenced this pull request on Dec 12, 2023
…onHandle for CUDA version compatibility (triton-lang#2771)"

This is needed for CUDA 11 support, which we'd like to have in the PyTorch 2.2 release.

Original commit message: In case CUDA 11 drivers are still used on some systems, we shouldn't call TMA and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.
malfet pushed a commit that referenced this pull request on Dec 13, 2023
…onHandle for CUDA version compatibility (#2771)" (#2789)

This is needed for CUDA 11 support, which we'd like to have in the PyTorch 2.2 release.

Original commit message: In case CUDA 11 drivers are still used on some systems, we shouldn't call TMA and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
malfet added a commit to pytorch/pytorch that referenced this pull request on Dec 13, 2023
To include a cherry-pick of triton-lang/triton#2771 that should fix CUDA 11.8 runtime issues.
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request on Dec 14, 2023
To include a cherry-pick of triton-lang/triton#2771 that should fix CUDA 11.8 runtime issues.

Also, tweak the build-wheel script to update both the ROCm and vanilla Triton build versions to 2.2 (even though on trunk it should probably be 3.3 already).

TODO: Remove `ROCM_TRITION_VERSION` once the trunk and ROCm versions are in sync again.

Pull Request resolved: #115743
Approved by: https://github.com/davidberard98
guilhermeleobas pushed a commit to guilhermeleobas/pytorch that referenced this pull request on Dec 18, 2023
To include a cherry-pick of triton-lang/triton#2771 that should fix CUDA 11.8 runtime issues.

Also, tweak the build-wheel script to update both the ROCm and vanilla Triton build versions to 2.2 (even though on trunk it should probably be 3.3 already).

TODO: Remove `ROCM_TRITION_VERSION` once the trunk and ROCm versions are in sync again.

Pull Request resolved: pytorch#115743
Approved by: https://github.com/davidberard98
dmenig pushed a commit to dmenig/pytorch that referenced this pull request on Dec 21, 2023
To include a cherry-pick of triton-lang/triton#2771 that should fix CUDA 11.8 runtime issues.

Also, tweak the build-wheel script to update both the ROCm and vanilla Triton build versions to 2.2 (even though on trunk it should probably be 3.3 already).

TODO: Remove `ROCM_TRITION_VERSION` once the trunk and ROCm versions are in sync again.

Pull Request resolved: pytorch#115743
Approved by: https://github.com/davidberard98
feihugis pushed a commit to feihugis/triton that referenced this pull request on Feb 13, 2024
…CUDA version compatibility (triton-lang#2771)

In case CUDA 11 drivers are still used on some systems, we shouldn't call TMA and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.
pingzhuu pushed a commit to siliconflow/triton that referenced this pull request on Apr 2, 2024
…onHandle for CUDA version compatibility (triton-lang#2771)" (triton-lang#2789)

This is needed for CUDA 11 support, which we'd like to have in the PyTorch 2.2 release.

Original commit message: In case CUDA 11 drivers are still used on some systems, we shouldn't call TMA and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.

Co-authored-by: Keren Zhou <kerenzhou@openai.com>
binarman pushed a commit to binarman/triton that referenced this pull request on Apr 2, 2024
…CUDA version compatibility (triton-lang#2771)

In case CUDA 11 drivers are still used on some systems, we shouldn't call TMA and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.
malfet added a commit that referenced this pull request on Jul 16, 2024
That is only present in CUDA-12-compatible drivers and is missing in CUDA-11 ones. Spiritual follow-up after #2771.
Jokeren pushed a commit that referenced this pull request on Jul 16, 2024
That is only present in CUDA-12-compatible drivers and is missing in CUDA-11 ones. Spiritual follow-up after #2771; allows for a dynamic query of the symbol, and if run on an older driver, it will return an error.

Also, fix the `occupancyMaxActiveClusters` behavior when the symbol is not found (before this change it would crash with a null-pointer dereference; now it should return a structured exception).
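As a rough sketch of the null-check pattern this commit describes (not Triton's actual code): a wrapper can resolve the driver symbol at runtime and raise a catchable exception when it is absent, instead of calling through a NULL pointer. The wrapper shape, the simplified function-pointer signature, and the `cuOccupancyMaxActiveClusters` lookup below are illustrative assumptions.

```c
/* Minimal sketch, assuming a CPython extension module in a POSIX
 * dlopen/dlsym environment. */
#include <Python.h>
#include <dlfcn.h>

/* Simplified, assumed signature: the real cuOccupancyMaxActiveClusters
 * takes CUfunction/CUlaunchConfig* arguments and returns a CUresult. */
typedef int (*occupancyFn_t)(int *, void *, const void *);

static PyObject *occupancyMaxActiveClusters(PyObject *self, PyObject *args) {
  static occupancyFn_t fn = NULL;
  if (fn == NULL) {
    /* Resolve the symbol lazily instead of linking against it. */
    void *lib = dlopen("libcuda.so.1", RTLD_LAZY);
    if (lib != NULL)
      fn = (occupancyFn_t)dlsym(lib, "cuOccupancyMaxActiveClusters");
  }
  if (fn == NULL) {
    /* Symbol absent on CUDA 11 drivers: raise a structured, catchable
     * exception instead of dereferencing a NULL function pointer. */
    PyErr_SetString(PyExc_RuntimeError,
                    "cuOccupancyMaxActiveClusters is not available; "
                    "a CUDA 12 compatible driver is required");
    return NULL;
  }
  /* ... parse args, call fn(...), and translate the result ... */
  Py_RETURN_NONE;
}
```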
atalman pushed a commit to atalman/triton that referenced this pull request on Jul 17, 2024
That is only present in CUDA-12-compatible drivers and is missing in CUDA-11 ones. Spiritual follow-up after triton-lang#2771; allows for a dynamic query of the symbol, and if run on an older driver, it will return an error.

Also, fix the `occupancyMaxActiveClusters` behavior when the symbol is not found (before this change it would crash with a null-pointer dereference; now it should return a structured exception).
bertmaher pushed a commit to bertmaher/triton that referenced this pull request on Dec 10, 2024
That is only present in CUDA-12-compatible drivers and is missing in CUDA-11 ones. Spiritual follow-up after triton-lang#2771; allows for a dynamic query of the symbol, and if run on an older driver, it will return an error.

Also, fix the `occupancyMaxActiveClusters` behavior when the symbol is not found (before this change it would crash with a null-pointer dereference; now it should return a structured exception).
In case CUDA 11 drivers are still used on some systems, we shouldn't call TMA and block-cluster-related functions directly. Instead, we can dynamically look up the handles to avoid compatibility issues.
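As a rough illustration of this approach: the sketch below shows how a macro in the spirit of `defineGetFunctionHandle` (named in the PR title) could lazily resolve a driver symbol with `dlopen`/`dlsym` instead of taking a link-time dependency on it. This is a minimal sketch under assumed details, not Triton's exact implementation; the macro shape, the `libcuda.so.1` library name, and the `getCuTensorMapEncodeTiledHandle` accessor are illustrative.

```c
/* A minimal sketch of the dynamic-lookup idea, assuming a POSIX
 * dlopen/dlsym environment. */
#include <dlfcn.h>
#include <stdio.h>

/* Define a lazy, cached accessor for a driver symbol. On a CUDA 11 driver
 * that lacks the symbol, the accessor returns NULL instead of the library
 * failing to load because of an unresolved reference. */
#define defineGetFunctionHandle(accessorName, symbolName)            \
  static void *accessorName(void) {                                  \
    static void *handle = NULL;                                      \
    if (handle == NULL) {                                            \
      void *lib = dlopen("libcuda.so.1", RTLD_LAZY);                 \
      if (lib == NULL) {                                             \
        fprintf(stderr, "Failed to open libcuda.so.1\n");            \
        return NULL;                                                 \
      }                                                              \
      handle = dlsym(lib, symbolName);                               \
      if (handle == NULL)                                            \
        fprintf(stderr, "Driver symbol %s not found\n", symbolName); \
    }                                                                \
    return handle;                                                   \
  }

/* Illustrative use: TMA descriptor encoding only exists in CUDA 12+
 * drivers, so resolve it on first use rather than linking against it. */
defineGetFunctionHandle(getCuTensorMapEncodeTiledHandle,
                        "cuTensorMapEncodeTiled")
```

A caller would check the returned handle for NULL and cast it to the matching function-pointer type before invoking it, turning a hard link-time dependency on CUDA 12 symbols into a recoverable runtime check.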