Update libmathdx, extend tile_cholesky_solve to 2D rhs, and improve trsm based tile operations #773

RSchwan · 2025-06-03T22:44:22Z

Description

This PR updates the libmathdx library from version 0.1.2 to 0.2.1. The main reason is to add support for trsm operations, which were introduced in version 0.2.1. Additionally, tile_cholesky_solve has been extended to allow for 2D right-hand sides.

The major change from libmathdx 0.1.2 to 0.2.1 is in the C API: the scalings alpha and beta of the gemm kernel are now passed as pointers. Additionally, I changed the CUDA implementations of tile_lower_solve and tile_upper_solve to use the trsm kernels from cusolverdx. In a personal project that relies heavily on tile_lower_solve, I see an overall 2x speed-up with the trsm kernel from cusolverdx.
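For readers unfamiliar with trsm, here is a minimal pure-Python reference of the operation a lower triangular solve with several right-hand sides computes. It is only a sketch of the math, not the cusolverdx kernel or Warp's implementation; the name `lower_solve` and the list-of-rows matrix representation are assumptions made for illustration.

```python
def lower_solve(L, B):
    """Solve L @ X = B for X by forward substitution.

    L is an n x n lower-triangular matrix (list of rows) and B is an
    n x m matrix whose m columns are independent right-hand sides.
    """
    n, m = len(L), len(B[0])
    X = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Subtract the contribution of the rows that are already solved.
            s = sum(L[i][k] * X[k][j] for k in range(i))
            X[i][j] = (B[i][j] - s) / L[i][i]
    return X


L = [[2.0, 0.0],
     [1.0, 4.0]]
B = [[2.0, 4.0],
     [9.0, 6.0]]
X = lower_solve(L, B)
print(X)
```

Each column of B is solved independently, which is why a batched trsm kernel can handle all of them in one call.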

The tile_cholesky_solve operation has been extended to allow for 2D right-hand sides, mirroring the API of tile_lower_solve and tile_upper_solve. An appropriate test has been added.
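As a rough illustration of what a Cholesky solve with a 2D right-hand side means mathematically, the sketch below factors a small symmetric positive-definite matrix and solves for two right-hand-side columns at once. It is a pure-Python reference under my own assumptions (the function names `cholesky` and `cholesky_solve` and the list-of-rows layout are hypothetical), not the Warp tile API or its implementation.

```python
import math


def cholesky(A):
    """Return the lower-triangular factor L with A = L @ L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cholesky_solve(L, B):
    """Solve (L @ L^T) @ X = B, where B is n x m (m right-hand sides)."""
    n, m = len(L), len(B[0])
    # Forward substitution: L @ Y = B.
    Y = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = sum(L[i][k] * Y[k][j] for k in range(i))
            Y[i][j] = (B[i][j] - s) / L[i][i]
    # Back substitution: L^T @ X = Y.
    X = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):
        for j in range(m):
            s = sum(L[k][i] * X[k][j] for k in range(i + 1, n))
            X[i][j] = (Y[i][j] - s) / L[i][i]
    return X


A = [[4.0, 2.0],
     [2.0, 5.0]]
B = [[8.0, 2.0],
     [12.0, -3.0]]
X = cholesky_solve(cholesky(A), B)
print(X)
```

The forward and back substitution steps are the same lower and upper triangular solves mentioned above, applied one after the other.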

The full test suite is passing on my machine (RTX 3080).

Before your PR is "Ready for review"

All commits are signed-off to indicate that your contribution adheres to the Developer Certificate of Origin requirements
Necessary tests have been added
Documentation is up-to-date
Auto-generated files modified by compiling Warp and building the documentation have been updated (e.g. stubs.py, functions.rst)
Code passes formatting and linting checks with pre-commit run -a

shi-eric · 2025-06-04T09:01:02Z

This fix comes from @daedalus5 who has also been working on the libmathdx 0.2.1 evaluation in Warp:

In warp/build_dll.py, we need to change line 396:

OLD:

linkopts.append(f'nvJitLink_static.lib /LIBPATH:"{args.libmathdx_path}/lib" mathdx_static.lib')

NEW:

linkopts.append(f'nvJitLink_static.lib /LIBPATH:"{args.libmathdx_path}/lib/x64" mathdx_static.lib')
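To make the difference between the two lines easier to see, here is a small standalone sketch that builds the same option string with the library subdirectory as a parameter. The helper `mathdx_link_option` and the example path are hypothetical and exist only for illustration; the actual change in warp/build_dll.py is the one-line edit shown above.

```python
def mathdx_link_option(libmathdx_path, lib_subdir="lib/x64"):
    """Build the Windows linker option string for the static MathDx library.

    lib_subdir defaults to "lib/x64", the location used in the fix above;
    passing "lib" reproduces the old string.
    """
    return (
        f'nvJitLink_static.lib /LIBPATH:"{libmathdx_path}/{lib_subdir}" '
        "mathdx_static.lib"
    )


linkopts = []
# Hypothetical install location, standing in for args.libmathdx_path.
linkopts.append(mathdx_link_option("C:/deps/libmathdx"))
print(linkopts[0])
```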

RSchwan · 2025-06-04T11:22:11Z

While going through the code again, I figured out what caused #768. Turns out that the layout information was not passed along correctly. Since the layout information is only used in the mathdx calls, I took the liberty to directly fix it in this PR (1561b44) since it's directly related. I modified the tests accordingly to test for layout propagation.

daedalus5 · 2025-06-05T14:39:07Z

Thanks so much for this @RSchwan . Phenomenal effort.

A few small notes:

Could you please add this decorator
@unittest.skipUnless(wp.context.runtime.core.is_mathdx_enabled(), "Warp was not built with MathDx support")

to

test_tile_math_forward_substitution()
test_tile_math_forward_substitution_multiple_rhs()
test_tile_math_back_substitution()
test_tile_math_back_substitution_multiple_rhs()

and this decorator
@unittest.skipUnless(wp.context.runtime.core.cuda_toolkit_version() >= 12600, "CUDA toolkit version is less than 12.6")

to

test_tile_math_cholesky()

in test_tile_mathdx.py? We need these for our CI pipeline.

RSchwan · 2025-06-05T15:49:06Z

Done. Let me know if you need other changes.

daedalus5 · 2025-06-05T17:32:59Z

It looks like we need this:
@unittest.skipUnless(wp.context.runtime.core.cuda_toolkit_version() >= 12600, "CUDA toolkit version is less than 12.6")

over
test_tile_math_matmul() as well.

daedalus5 · 2025-06-05T17:59:55Z

Apologies, these should both reference version 12060, not 12600. So
wp.context.runtime.core.cuda_toolkit_version() >= 12060
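The sketch below shows why the threshold matters, assuming the integer follows CUDA's usual major * 1000 + minor * 10 convention (which the corrected value 12060 is consistent with). `encode_cuda_version` and the stand-in `cuda_toolkit_version` are hypothetical helpers so the example runs without Warp; in the real tests the value comes from wp.context.runtime.core.cuda_toolkit_version().

```python
import unittest


def encode_cuda_version(major, minor):
    # Assumed encoding: 12.6 -> 12060, 12.8 -> 12080.
    return major * 1000 + minor * 10


def cuda_toolkit_version():
    # Stand-in for the Warp runtime query; pretend we were built with CUDA 12.4.
    return encode_cuda_version(12, 4)


class TestTileMathDx(unittest.TestCase):
    @unittest.skipUnless(
        cuda_toolkit_version() >= 12060,
        "CUDA toolkit version is less than 12.6",
    )
    def test_tile_math_cholesky(self):
        self.assertTrue(True)


# With the mistaken threshold 12600, even a 12.8 toolkit would be skipped.
assert encode_cuda_version(12, 8) < 12600

# Run the test case programmatically and collect the result.
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestTileMathDx).run(result)
print(len(result.skipped), "test(s) skipped")
```

With the stand-in reporting 12.4, the decorated test is skipped rather than failed, which is the behaviour the CI pipeline needs on older toolkits.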

daedalus5 · 2025-06-06T15:54:03Z

It looks like we need this: @unittest.skipUnless(wp.context.runtime.core.cuda_toolkit_version() >= 12060, "CUDA toolkit version is less than 12.6")

over test_tile_math_matmul() as well.

Just missing this.

shi-eric · 2025-06-09T06:14:57Z

Hey @RSchwan, could you also please clean up the commit history with git rebase/squash so it's down to a single commit? We need to merge this into an internal repo first and it's easier if the commits of this pull request are added as-is to the main branch. Rebasing these changes onto the latest main branch would also be nice.

shi-eric · 2025-06-09T19:49:13Z

I ended up rebasing the branch on my own, so this pull request did not automatically close when 93f9fef and b7ff683 were added, but you do get credit for the changes in the commit history. Thanks again for this contribution!

Update libmathdx, extend tile_cholesky_solve to 2D rhs, and improve trsm based tile operations (NVIDIAGH-773) See merge request omniverse/warp!1360

RSchwan added 2 commits June 3, 2025 23:51

update libmathdx, extend tile_cholesky_solve to 2D rhs, and use trsm …

c622aaf

…from libmathdx for tile_lower_solve and tile_upper_solve Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>

formatting

55eb840

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>

shi-eric requested a review from daedalus5 June 4, 2025 01:45

shi-eric reviewed Jun 4, 2025

deps/libmathdx-deps.packman.xml (outdated, resolved)

fix libmathdx download name on windows

703075e

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>

shi-eric added this to the 1.8.0 milestone Jun 4, 2025

RSchwan added 2 commits June 4, 2025 13:19

fix win libmathdx path

408cff6

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>

propagate correct layout information, fixes NVIDIA#768

1561b44

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>

add test skip decorators

6dcd374

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>

daedalus5 previously approved these changes Jun 5, 2025

change restricted cuda toolkit version from 12600 to 12060

bcd6747

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>

RSchwan dismissed daedalus5’s stale review via bcd6747 June 5, 2025 22:18

add skip to test_tile_math_matmul

d197b5b

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>

shi-eric closed this Jun 9, 2025

pull bot pushed a commit to Mu-L/warp-gpu that referenced this pull request Jun 9, 2025

Merge branch 'pr-github-773' into 'main'

c672c09

Update libmathdx, extend tile_cholesky_solve to 2D rhs, and improve trsm based tile operations (NVIDIAGH-773) See merge request omniverse/warp!1360

momo-van added the tile and feature request labels Jun 19, 2025
