Skip to content

Conversation

RSchwan
Copy link
Contributor

@RSchwan RSchwan commented Jun 3, 2025

Description

This PR updates the libmathdx library from version 0.1.2 to 0.2.1. The main reason is to add support for trsm operations, which were introduced in version 0.2.1. Additionally, the tile_cholesky_solve has been extended to allow for 2D right-hand sides.

The major change from libmathdx 0.1.2 to 0.2.1 was a change in the C API, i.e., the scalings alpha and beta in the gemm kernel are now passed as a pointer. Additionally, I changed the cuda implementations of tile_lower_solve and tile_upper_solve to use the trsm kernels from cusolverdx. In a personal project which heavenly relies on tile_lower_solve, I see an overall 2x speed-up with the trsm kernel from cusolverdx.

The tile_cholesky_solve operation has been extended to allow for 2D right-hand sides, mirroring the API of tile_lower_solve and tile_upper_solve. An appropriate test has been added.

The full test suit is passing on my machine (RTX 3080).

Before your PR is "Ready for review"

  • All commits are signed-off to indicate that your contribution adheres to the Developer Certificate of Origin requirements
  • Necessary tests have been added
  • Documentation is up-to-date
  • Auto-generated files modified by compiling Warp and building the documentation have been updated (e.g. stubs.py, functions.rst)
  • Code passes formatting and linting checks with pre-commit run -a

RSchwan added 2 commits June 3, 2025 23:51
…from libmathdx for tile_lower_solve and tile_upper_solve

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>
Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>
@shi-eric shi-eric requested a review from daedalus5 June 4, 2025 01:45
Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>
@shi-eric shi-eric added this to the 1.8.0 milestone Jun 4, 2025
@shi-eric
Copy link
Contributor

shi-eric commented Jun 4, 2025

This fix comes from @daedalus5 who has also been working on the libmathdx 0.2.1 evaluation in Warp:

in warp/build_dll.py, we need to change Line 396:

OLD:

linkopts.append(f'nvJitLink_static.lib /LIBPATH:"{args.libmathdx_path}/lib" mathdx_static.lib')

NEW:

linkopts.append(f'nvJitLink_static.lib /LIBPATH:"{args.libmathdx_path}/lib/x64" mathdx_static.lib')

RSchwan added 2 commits June 4, 2025 13:19
Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>
Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>
@RSchwan
Copy link
Contributor Author

RSchwan commented Jun 4, 2025

While going through the code again, I figured out what caused #768. Turns out that the layout information was not passed along correctly. Since the layout information is only used in the mathdx calls, I took the liberty to directly fix it in this PR (1561b44) since it's directly related. I modified the tests accordingly to test for layout propagation.

@daedalus5
Copy link
Contributor

Thanks so much for this @RSchwan . Phenomenal effort.

A few small notes:

Could you please add this decorator
@unittest.skipUnless(wp.context.runtime.core.is_mathdx_enabled(), "Warp was not built with MathDx support")

to

test_tile_math_forward_substitution()
test_tile_math_forward_substitution_multiple_rhs()
test_tile_math_back_substitution()
test_tile_math_back_substitution_multiple_rhs()

and this decorator
@unittest.skipUnless(wp.context.runtime.core.cuda_toolkit_version() >= 12600, "CUDA toolkit version is less than 12.6")

to

test_tile_math_cholesky()

in test_tile_mathdx.py? We need these for our CI pipeline.

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>
@RSchwan
Copy link
Contributor Author

RSchwan commented Jun 5, 2025

Done. Let me know if you need other changes.

daedalus5
daedalus5 previously approved these changes Jun 5, 2025
@daedalus5
Copy link
Contributor

It looks like we need this:
@unittest.skipUnless(wp.context.runtime.core.cuda_toolkit_version() >= 12600, "CUDA toolkit version is less than 12.6")

over
test_tile_math_matmul() as well.

@daedalus5
Copy link
Contributor

Apologies, these should both reference version 12060, not 12600. So
wp.context.runtime.core.cuda_toolkit_version() >= 12060

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>
@daedalus5
Copy link
Contributor

daedalus5 commented Jun 6, 2025

It looks like we need this: @unittest.skipUnless(wp.context.runtime.core.cuda_toolkit_version() >= 12060, "CUDA toolkit version is less than 12.6")

over test_tile_math_matmul() as well.

Just missing this.

Signed-off-by: Roland Schwan <roland.schwan@mikrounix.com>
@shi-eric
Copy link
Contributor

shi-eric commented Jun 9, 2025

Hey @RSchwan, could you also please clean up the commit history with git rebase/squash so it's down to a single commit? We need to merge this into an internal repo first and it's easier if the commits of this pull request are added as-is to the main branch. Rebasing these changes onto the latest main branch would also be nice.

@shi-eric
Copy link
Contributor

shi-eric commented Jun 9, 2025

I ended up rebasing the branch on my own, so this pull request did not automatically close when 93f9fef and b7ff683 were added, but you do get credit for the changes in the commit history. Thanks again for this contribution!

@shi-eric shi-eric closed this Jun 9, 2025
pull bot pushed a commit to Mu-L/warp-gpu that referenced this pull request Jun 9, 2025
Update libmathdx, extend tile_cholesky_solve to 2D rhs, and improve trsm based tile operations (NVIDIAGH-773)

See merge request omniverse/warp!1360
@momo-van momo-van added tile feature request Request for something to be added labels Jun 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Request for something to be added tile
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants