Resolve MPI issues #169
Rebased and fixed style.
Just an update: MPI works for all examples except DP4. All tests pass except CorrGauss and the three tests that were already broken. The first priority is figuring out the problem with DP4, then CorrGauss.
Rebased on master and fixed (hopefully) the one use of the old C++ loop style that triggered the tidy checker. Please fetch and reset your checkout. (
Under src/PDFs/physics/Amp4Body.cu, is there any reason to hand-evaluate vs using Here is the code in question:
I would prefer the I am wondering about Currently, the transform happens on the device, the result is copied back to the host for MPI, then copied back to the device, and is then converted back to the host. Thoughts?
I would like to move all evaluation into the three methods in GooPdf and remove all hand evaluations in subclasses (possibly with the exception of those that return complex or multiple values, TBD). There are two versions so that someone writing CUDA code can avoid the copy. The normal user will probably want the output on the CPU, so a GPU-to-CPU version is provided. Python will (currently) only have access to the CPU-copied version. That was my intention, anyway. (Maybe in numba 0.29 a GPU version could eventually be interesting?) I'm not sure a GPU version is useful or important, though, so I would be willing to drop it for now if needed.
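To make the two-version idea concrete, here is a minimal sketch of the pattern, not GooFit's actual API. Every name in it (`DeviceBuffer`, `copy_to_host`, `evaluate_on_device`, `evaluate_to_host`) is hypothetical, and the "device" buffer is just host memory with a copy counter so the example compiles without CUDA. The point is only that the CPU-facing call wraps the device-resident call and pays for exactly one transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device-side buffer (for example a
// thrust::device_vector). It wraps host memory so this runs anywhere.
struct DeviceBuffer {
    std::vector<double> data;
};

// Counts simulated device-to-host transfers so the cost is visible.
static int g_device_to_host_copies = 0;

// Simulated device-to-host copy (assumption: one call == one transfer).
std::vector<double> copy_to_host(const DeviceBuffer &buf) {
    ++g_device_to_host_copies;
    return buf.data;
}

// "GPU" variant: the result stays on the device, so CUDA-side callers
// can chain further work without a round trip through the host.
DeviceBuffer evaluate_on_device(const std::vector<double> &events) {
    DeviceBuffer out;
    out.data.reserve(events.size());
    for (double x : events) {
        out.data.push_back(x * x); // placeholder for a real PDF value
    }
    return out;
}

// "CPU" variant: wraps the device variant and copies back exactly once.
std::vector<double> evaluate_to_host(const std::vector<double> &events) {
    std::vector<double> host = copy_to_host(evaluate_on_device(events));
    assert(host.size() == events.size()); // one value per event
    return host;
}
```

With this split, a Python binding would only need to expose something like `evaluate_to_host`, while CUDA code could call the device variant directly and skip the copy.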
The eventual hope is something like this for the call structure: #155
8593b7c to eb7d3c3
Cherry-picked and fixed style. Please use
Thanks for fixing up the style. This PR is ready to go!
This PR fixes the issues that have kept the MPI version from working.