Skip to content

Conversation

galapaegos
Copy link
Contributor

This PR is fixing any issues that have made the MPI version not work.

@henryiii
Copy link
Member

Rebased and fixed style.

@galapaegos
Copy link
Contributor Author

Just an update, MPI works for all examples except DP4. All tests work except CorrGauss, and the three previous tests that were broken.

First priority is figuring out the problem with DP4, then CorrGauss.

@henryiii
Copy link
Member

Rebased on master, and fixed (hopefully) the one usage of old C++ loop style that triggered the tidy checker. Please fetch and reset your checkout. (git fetch && git reset origin/fix-mpi)

@galapaegos
Copy link
Contributor Author

Under src/PDFs/physics/Amp4Body.cu, is there any reason to hand-evaluate vs using evaluate_with_metric?

Here is the code in question:

    generation_no_norm = true; // we need no normalization for generation, but we do need to make sure that norm = 1;
    SigGenSetIndices();
    copyParams();
    normalize();
    setForceIntegrals();
    host_normalizations.sync(d_normalizations);

    thrust::device_vector<fptype> results(numEvents);
    thrust::constant_iterator<int> eventSize(6);
    thrust::constant_iterator<fptype *> arrayAddress(dev_event_array);
    thrust::counting_iterator<int> eventIndex(0);

    // MetricTaker evalor(this, getMetricPointer("ptr_to_Prob"));
    // we need to call evaluate_with_metric(); here.
    auto fc = fitControl;
    setFitControl(std::make_shared<ProbFit>());
    thrust::transform(thrust::make_zip_iterator(thrust::make_tuple(eventIndex, arrayAddress, eventSize)),
                      thrust::make_zip_iterator(thrust::make_tuple(eventIndex + numEvents, arrayAddress, eventSize)),
                      results.begin(),
                      *logger);
    cudaDeviceSynchronize();
    gooFree(dev_event_array);

I would prefer the evaluate_with_metric version if possible: I think GenerateSig will need a setData called prior.

I am wondering about evaluate_with_metric. There are two versions of this function, one that allows you to pass a device_vector, and one that returns a host_vector. Is there a need for the difference, should we just pass a host_vector always and let the user perform any appropriate conversion? I'm thinking from python, what would a user do with a device_vector from goofit, evaluate the buffer with a second metric?

Currently, the transform happens in device, copy back to host for MPI, then copy back to device which is converted back to host.

Thoughts?

@henryiii
Copy link
Member

I would like to move all evaluation into the three methods in GooPdf, and remove all hand evaluations in subclasses (possibly with the exception of ones that return complex or multiple values, TBD).

There are two versions to allow someone writing CUDA code to avoid the copy. The normal user will probably want the output on the CPU, so the GPU to CPU version is provided. Python will (currently) only have access to the CPU copied version. That was my intention, anyway. (Maybe in numba 0.29 a GPU version could eventually interesting?)

I'm not sure a GPU version is useful/important, though, so would be willing to drop it for now if needed.

@henryiii
Copy link
Member

Eventual hope is something like this for the call structure: #155

@henryiii henryiii force-pushed the fix-mpi branch 2 times, most recently from 8593b7c to eb7d3c3 Compare July 2, 2018 14:57
@henryiii
Copy link
Member

henryiii commented Jul 2, 2018

Cherrypicked and fixed style. Please use git fetch && git reset origin/fix-mpi. If the only remaining changes seen by git diff are whitespace, just do git reset --hard to clean them out.

@galapaegos
Copy link
Contributor Author

Thanks for fixing up the style. This PR is ready go!

@henryiii henryiii merged commit af872e5 into master Jul 2, 2018
@henryiii henryiii deleted the fix-mpi branch July 2, 2018 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants