Skip to content

Conversation

AnirudhDagar
Copy link
Member

@AnirudhDagar AnirudhDagar commented Jul 6, 2021

Reference issue

See gh-14353

What does this implement/fix?

This adds support for uarray backend implementation to scipy.ndimage.

A simple example tested after adding dummy CuPy backend on this branch, verifies the backend.

x_cu, x_nu = cp.arange(10), np.arange(10)

y_scipy = ndimage.correlate1d(x_nu, weights=np.array([1, 2.5]))

with scipy.ndimage.set_backend(_cupy_backend):
    y_cupy = ndimage.correlate1d(x_cu, cp.array([1, 2.5]))

print((y_cupy, type(y_cupy)), (y_scipy, type(y_scipy)) , sep='\n')

>>> (array([ 0,  2,  6,  9, 13, 16, 20, 23, 27, 30]), <class 'cupy._core.core.ndarray'>)
(array([ 0,  2,  6,  9, 13, 16, 20, 23, 27, 30]), <class 'numpy.ndarray'>)

Question

EDIT 3: This has been resolved, and we now run the whole ndimage test suite based on uarray.
Currently, I've not added any additional tests. We're now running the existing test suite via the multimethods.
Edit 2: We now have separate tests for testing the backend based on the approach below.

We could do something similar to test_backend.py and mock_backend.py just to make sure that a third party backend can be properly registered and used via a context manager, but adding every function under ndimage in thoses tests is not something we should be doing (it is redundant to the current test suite since we set the scipy global backend anyway and will probably be a repetition).

Edit 1: I realized it doesn't really matter since the only thing that is being tested is whether dispatching works through the mock backend or not.

Let me know if it is necessary to include this type of test for ndimage as well.

@rgommers just to confirm, do you want me to address the private/public namespace concerns of ndimage from cupy/cupy#5047 in this PR itself? Or should I do that subsequently, once this PR lands to keep them separated?

cc @peterbell10 @grlee77

Additional Information

I've also removed the use of unnecessary private method scipy.ndimage.filters.rank_filter in favour of scipy.ndimage.rank_filter from the test suite.

@rgommers
Copy link
Member

rgommers commented Jul 6, 2021

@rgommers just to confirm, do you want me to address the private/public namespace concerns of ndimage from cupy/cupy#5047 in this PR itself? Or should I do that subsequently, once this PR lands to keep them separated?

Separate is better, since it's an orthogonal change. Let me open an issue about that with some motivation and history, so we have a single place that we can reference when we start fixing this in various submodules.

@rgommers rgommers added enhancement A new feature or improvement scipy.ndimage labels Jul 6, 2021
@AnirudhDagar
Copy link
Member Author

AnirudhDagar commented Jul 6, 2021

Yeah, that sounds good! Will pick it up after you raise the issue. Thanks:)

@rgommers
Copy link
Member

rgommers commented Jul 6, 2021

Currently, I've not added any additional tests. We're now running the existing test suite via the multimethods.

We could do something similar to test_backend.py and mock_backend.py just to make sure that a third party backend can be properly registered and used via a context manager, but adding every function under ndimage in thoses tests is not something we should be doing

A separate test to ensure a third-party backend works but not running duplicate tests sounds very reasonable to me. Let's wait for @peterbell10 or @grlee77 to comment though, they know more about how testing in scipy.fft works.

@grlee77
Copy link
Contributor

grlee77 commented Jul 6, 2021

A separate test to ensure a third-party backend works but not running duplicate tests sounds very reasonable to me. Let's wait for @peterbell10 or @grlee77 to comment though,

We could copy the mock backend approach from the fft tests. I don't know that it is necesarry to mock every function in the ndimage namespace, though.

@peterbell10
Copy link
Member

I don't know that it is necesarry to mock every function in the ndimage namespace, though.

I think it's valuable to test that each function really does do dispatching. If you're just worried about lines of code, then you could always define a backend that just records the arguments to __ua_function__ so you wouldn't need any code to mock individual functions. Something like:

class MockBackend:
    __ua_domain__ = "numpy.scipy"

    def __init__(self, return_value=None):
        self.num_calls = 0
        self.last_args = None
        self.return_value = return_value

    def __ua_function__(self, method, args, kwargs):
        self.num_calls += 1
        self.last_args = (method, args, kwargs)
        return self.return_value

@AnirudhDagar
Copy link
Member Author

I think it's valuable to test that each function really does do dispatching. If you're just worried about lines of code, then you could always define a backend that just records the arguments to ua_function so you wouldn't need any code to mock individual functions. Something like:

@peterbell10 could you please explain this a bit more for me. I'm not very sure how we can call different functions without passing the required arguments which unlike scipy.fft will be more than just a single input x to test all the functions. Also, I don't understand why the __ua_function__ in your code snippet has no call to the implemented functions.
If you want, I can go ahead and add all the functions with separate test_backend_call_<unique_signature> tests for different functions.

Thanks in advance :)

Copy link
Member Author

@AnirudhDagar AnirudhDagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've followed a specific pattern for the arg replacer names based on reusability. See below.

Copy link
Member Author

@AnirudhDagar AnirudhDagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the careful review, I've made the changes.

@AnirudhDagar
Copy link
Member Author

I've made the changes according to @grlee77's comment which sounds like a good approach:

  1. This way we can keep the uarray backend through the multimethods, and the actual SciPy implementation of the functions separate. This offers better maintainability.
  2. Also, with this change, the behaviour between ndimage.<module>.<func> and ndimage.<func> remains the same, both calling the multimethods, addressing concerns raised here by @peterbell10.

A couple of other things to take note of:

  • I've made some changes to the tests calling private functions. They need to be called from the new (renamed) private module namespace since these private functions are not available in the public namespace.
  • Also, in anticipation of the deprecation warnings (to be added in a subsequent PR, similar to DEP: deprecate private but non-underscored signal.spline namespace #14419 ), I've removed any instance of function call through the ndimage.<module> namespace.

I guess some of the changes here overlap with the changes to be made for the deprecation of public like modules in ndimage but that's okay. I'll add a link to this comment in the description of the deprecation PR, just in case someone wants to understand the changes made, later.

@AnirudhDagar
Copy link
Member Author

AnirudhDagar commented Jul 19, 2021

Ahh, need to fix the merge conflicts before git mv otherwise git forgets that the files are just being renamed. (Edit: That's not the reason for the huge diff, see below)

Edit 1: Looks like if you add a new file with the same name which was earlier renamed to something else, github will show a huge diff. See AnirudhDagar#1 for demonstration.

@AnirudhDagar AnirudhDagar force-pushed the uarray_ndimage_backend branch 3 times, most recently from 31a9882 to 5ad4b12 Compare July 21, 2021 01:22
@AnirudhDagar
Copy link
Member Author

AnirudhDagar commented Jul 21, 2021

@peterbell10 @grlee77 @rgommers Thank you for your patience with this one.

As I mentioned earlier, the diff is extremely large when following @grlee77 suggestion in a single PR.

I propose to split this PR into three sequential PRs each adding the following:

  1. renaming modules (from module.py to _module.py)
  2. uarray multimethods (adds uarray support as well as brings back module.py for backwards compatibility) (Edit 1: we can actually use this PR after minor refactoring for this purpose)
  3. deprecating ndimage public looking private namespace (i.e. ndimage.module.func will raise warning)

To make sure the proposed method works in mitigating the huge diff, I tried it out on my own fork.
Please refer to the following in order to understand the proposed approach completely.

  1. Test renaming file, PR with huge diff AnirudhDagar/scipy#1 (explains the issue of huge diff)
  2. Rename ndimage AnirudhDagar/scipy#3 (demonstrates step 1, i.e renaming modules)
  3. Uarray ndimage backend trimmed master copy AnirudhDagar/scipy#4 (demonstrates step 2, i.e uarray multimethods )
  4. DEP: Deprecate ndimage public looking private modules AnirudhDagar/scipy#5 (demonstrates step 2, i.e deprecating ndimage modules)

This will keep git blame (You can look at my master_copy branch) and also avoid the huge diff change as seen in this PR currently.

Please let me know your opinion on this, if it looks good, then I'll go ahead :))

@ev-br
Copy link
Member

ev-br commented Jul 21, 2021

Drive-by comment: you can always send a series of PRs where PR N depends on N-1, and rebase N on master once N-1 is merged.

@rgommers
Copy link
Member

  • renaming modules (from module.py to _module.py)
  • uarray multimethods (adds uarray support as well as brings back module.py for backwards compatibility) (Edit 1: we can actually use this PR after minor refactoring for this purpose)

This isn't ideal, because we cannot merge PR 1 separately - that will break master, and the second PR may need more discussion. Your idea of splitting the rename off is great, but you want that new PR to do both the rename to _module.py and put back a new module.py. That PR diff will be large in GitHub, but that's fine. As long as the git mv and putting back the old name are two commits in a single PR, the git history is fine. We can merge that PR straight away, and master won't be broken.

After that, rebase this PR on master, and you have a small diff back here.

@AnirudhDagar
Copy link
Member Author

This isn’t ideal, because we cannot merge PR 1 separately - that will break master, and the second PR may need more discussion. Your idea of splitting the rename off is great, but you want that new PR to do both the rename to _module.py and put back a new module.py. That PR diff will be large in GitHub, but that’s fine. As long as the git mv and putting back the old name are two commits in a single PR, the git history is fine. We can merge that PR straight away, and master won’t be broken.
After that, rebase this PR on master, and you have a small diff back here.

Please see the offline discussion below we had regarding the comment above.

Anirudh:

Could you maybe elaborate on how PR 1 will break master?
Anyways, I feel I’ve overcomplicated things just to avoid the huge diff. If you think it is okay to make the merge with huge diff as long it still keeps git blame, I’m cool with taking that path too. :))

Ralf:

Maybe CI is not broken, because we do not directly test(e.g. filters.py). There are still two reasons not to do this:

  1. we do encourage downstream projects to test against master (just like we test NumPy master in Scipy CI)
  2. we want to have the master branch releasable at any time. if CI is green but we have to "remember" that we cannot branch a release without restoring filters.py et al., that's less than ideal.

In a nutshell, to conclude, I agree it is a much better idea to follow what was initially suggested by Ralf as the above points make a lot of sense and the way I was pitching it earlier is not compliant with these points. I'll send a PR for achieving this now.

Thanks again everyone for the patience and useful feedback/suggestions. :))

DOC: set backend str to scipy and fix docs

DOC: Fix notes about uarray domain hierarchy

MAINT: explicitly define ndimage __all__
 * ENH: Add filters multimethods

 * ENH: Add fourier multimethods

 * ENH: Add interpolation, measurements, morphology multimethods
@jakirkham
Copy link
Contributor

cc @GenevieveBuckley (in case you have thoughts on this 🙂)

@GenevieveBuckley
Copy link

Thanks for pinging me, @jakirkham

Let me know if it is necessary to include this type of test for ndimage as well.

I think this seems important. Apologies if this is something that's already been discussed/resolved, there's a going on here in the thread.

@AnirudhDagar
Copy link
Member Author

Hi @GenevieveBuckley,
Yeah, we are in fact running the whole ndimage test suite on uarray. See this comment. Sorry for not updating the PR description (will do that now).

@AnirudhDagar
Copy link
Member Author

AnirudhDagar commented Oct 8, 2021

Thanks, @grlee77 @peterbell10 for reviewing this multiple times, suggesting changes and overall improving the work here. I guess we are finally there. Let me know if we'd like to keep the commit history (in that case I can do an interactive rebase and improve if required as it is slightly messy right now) or squash the commits on merge.

cc @rgommers

@rgommers
Copy link
Member

rgommers commented Oct 9, 2021

Let me know if we'd like to keep the commit history (in that case I can do an interactive rebase and improve if required as it is slightly messy right now) or squash the commits on merge.

I think we'd probably want to squash-merge this. If you prefer you can rewrite the whole history, it should probably be ~4-5 commits then.

@AnirudhDagar
Copy link
Member Author

I don't have any strong opinions on this. Let's do a squash merge if that's ideal.

@AnirudhDagar
Copy link
Member Author

AnirudhDagar commented Oct 12, 2021

Gentle ping. Should be good to go. :)

@leofang once this is merged, I'd like to discuss the CuPy support.
Edit1: The next SciPy release will take some time. So my question is, can we use SciPy master in CuPy CI for leveraging this work to enable the development of uarray backend on the CuPy side?

Edit2: Please don't merge this yet! There is a bigger picture, we'll wait a few more days and hopefully get this merged then.

Thanks in advance!

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks excellent, thanks a lot @AnirudhDagar for the hard work, and @peterbell10 and @grlee77 for the thorough reviews!

I'm just adding a blocking review for the sole reason that we should not merge this just yet. For the same reason as discussed in #14407 (comment): we need to (a) sketch the big picture of where this uarray interop work is going, and (b) have a discussion about it on the mailing list.

@AnirudhDagar wrote a very nice blog post that will help illustrate where we are heading: https://labs.quansight.org/blog/2021/10/array-libraries-interoperability/. A second blog post is in the making with more of a high-level technical overview - that should be ready within a few days (so please do not start the discussion now/here). Once that is published, we will have that discussion, and once it has settled down we can (presumably) merge this PR.

@leofang
Copy link
Contributor

leofang commented Oct 13, 2021

@leofang once this is merged, I'd like to discuss the CuPy support.
Edit1: The next SciPy release will take some time. So my question is, can we use SciPy master in CuPy CI for leveraging this work to enable the development of uarray backend on the CuPy side?

Thanks for the ping, @AnirudhDagar! This looks cool. @kmaehashi has been busy renovating CuPy's CI support recently, so it might be easier now to get this kind of tests done. He probably is in a better position to comment on this regard 🙂

@kmaehashi
Copy link

The next SciPy release will take some time. So my question is, can we use SciPy master in CuPy CI for leveraging this work to enable the development of uarray backend on the CuPy side?

Yeah, it should be easy to add a test axis to verify CuPy's uarray implementation against SciPy master 😄

@rgommers rgommers added this to the 1.9.0 milestone Nov 30, 2021

.. versionadded:: 1.8.0
"""
__ua_domain__ = "numpy.scipy.ndimage"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intention and I know this is done the same way for scipy.fft, but it's weird to call the hypothetical top-level backend numpy.
uarray supports a list of domains that can be used when implementing a single backend that supports multiple domains Quansight-Labs/uarray#243 and it's just easier to understand the domain hierarchy when there's no "numpy." at the beginning of the string.
@hameerabbasi, @peterbell10 what do you think, is it really necessary to have numpy here?

@tylerjereddy
Copy link
Contributor

@rgommers Where does the team stand on the overall design decisions/blocking comment above? Probably bet to bump the milestone before we branch ~later this week?

@rgommers
Copy link
Member

Yes, we're going to have to bump this unfortunately - not enough bandwidth to push these discussions forward right now. Moving to the 1.10.0 milestone.

@rgommers rgommers modified the milestones: 1.9.0, 1.10.0 May 24, 2022
@rgommers rgommers removed this from the 1.10.0 milestone Nov 30, 2022
@lucascolley lucascolley changed the title ENH: uarray based backend support for scipy.ndimage ENH: ndimage: uarray based backend support Jun 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org enhancement A new feature or improvement scipy.ndimage uarray Items related to the uarray backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.