Skip to content

Conversation

IvanYashchuk
Copy link

@IvanYashchuk IvanYashchuk commented Jan 11, 2022

List of ported functions:

  • odd_ext
  • even_ext
  • const_ext
  • zero_ext

List of functions that were already working with Array API arrays:

  • axis_slice
  • axis_reverse

Reference issue

Ref. #15354

What does this implement/fix?

This PR makes functions from scipy.signal._arraytools accept and return arrays from Array API compatible libraries such as numpy.array_api and cupy.array_api. Only numpy.array_api is tested here.

cc: @AnirudhDagar, @rgommers, @czgdp1807

@IvanYashchuk IvanYashchuk force-pushed the ivan/array-api-signal-arraytools branch from 48c87df to 84d9a6a Compare January 11, 2022 14:47
return xp


def _concatenate(arrays, axis):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this pattern. Creating a wrapper and using it as a drop in replacement across the library. It should be quite easy to deprecate in the distant future when numpy.array_api will stabilise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we should try to follow a similar path to scikit-learn. There's a long discussion here that Pamphile linked from the issue: scikit-learn/scikit-learn#22352

Are we converging to similar approaches now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also saw scikit-learn attempt to use NEP37 like scikit-learn/scikit-learn#16574. It is hard to keep track of all the approaches. There's also NEP47 approach used here of course: https://numpy.org/neps/nep-0047-array-api-standard.html#appendix-a-possible-get-namespace-implementation

So, it is pretty much decided to drop/not accept NEP37 approach then? I initially went down that path because I had noticed that CuPy already had https://docs.cupy.dev/en/stable/reference/generated/cupy.get_array_module.html.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult to comply with NumPy behavior for alternative array libraries (like CuPy) and Array API solves this problem by introducing a specification designed together with other libraries. NEP47 references NEP37 and says:

The array API standard was constructed with the help of such comparisons, only between many array libraries rather than only between NumPy and one other library.

So, yes, NEP47 supersedes NEP37. CuPy also supports Array API: https://docs.cupy.dev/en/stable/reference/array_api.html.

List of ported functions:
* axis_slice
* axis_reverse
* odd_ext
* even_ext
* const_ext
* zero_ext
@IvanYashchuk IvanYashchuk force-pushed the ivan/array-api-signal-arraytools branch from 84d9a6a to 3cf80be Compare January 12, 2022 09:07
@tupui
Copy link
Member

tupui commented Jan 12, 2022

Thanks @IvanYashchuk for the PR. Can you send an email to the mailing list linking it? Since this introduces the helper function, we should make all maintainers aware.

Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks easy to add, we just need to have these common functions like _concatenate then. A few comments for me.

)


def _get_namespace(*arrays):
Copy link
Member

@tupui tupui Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I am not a fan of get something. I would prefer something even more explicit like namespace_from_array or something with array API in the name.

Or we say it's enough to do array_api.concatenate (fine with me).

In any case, I would personally remove the underscores everywhere since it's already in a private module (_lib).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespace_from_array

+1. A better name.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with dropping the underscores, it's just what I noticed is used in SciPy, for example scipy._lib._util_asarray_validated.

I don't have a preference for the name of this function. Array API is mentioned in the module name scipy._lib._array_api_util. I can change it to scipy._lib._array_api_util._namespace_from_arrays.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the underscore we are a bit inconsistent yes. Let's see what others think.

for x in arrays
}

if len(namespaces) != 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree here. Better to fail explicitly than trying out an arbitrary conversion from one backend to another. Otherwise we will have reports of weird behaviours and performances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing we can consider, trying to convert lists into either np or the only other backend which is present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might lead to confusions, because the code will work and the user would think that they are receiving the optimum performance but that might not be the case as always. IMO, failing is the best option in case of conflicting namespaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that it would introduce a backward incompatibility in the sense that some user expect the function to work when they pass a list instead of an array of some type.

We could also add a (deprecation) warning instead of failing in that case so user can migrate their code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also add a (deprecation) warning instead of failing in that case so user can migrate their code.

Yes. Though, eventually we should fail in case of conflicting namespaces. For migrating to such failures, as you said, we can use DeprecationWarning for a better user experience.

@tupui tupui added the enhancement A new feature or improvement label Jan 12, 2022
@ev-br
Copy link
Member

ev-br commented Jan 13, 2022

It'd be nice to understand the relation of this array api work to a parallel work which deals with the uarray backends. Are these two related? Complementary? Competing?

(It's probably just my ignorance, but nonetheless)

Add version information when array API helper is added.

Co-authored-by: Pamphile Roy <roy.pamphile@gmail.com>
@IvanYashchuk
Copy link
Author

It'd be nice to understand the relation of this array api work to a parallel work which deals with the uarray backends. Are these two related? Complementary? Competing?

(It's probably just my ignorance, but nonetheless)

Hey @ev-br! They are complementary. The goal is the same: make SciPy functions work with non-NumPy arrays (CuPy, Dask, PyTorch, JAX, etc.). The means to achieve that goal are different.

SciPy functions implemented with pure Python+NumPy functions can be ported to use Array API compatible modules, where we replace numpy module in SciPy with an input_array.__array_namespace__(). __array_namespace__ carries a reference to a specific array library module it was created with. So we determine the library to be used dynamically based on the inputs of the function.

uarray backends are needed for the functions which directly rely on the memory buffers of NumPy arrays and use compiled code extensions. Here we need to be able to redirect SciPy function calls to an array library specific implementation (e.g. scipy.linalg.lu_factor -> cupyx.scipy.linalg.lu_factor).

Copy link
Contributor

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought--is it easy to demonstrate a performance improvement with CuPy here? These seem like mostly private functions I'm less familiar with? I realize that demonstrating a performance improvement may be somewhat orthogonal, but on the other hand certainly can make things more compelling and help make sure that we don't i.e., lose out on practical performance gains on reasonable data sizes with HtoD and DtoH transfers for example.

I think typically you need fairly large arrays for gains--happy to test on a suitable node/GPU if there's a reproducer for a speedup here.

@IvanYashchuk
Copy link
Author

The original idea was to start with a small PR to demonstrate the way for Array API compatibility.

I will prepare a demo showing performance improvement with CuPy. It shouldn't be difficult.

@tupui
Copy link
Member

tupui commented Apr 12, 2023

Just a thought--is it easy to demonstrate a performance improvement with CuPy here? These seem like mostly private functions I'm less familiar with? I realize that demonstrating a performance improvement may be somewhat orthogonal, but on the other hand certainly can make things more compelling and help make sure that we don't i.e., lose out on practical performance gains on reasonable data sizes with HtoD and DtoH transfers for example.

I think typically you need fairly large arrays for gains--happy to test on a suitable node/GPU if there's a reproducer for a speedup here.

@mdhaber had a small experiment here mdhaber#63

@rgommers do we have a concrete plan/timeline for this work?

@rgommers
Copy link
Member

Thanks for circling back to this Tyler & all. Good timing, I was just writing up a larger proposal and will post it within the next 24 hours.

tl;dr we do want to go ahead with array API standard support, and we have bandwidth to push it. We've learned some lessons from the more extensive efforts in scikit-learn on how to go about it. There's clearly going to be large performance gains that are achievable with GPU support; @mdhaber did some experiments with scipy.stats functionality on that I believe. Also read the PR description of scikit-learn/scikit-learn#25956, that's pretty convincing:)

@rgommers
Copy link
Member

Good timing, I was just writing up a larger proposal and will post it within the next 24 hours.

Done in gh-18286. Now that the dust has settled and scikit-learn folks seem happy with the current incarnation, I'm much more comfortable with pushing this forward pretty quickly now.

@lucascolley
Copy link
Member

superseded by gh-19001

@lucascolley lucascolley added the array types Items related to array API support and input array validation (see gh-18286) label Dec 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
array types Items related to array API support and input array validation (see gh-18286) enhancement A new feature or improvement scipy.signal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants