-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
ENH: port scipy.signal._arraytools to be Array API compatible #15395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: port scipy.signal._arraytools to be Array API compatible #15395
Conversation
48c87df
to
84d9a6a
Compare
return xp | ||
|
||
|
||
def _concatenate(arrays, axis): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this pattern. Creating a wrapper and using it as a drop in replacement across the library. It should be quite easy to deprecate in the distant future when numpy.array_api
will stabilise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we should try to follow a similar path to scikit-learn
. There's a long discussion here that Pamphile linked from the issue: scikit-learn/scikit-learn#22352
Are we converging to similar approaches now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also saw scikit-learn
attempt to use NEP37 like scikit-learn/scikit-learn#16574. It is hard to keep track of all the approaches. There's also NEP47 approach used here of course: https://numpy.org/neps/nep-0047-array-api-standard.html#appendix-a-possible-get-namespace-implementation
So, it is pretty much decided to drop/not accept NEP37 approach then? I initially went down that path because I had noticed that CuPy already had https://docs.cupy.dev/en/stable/reference/generated/cupy.get_array_module.html.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's difficult to comply with NumPy behavior for alternative array libraries (like CuPy) and Array API solves this problem by introducing a specification designed together with other libraries. NEP47 references NEP37 and says:
The array API standard was constructed with the help of such comparisons, only between many array libraries rather than only between NumPy and one other library.
So, yes, NEP47 supersedes NEP37. CuPy also supports Array API: https://docs.cupy.dev/en/stable/reference/array_api.html.
List of ported functions: * axis_slice * axis_reverse * odd_ext * even_ext * const_ext * zero_ext
84d9a6a
to
3cf80be
Compare
Thanks @IvanYashchuk for the PR. Can you send an email to the mailing list linking it? Since this introduces the helper function, we should make all maintainers aware. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks easy to add, we just need to have these common functions like _concatenate
then. A few comments for me.
) | ||
|
||
|
||
def _get_namespace(*arrays): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I am not a fan of get
something. I would prefer something even more explicit like namespace_from_array
or something with array API in the name.
Or we say it's enough to do array_api.concatenate
(fine with me).
In any case, I would personally remove the underscores everywhere since it's already in a private module (_lib
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
namespace_from_array
+1. A better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with dropping the underscores, it's just what I noticed is used in SciPy, for example scipy._lib._util_asarray_validated
.
I don't have a preference for the name of this function. Array API is mentioned in the module name scipy._lib._array_api_util
. I can change it to scipy._lib._array_api_util._namespace_from_arrays
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the underscore we are a bit inconsistent yes. Let's see what others think.
for x in arrays | ||
} | ||
|
||
if len(namespaces) != 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree here. Better to fail explicitly than trying out an arbitrary conversion from one backend to another. Otherwise we will have reports of weird behaviours and performances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we can consider, trying to convert lists into either np or the only other backend which is present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might lead to confusions, because the code will work and the user would think that they are receiving the optimum performance but that might not be the case as always. IMO, failing is the best option in case of conflicting namespaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that it would introduce a backward incompatibility in the sense that some user expect the function to work when they pass a list instead of an array of some type.
We could also add a (deprecation) warning instead of failing in that case so user can migrate their code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also add a (deprecation) warning instead of failing in that case so user can migrate their code.
Yes. Though, eventually we should fail in case of conflicting namespaces. For migrating to such failures, as you said, we can use DeprecationWarning
for a better user experience.
It'd be nice to understand the relation of this array api work to a parallel work which deals with the uarray backends. Are these two related? Complementary? Competing? (It's probably just my ignorance, but nonetheless) |
Add version information when array API helper is added. Co-authored-by: Pamphile Roy <roy.pamphile@gmail.com>
Hey @ev-br! They are complementary. The goal is the same: make SciPy functions work with non-NumPy arrays (CuPy, Dask, PyTorch, JAX, etc.). The means to achieve that goal are different. SciPy functions implemented with pure Python+NumPy functions can be ported to use Array API compatible modules, where we replace uarray backends are needed for the functions which directly rely on the memory buffers of NumPy arrays and use compiled code extensions. Here we need to be able to redirect SciPy function calls to an array library specific implementation (e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought--is it easy to demonstrate a performance improvement with CuPy here? These seem like mostly private functions I'm less familiar with? I realize that demonstrating a performance improvement may be somewhat orthogonal, but on the other hand certainly can make things more compelling and help make sure that we don't i.e., lose out on practical performance gains on reasonable data sizes with HtoD and DtoH transfers for example.
I think typically you need fairly large arrays for gains--happy to test on a suitable node/GPU if there's a reproducer for a speedup here.
The original idea was to start with a small PR to demonstrate the way for Array API compatibility. I will prepare a demo showing performance improvement with CuPy. It shouldn't be difficult. |
@mdhaber had a small experiment here mdhaber#63 @rgommers do we have a concrete plan/timeline for this work? |
Thanks for circling back to this Tyler & all. Good timing, I was just writing up a larger proposal and will post it within the next 24 hours. tl;dr we do want to go ahead with array API standard support, and we have bandwidth to push it. We've learned some lessons from the more extensive efforts in scikit-learn on how to go about it. There's clearly going to be large performance gains that are achievable with GPU support; @mdhaber did some experiments with |
Done in gh-18286. Now that the dust has settled and scikit-learn folks seem happy with the current incarnation, I'm much more comfortable with pushing this forward pretty quickly now. |
superseded by gh-19001 |
List of ported functions:
List of functions that were already working with Array API arrays:
Reference issue
Ref. #15354
What does this implement/fix?
This PR makes functions from
scipy.signal._arraytools
accept and return arrays from Array API compatible libraries such asnumpy.array_api
andcupy.array_api
. Onlynumpy.array_api
is tested here.cc: @AnirudhDagar, @rgommers, @czgdp1807