Conversation

@mdhaber (Contributor) commented Jun 20, 2024

Reference issue

gh-20772
gh-19260

What does this implement/fix?

As discussed in gh-20772 and gh-19260, this is an alternative approach to dispatching linalg and signal functions to the native backend of the input arrays.

Rather than adding separate approaches for dispatching to each SciPy subpackage, it proposes maintaining a single dispatch mechanism based on the existing code from scipy.special (with minor generalizations - see diff here).

Also, rather than translating the original test suites to use array API calls, this would only add property-based tests to confirm that the functions return similar results with each backend as they do for NumPy/SciPy. The philosophy is that we do not need to test the quality of the alternative-backend implementations to the same standards as we test the SciPy implementations; rather, we really just need to test the "plumbing": for typical cases, when we pass an alternative-backend array into one of these dispatched functions, is type in == type out, and do the numerical values agree with those of NumPy/SciPy?
Of course, by adjusting the definition of "typical" in the tests, we can make the tests arbitrarily strong - for example, strong enough that a function that passes the generic property-based test would certainly pass all of SciPy's function-specific tests. (Note that this idea is orthogonal to the one above; we can certainly leverage the existing dispatch framework but not add these property-based tests, if we prefer to translate the test suites.)[1]
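To make the "plumbing" idea concrete, here is a minimal sketch of what such a generic check could look like. It is a hypothetical helper, not the PR's actual test code, and NumPy stands in for the alternative backend:

```python
import numpy as np

def check_plumbing(f, args_np, xp, rtol=1e-6):
    """Generic 'plumbing' check: type in == type out, values match NumPy/SciPy.

    `f` is a dispatched function, `args_np` are NumPy reference inputs, and
    `xp` is the alternative-backend namespace (hypothetical stand-in here).
    """
    ref = f(*args_np)                          # reference result via NumPy/SciPy
    args_xp = [xp.asarray(a) for a in args_np]  # same inputs, other backend
    res = f(*args_xp)
    assert isinstance(res, type(args_xp[0]))    # type in == type out
    np.testing.assert_allclose(np.asarray(res), ref, rtol=rtol)

# Demo, with NumPy itself playing the role of the alternative backend:
check_plumbing(np.multiply, [np.arange(3.0), np.arange(3.0)], np)
```

Making "typical cases" stronger then amounts to broadening the dtypes, shapes, and argument values fed into a check like this.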

The intent is to reduce author and review time for new functions and to minimize the diff/be non-invasive (i.e. to keep the original code readable/avoid introducing bugs in existing functions). We may find that additional generalizations are needed as we add more functions. That's OK. The intent here is to take care of low-hanging fruit, and we can generalize when the time comes.

Additional information

The first commit moves the key _support_alternative_backends functions from special to _lib/_array_api.py and generalizes it slightly. GitHub shows this as a large diff (remove from one place, add to another), but all that has changed is adding some additional arguments to generalize from special to any sub-package. I'd suggest comparing the diff manually.

The next two commits add support and tests for a few functions in signal and linalg, respectively.

I'll make some additional notes inline.

Footnotes

  1. One might ask why we didn't do these things for stats. Let's separate this into implementation and testing:
    Implementation: so far, we've been translating functions which don't have equivalents in other backends (for instance, ttest_ind is not implemented in JAX, CuPy, or Torch) and that can be written efficiently in pure Python with array API standard calls (plus some dispatched scipy.special function calls). So in short, we can't just dispatch, and there would be little advantage to doing so. Instead, we translate.
    Testing: TBH, we could have left the existing pure-NumPy tests alone and only added property-based tests. However, the difference in stats compared to, say, special, is that because we are providing the whole implementation for other backends rather than just dispatching, it's no longer sufficient just to check the basic plumbing. We really do need to ensure full test coverage so that all lines of code work as intended for all backends. It's possible to make the property-based tests sufficiently strong, and it might have been a bit simpler/faster to go this route... I'm not sure. The truth is that we just didn't try it. One advantage to translating the test functions is that it's given us a chance to improve SciPy's tests.

@mdhaber mdhaber requested review from ev-br and lucascolley June 20, 2024 21:07
@github-actions bot added labels Jun 20, 2024: scipy.special, scipy.linalg, scipy.signal, scipy._lib, CI (items related to CI tools such as CircleCI, GitHub Actions, or Azure), Meson (items related to the Meson build system), array types (items related to array API support and input array validation; see gh-18286), enhancement (a new feature or improvement)
Comment on lines 560 to 561
def get_array_subpackage_func(f_name, xp, n_array_args, root_namespace,
subpackage, generic_implementations):
@mdhaber (author) commented Jun 20, 2024

When generalizing the signature, I didn't think terribly carefully about the new argument names, whether they should be keyword-only, etc.

  • root_namespace - the name of the private module in which SciPy's implementation lies. For instance, for the special functions, it's _ufunc. Some generalization may be needed if the SciPy implementations are spread across multiple files. There are lots of possible solutions. The most obvious is to call get_array_subpackage_func with the appropriate root_namespace for the given f_name; another possibility is to import all the SciPy implementations into a single, private module, and set root_namespace to the name of that module.
  • subpackage - e.g. 'signal', 'linalg', 'special'
  • generic_implementations - a dictionary of callables for functions that may not have direct equivalents in all other backends but can be expressed in terms of other xp or spx functions. We use this in special to define xlogy as x * xp.log(y) and chdtrc in terms of gammaincc (when available), for instance.

The intent was to produce a proof of concept; we can wordsmith everything to our heart's content if we prefer this approach.
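For illustration, the xlogy example above might look roughly like this as an entry in a `generic_implementations` mapping. This is a hedged sketch of the shape of the mechanism, not the actual SciPy code, and NumPy stands in for `xp`:

```python
import numpy as np

def _xlogy(xp):
    """Build a generic xlogy for a backend `xp` that lacks a native one."""
    def xlogy(x, y):
        # Simplified: the real implementation also handles the x == 0
        # convention (xlogy(0, 0) == 0); here we just show the dispatch shape.
        return x * xp.log(y)
    return xlogy

# Hypothetical mapping of function name -> factory taking the backend namespace
generic_implementations = {'xlogy': _xlogy}

f = generic_implementations['xlogy'](np)  # NumPy standing in for `xp`
assert f(2.0, 1.0) == 0.0                 # 2 * log(1) == 0
```

Each factory receives the backend namespace, so the same definition serves any array API backend that provides the primitives it uses.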

Comment on lines +567 to +568
f = getattr(getattr(spx, subpackage, None), f_name, None)
f = f or getattr(getattr(xp, subpackage, None), f_name, None)
@mdhaber (author)

CuPy's linalg is in cupy, but its signal is in cupyx. Perhaps there is a better way to handle this, but for now, look in one, and if not found, look in the other.


@functools.wraps(func)
def wrapped(*args, **kwargs):
xp = array_namespace(*args[:n_array_args])
@mdhaber (author) commented Jun 20, 2024

Yes, this assumes that the arrays come first as positional arguments. I'll explore solutions for passing arrays as keyword arguments in a follow-up commit if we think this has potential. If we have the developer specify the number of positional-only array arguments, a list of keyword-only array arguments, and a dictionary mapping positions to keyword names for those that can be specified either way, some relatively simple logic can extract the arrays from *args and **kwargs.

Update: resolved for simple cases.
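The "relatively simple logic" for the common case could be sketched as follows. The helper name and the position-to-keyword mapping are hypothetical, for illustration only:

```python
def extract_arrays(args, kwargs, n_array_args, array_names):
    """Pull the leading array arguments whether they were passed
    positionally or by keyword.

    `array_names` maps position -> keyword name for arguments that can be
    specified either way (hypothetical helper, not the actual SciPy code).
    """
    arrays = list(args[:n_array_args])
    # Any remaining array arguments must have been passed by keyword.
    for pos in range(len(arrays), n_array_args):
        arrays.append(kwargs[array_names[pos]])
    return arrays

# e.g. convolve(in1, in2, ...): both positional, or the second by keyword
names = {0: 'in1', 1: 'in2'}
assert extract_arrays(('a', 'b'), {}, 2, names) == ['a', 'b']
assert extract_arrays(('a',), {'in2': 'b'}, 2, names) == ['a', 'b']
```

The extracted list is then what gets handed to `array_namespace` to determine the backend.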

xp = array_namespace(*args[:n_array_args])
f = get_array_subpackage_func(f_name, xp, n_array_args, root_namespace,
subpackage, generic_implementations)
return f(*args, **kwargs)
@mdhaber (author) commented Jun 20, 2024

Even though special functions don't accept kwargs, I tested this a bit, and it seems to work unmodified. For instance:

import cupy as xp
from scipy import signal
sig = xp.linspace(0, 1, 100)
win = xp.linspace(0.25, 0.75, 50)
signal.convolve(sig, win, mode='same').shape  # (100,)
signal.convolve(sig, win, mode='full').shape  # (149,)
signal.convolve(sig, win, mode='valid').shape  # (51,)

Of course, this assumes that all keyword arguments are to be passed to all backends. We can add a nicer error message for arguments that aren't supported by all backends, if desired.

@@ -121,6 +121,7 @@ def test_warning_calls_filters(warning_calls):
os.path.join('stats', '_stats_py.py'), # gh-20743
os.path.join('stats', 'tests', 'test_axis_nan_policy.py'), # gh-20694
os.path.join('_lib', '_util.py'), # gh-19341
os.path.join('_lib', '_array_api.py'), # TBA
@mdhaber (author)

Suggested change
os.path.join('_lib', '_array_api.py'), # TBA
os.path.join('_lib', '_array_api.py'), # 21012

warnings.simplefilter("ignore")
# Strange that this is needed. For now, filtering warnings
# to pass tests; we'll figure out a better strategy.
import cupyx.scipy.signal
@mdhaber (author)

There is probably a better way to do this, but:

import cupyx
getattr(cupyx.scipy, 'special')  # ok
getattr(cupyx.scipy, 'signal')  # AttributeError: module 'cupyx.scipy' has no attribute 'signal'

import cupyx.scipy.signal
getattr(cupyx.scipy, 'signal')  # ok

For now, I was just trying to get this to work. Alternatives welcome.

Member:

x-ref cupy/cupy#8336. Hopefully CuPy will behave like SciPy at some point.

# `reversed` is for developer convenience: test new function first = less waiting
@pytest.mark.parametrize('f_name_n_args', reversed(array_special_func_map.items()))
@pytest.mark.parametrize('dtype', ['float32', 'float64'])
@pytest.mark.parametrize('shapes', [[(10, 10), (10,)]])
@mdhaber (author)

Sure, this assumes that a valid first argument is 2-d and the second is 1-d, which is true for many linalg functions. This can be generalized, or there is no reason why a single test needs to be parameterized over all functions. It's still more compact than translating all existing tests.

args_xp = [xp.asarray(arg[()], dtype=dtype_xp) for arg in args_np]

res = f(*args_xp)
ref = xp.asarray(f(*args_np), dtype=dtype_xp)
@mdhaber (author)

We can include kwargs to be tested in the array_special_func_map.

Comment on lines +556 to +559
_SCIPY_ARRAY_API = os.environ.get("SCIPY_ARRAY_API", False)
array_api_compat_prefix = "scipy._lib.array_api_compat"


@mdhaber (author)

Suggested change
_SCIPY_ARRAY_API = os.environ.get("SCIPY_ARRAY_API", False)
array_api_compat_prefix = "scipy._lib.array_api_compat"



def support_alternative_backends(f_name, n_array_args, root_namespace,
subpackage, generic_implementations):
@mdhaber (author)

It is convenient in special to loop over function names, etc., to add array API support for functions. For other sub-packages, it might be better to create a version of this function that is applied as a decorator. I'll explore that idea in the next commit.

@mdhaber changed the title from "ENH: linalg/signal: leverage scipy.special approach for array API dispatching" to "ENH: linalg/signal/special: unify approach for array API dispatching" Jun 21, 2024
@mdhaber commented Jun 21, 2024

The next three commits refactor a bit further but show how we could do the dispatching with a decorator. Even though we'd have to add decorated functions to the tests manually, it's a lot simpler overall because this approach wouldn't require changes to __init__.py. You wouldn't even need a _support_alternative_backends.py file for each subpackage, and generic implementations can be placed right next to the original function.

@mdhaber commented Jun 21, 2024

The latest commit resolves #21012 (comment) for the case in which array arguments are specified as consecutive positional or keyword arguments, which is the most common case. data-apis/array-api-compat#153 is an issue if the arguments are specified in an unusual way.

At least one of the property-based array API tests will fail because det returns a float64 given float32 NumPy arrays. Looks like this is documented, so an exception can be made in the test.

@@ -280,6 +281,7 @@ def eig(a, b=None, left=False, right=True, overwrite_a=False,
return w, vr


@_xp_dispatch(['a', 'b'], 'linalg')
@mdhaber commented Jun 21, 2024

This is all it takes to get the dispatching working as long as the array arguments are all consecutive at the beginning of the signature (very common). They can be positional only or positional/keyword. I think this will work for all linalg functions. I found a handful of signal functions that don't fit this mold, and that's fine - we just need to look closer to know whether it's worth generalizing or whether these can be treated ad hoc.

Currently, this assumes the xp-backend function has the same signature as the SciPy function, so it doesn't emit helpful errors if this is not the case and unsupported arguments are passed. There are plenty of ways to deal with this, but I'd rather not go too much deeper into this without initial feedback.
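The overall shape of such a decorator can be sketched as follows. This is a hedged illustration, not the PR's `_xp_dispatch`: the backend lookup is reduced to a same-named attribute on the namespace, and NumPy stands in for the namespace that `array_namespace` would return:

```python
import functools
import numpy as np

def xp_dispatch_sketch(n_array_args):
    """Sketch of a dispatching decorator: if the arrays' namespace provides
    a same-named function, call it; otherwise fall back to the wrapped
    SciPy implementation. (Illustrative only.)"""
    def decorator(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            # Real code: xp = array_namespace(*args[:n_array_args]);
            # NumPy stands in here so the sketch is self-contained.
            xp = np
            f = getattr(xp, func.__name__, None)  # backend equivalent, if any
            return (f or func)(*args, **kwargs)
        return wrapped
    return decorator

@xp_dispatch_sketch(n_array_args=2)
def outer(a, b):
    # Toy stand-in for a SciPy implementation; NumPy happens to provide
    # np.outer, so the sketch dispatches there instead of raising.
    raise NotImplementedError("SciPy implementation stand-in")

res = outer(np.arange(2.0), np.arange(3.0))  # dispatches to np.outer
```

In the real decorator, the first `n_array_args` arguments (positional or keyword) determine the namespace, and the wrapped SciPy function is the NumPy fallback.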

@lucascolley (Member) left a comment:

this looks pretty good to me! The diff is pleasingly small. Agreed that we do not need to check that other libraries' functions pass all of our tests. Importantly, I like that this seems quite flexible - we can choose which functions are included, and add generic implementations at a later point, quite easily.


_generic_implementations = {}

array_special_func_map = {
Member:

Suggested change
array_special_func_map = {
array_linalg_func_map = {


_generic_implementations = {}

array_special_func_map = {
Member:

Suggested change
array_special_func_map = {
array_signal_func_map = {

@mdhaber (author):

Probably will just make it array_func_map or so. Or TBH might remove this and use the decorator for everything? The reason we didn't do that in special is that there's not a Python function to decorate, but for the other modules, it would probably be nicer?

@mdhaber commented Jun 24, 2024

Thanks @lucascolley. Do you want all functions to support check_finite, or should I just work on emitting a nicer error if the user passes arguments not supported by the array API extension?

@lucascolley (Member):
Let's start with just emitting nice errors. We should figure out how to start supporting other keywords in an agnostic way, though. Can we somehow only support check_finite if it is in the function's signature to start with?
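Signature introspection can answer that question cheaply. A minimal sketch, with a hypothetical helper name and a toy stand-in for a linalg function:

```python
import inspect

def supports_kwarg(func, name):
    """Sketch: decide whether to honor a SciPy-specific keyword (e.g.
    `check_finite`) based on whether it appears in the function's
    signature. Hypothetical helper, not existing SciPy API.
    """
    try:
        return name in inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some callables (certain builtins/ufuncs) expose no signature.
        return False

def solve(a, b, check_finite=True):
    """Toy stand-in for a function like scipy.linalg.solve."""
    return b

assert supports_kwarg(solve, 'check_finite')
assert not supports_kwarg(solve, 'overwrite_a')
```

The wrapper could then strip or handle `check_finite` itself only when the wrapped function actually accepts it.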

@ev-br (Member) left a comment:

> The philosophy is that we do not need to test the quality of the alternative backend implementations to the same standards as we test the SciPy implementations; rather, we really just need to test the "plumbing": for typical cases, when we pass an alternative-backend array into one of these dispatched functions, is type in == type out and do the numerical values agree with those of NumPy/SciPy?

FTR, I disagree with this. The existing SciPy test suite serves as a contract for what the SciPy functionality actually is. If we are to dispatch to other backends, we should IMO keep the contract and be explicit about what we xfail/skip.

@lucascolley:
> The existing SciPy test suite serves as a contract for what the SciPy functionality actually is. If we are to dispatch to other backends, we should IMO keep the contract and be explicit about what we xfail/skip.

I think that it would be a good idea to convert the tests eventually[1], but while the behaviour is experimental and opt-in via an env variable, surely we are fine to punt on it for now. Any users who are going out of their way to help us test this experimental dispatching should be smart enough to send bug reports to the appropriate library, rather than to us.

Footnotes

  1. the tests could feasibly be replaced by property-based tests if we can be confident that no coverage is lost, but that is a completely orthogonal discussion

@ev-br commented Jun 24, 2024

For the dispatch itself, I've two high-level comments first.

  1. The current approach seems to involve populating a submodule-size dict per submodule, is this correct?

How about using the fact that a module is a namespace itself, and do something along the lines of

- submodule/__init__.py:

from _support_alternative_backends import ... # explicit names, not import *
__all__ = _submodule_api.__all__


- submodule/_support_alternative_backends.py:

from ._submodule_api import * 

# wrap/dispatch on all imported names here


- submodule/_submodule_api.py:

from _private_modules import *    # or explicit names

# populate the __all__ list

  2. The dispatch approach still seems to rely on a number of leading array arguments (or kwargs?). I don't believe it scales to all SciPy APIs. How does, e.g., https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.convolve1d.html#scipy.ndimage.convolve1d fit?
    Note the optional output argument, which comes after axis.

@ev-br commented Jun 24, 2024

> I think that it would be a good idea to convert the tests eventually, but while the behaviour is experimental and opt-in via an env variable, surely we are fine to punt on it for now.

I disagree.

@lucascolley:
> I disagree.

Okay, let's split this discussion out to gh-21031.

@mdhaber commented Jun 24, 2024

> 1. The current approach seems to involve populating a submodule-size dict per submodule, is this correct?

IIUC you're referring to the array_special_func_map and similar, and you're proposing that we don't need to define such a map, we can just use __all__? I think there are a few reasons to define the map: we don't necessarily want to dispatch everything (some specific attention is desirable), and we need to provide a little information about the function's signature (unless we want to do some fancy introspection).

However, this map is just what I did for special. I also show in this PR how we could use a decorator instead of such a map. If we go with the decorator approach, we can eliminate _support_alternative_backends.py from each submodule.

> 2. The dispatch approach still seems to rely on a number of leading arguments (or kwargs?). I don't believe it scales to all SciPy APIs.

No, if we're not allowed to make any modifications, it will not work for all APIs. But "I think this will work for all linalg functions. I found a handful of signal functions that don't fit this mold, and that's fine - we just need to look closer to know whether it's worth generalizing or whether these can be treated ad hoc." (#21012 (comment)). We can move quickly through the low-hanging fruit, identify the functions that don't fit the mold, and make the generalizations as needed. If we find that we need to add support for functions like your convolve_dispatcher, etc., to handle the special cases, we can certainly do that. This just proposes that we don't need all the boilerplate for simple functions.

As for convolve1d in particular, I'm not sure. It will not pass that argument to array_namespace, so perhaps the user would get a different error message if, say, the output is not a CuPy array but the first two arguments are. But I think otherwise it will work for CuPy, which seems to be the only other backend with a direct convolve1d equivalent. For other backends, one would have to write custom code anyway if we want to support the output argument.

@mdhaber commented Jun 24, 2024

So for convolve1d it works in the sense that:

import cupy as xp
from scipy.ndimage import convolve1d
input = [2, 8, 0, 4, 1, 9, 9, 0]
weights = [1, 3]
output = xp.empty(8, dtype=int)
convolve1d(xp.asarray(input), weights=xp.asarray(weights), output=output)
print(output, type(output))
# [14 24  4 13 12 36 27  0] <class 'cupy.ndarray'>

However, if the user were to pass an invalid argument for output, they would get whatever error CuPy provides, not e.g. "TypeError: Multiple namespaces for array inputs:".

An example of a function that wouldn't work for all arguments as written is sosfilt, where axis separates two array arguments. Currently, array_namespace produces an error if one of the arguments is an int. This already causes problems when we're translating functions whose arguments can be either Python scalars or arrays. If it were not changed, passing the axis argument as an int would make array_namespace raise an error, and passing axis as a 0-d CuPy array makes CuPy raise an error, since it doesn't interpret that as an int. One can imagine variations of the code that would get around this, e.g. having array_namespace ignore Python scalars as it ignores None, or having a way of skipping arguments to be passed to array_namespace.
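The scalar-skipping workaround mentioned above could look roughly like this. A hedged sketch with a hypothetical helper name, not current SciPy behavior:

```python
import numbers
import numpy as np

def filter_array_args(*args):
    """Skip Python scalars (and None) before handing arguments to
    `array_namespace`, so a signature like sosfilt(sos, x, axis=-1, zi=None),
    where an int separates array arguments, doesn't raise.
    (Hypothetical helper, illustrative only.)"""
    return [a for a in args
            if a is not None and not isinstance(a, numbers.Number)]

sos, x, axis, zi = np.ones((1, 6)), np.ones(8), -1, None
arrays = filter_array_args(sos, x, axis, zi)
assert len(arrays) == 2  # only the two actual arrays remain
```

The remaining question would be 0-d arrays passed where an int is expected; those would still reach the backend unchanged under this sketch.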

For the sake of argument, I can also imagine examples of signatures in stats in which weights is accepted as a keyword-only argument. (We will actually be translating these functions manually, not dispatching, but there might be similar cases in other modules that we'd want to dispatch.)

  • If the alternative backend has the desired function, the only problem is that a weights array would not be passed into array_namespace for validation.
  • If the alternative backend does not have the desired function and we need to fall back to the SciPy implementation with a NumPy array, the problem is that the current code would not convert weights to a NumPy array before passing it into the SciPy function. So if the SciPy function itself does not call np.asarray on weights and the conversion doesn't happen implicitly, there could be a problem.

Again, if we find a large class of functions like this that we want to dispatch, we can either ensure that the SciPy versions of the functions convert to NumPy arrays or generalize the code to handle keyword-only array arguments.

@ev-br mentioned this pull request Jul 1, 2024
@ev-br commented Jul 1, 2024

> The current approach seems to involve populating a submodule-size dict per submodule, is this correct?

> IIUC you're referring to the array_special_func_map and similar, and you're proposing that we don't need to define such a map, we can just use __all__? I think there are a few reasons to define the map: we don't necessarily want to dispatch everything (some specific attention is desirable), and we need to provide a little information about the function's signature (unless we want to do some fancy introspection).

Here's what I meant: #21091
This can probably be simplified quite a bit, but the main idea is to have a namespace instead of a dict.
Maybe we can avoid messing with sys.modules and even use vanilla imports?

> We can move quickly through the low-hanging fruit, identify the functions that don't fit the mold, and make the generalizations as needed.

gh-21091 has the whole of ndimage, and it's explicit which ones need what. Not sure how to simplify it further.

@mdhaber commented Jul 1, 2024

> Here's what I meant

OK, so it looks like you're using _ndimage_api.__all__ plus scipy/ndimage/_dispatchers.py instead of a dictionary defined in _support_alternative_backends. Sure, that's fine. The other approach shown here was to apply a decorator directly to the function, no dictionary or namespace file necessary. There are pros/cons of each.

> ...has the whole of ndimage

OK, since the first argument of all of those functions is an array, I think that functions wrapped per this PR would probably pass all the same tests with CuPy as those wrapped by gh-21091. The difference is just that if the user were to pass arrays with different backends, this PR would not raise an error immediately in array_namespace.

IIUC, gh-21091 does not convert other CPU arrays (e.g. torch CPU) to NumPy, run the SciPy function, and convert back. Is that right?

@mdhaber closed this Aug 15, 2024