Description
Working on gh-20544, I wonder for certain functions whether we want to
a) add array API support,
b) declare legacy / deprecate + remove them, or
c) leave them as they are.
If the long-term goal is to make SciPy largely array-API compatible, I assume we will want to do one of the first two things.
I think it would be easiest to consider just one function at a time. Perhaps when there's a consensus about one (or if we reach an impasse), we move on to the next.
For today:
`find_repeats` is trivial to implement in terms of the array API:
from array_api_compat import array_namespace
import array_api_strict as xp

# If there are NaNs, `scipy.stats.find_repeats` treats them as distinct.
# `unique_counts` does, too, so no special handling is required
def find_repeats(x):
    xp = array_namespace(x)  # use the namespace of the input array
    unique, counts = xp.unique_counts(x)
    repeat_mask = counts > 1  # keep only values that occur more than once
    return unique[repeat_mask], counts[repeat_mask]  # would return in result object

x = [1., 2., 2., 3., 3., 3.]  # illustrative input; any 1-D data works
find_repeats(xp.asarray(x))
But because it is trivial, I wonder whether it makes sense to keep it around. For reference, python-api-inspect saw no uses of `find_repeats` outside tests in the 2019 blog post data, whereas popular functions are found dozens or hundreds of times. (I could re-run the numbers, but I'm guessing they are representative enough.) On Stack Overflow, all the appearances of `find_repeats` seem to be people defining a different function called `find_repeats`.
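To give a sense of how little a user would have to write themselves if the function went away, here is a NumPy-only sketch. The helper name `repeats_and_counts` is made up for illustration. NaN handling in `np.unique` depends on the NumPy version and options, so this example sidesteps NaNs.

```python
import numpy as np

def repeats_and_counts(x):
    # Hypothetical helper name, not an existing SciPy or NumPy function.
    # Sorted unique values and the number of times each one occurs.
    values, counts = np.unique(np.asarray(x), return_counts=True)
    mask = counts > 1  # keep only values that occur more than once
    return values[mask], counts[mask]

values, counts = repeats_and_counts([2, 3, 1, 3, 2, 3])
# values holds the repeated entries (2 and 3); counts holds how often each occurs
```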
I think it could be vectorized using the techniques of `_rankdata`, but the output would most naturally be a ragged array, so I'm not sure whether we want to come up with a way of representing that information in an array-API-compatible way.
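To make the ragged-output problem concrete, here is one possible flat representation for a 2-D input, sketched in plain NumPy. This is an assumption about what a vectorized version could look like, not what `_rankdata` does, and the name `find_repeats_batched` and its return layout are hypothetical. It sorts each row, marks where runs of equal values start, and returns three flat arrays instead of one ragged one.

```python
import numpy as np

def find_repeats_batched(x):
    """Find repeats along the last axis of a 2-D array (hypothetical sketch).

    Returns flat 1-D arrays (rows, values, counts): entry k says that
    values[k] occurs counts[k] times in row rows[k].
    """
    x = np.sort(np.asarray(x), axis=-1)
    n_cols = x.shape[-1]
    # True where an element starts a new run of equal values within its row.
    # The first column always starts a run, so runs never span two rows.
    starts = np.ones(x.shape, dtype=bool)
    starts[:, 1:] = x[:, 1:] != x[:, :-1]
    # Positions of run starts in the flattened (row-major) array
    flat_starts = np.flatnonzero(starts)
    # Each run ends where the next one starts (or at the end of the data)
    run_lengths = np.diff(np.append(flat_starts, x.size))
    keep = run_lengths > 1  # only runs with more than one element are repeats
    rows = flat_starts[keep] // n_cols
    values = x.reshape(-1)[flat_starts[keep]]
    return rows, values, run_lengths[keep]

rows, values, counts = find_repeats_batched([[1, 2, 2, 3],
                                             [4, 4, 4, 4],
                                             [5, 6, 7, 8]])
```

The repeats for row `i` can be recovered with `values[rows == i]`, and a row without repeats (the last one here) simply contributes no entries. Whether a layout like this is acceptable for a public function is exactly the open question.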
What should we do here?