Contributor

@mdhaber mdhaber commented Apr 18, 2025

Reference issue

Toward gh-20544

What does this implement/fix?

Adds array API support to scipy.stats.mode.

Update: yeah... I got a bit carried away, and now this also overhauls _axis_nan_policy to work with non-numpy arrays, since mode needed it for many of the tests. LMK if this needs to get broken up.

Additional information

Dask complains about not being able to compute the chunk size. Haven't dealt with this before, so wisdom appreciated.
CuPy runs into data-apis/array-api-compat#312.
The shortcut when a.ndim == 1 used np.unique, which treats all NaNs as the same value, but the array API standard says that NaNs are all distinct. There are a few options:

  • Add logic to make this work with unique_counts despite all NaNs being treated as distinct
  • Use xp.unique for all backends that define it; otherwise fall back to the generic, n-d array calculation
  • Always use the generic, n-d array calculation (which is slower when there are only a few long slices)

Thoughts?
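
For reference, a minimal NumPy-only sketch of the first option: count the NaNs separately and compare that count against the counts of the non-NaN values, so the result does not depend on whether the backend's unique function treats NaNs as equal or distinct. `mode_1d` is a hypothetical helper, not code from this PR. The tie-breaking rule (a non-NaN value wins over NaN on equal counts) is an assumption, and the boolean-mask indexing it relies on is not available in every backend.

```python
import numpy as np


def mode_1d(a):
    """Return (mode, count) of a 1-D float array, treating all NaNs as one value.

    Hypothetical sketch; not the scipy.stats.mode implementation.
    """
    a = np.asarray(a, dtype=float)
    nan_mask = np.isnan(a)
    n_nan = int(nan_mask.sum())

    # Count only the non-NaN values, so NaN semantics of `unique` never matter.
    values, counts = np.unique(a[~nan_mask], return_counts=True)

    if values.size == 0:
        # Empty input gives (nan, 0); all-NaN input gives (nan, n_nan).
        return float("nan"), n_nan

    i = int(np.argmax(counts))  # first maximum: smallest value among ties
    if n_nan > counts[i]:
        return float("nan"), n_nan
    return float(values[i]), int(counts[i])
```

With this approach the NaN count is just one extra candidate, so the same idea should carry over to a `unique_counts`-style function that returns each NaN as its own entry.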

@mdhaber mdhaber added the scipy.stats, enhancement (A new feature or improvement), and array types (Items related to array API support and input array validation; see gh-18286) labels Apr 18, 2025
@@ -268,12 +271,12 @@ def _check_empty_inputs(samples, axis):
return output


-def _add_reduced_axes(res, reduced_axes, keepdims):
+def _add_reduced_axes(res, reduced_axes, keepdims, xp=np):
Contributor Author

@mdhaber mdhaber Apr 18, 2025

The changes in this file are pretty straightforward xp-translations that allow non-numpy arrays to go through the first part of the decorator, which makes a few things much easier for wrapped functions.

  • The dimensions specified by axis=None and axis tuples are raveled (so wrapped functions only have to consider a scalar, integer axis)
  • The remaining axis is moved to -1, which tends to make indexing easier.
  • keepdims is supported automatically by adding back any axes that get reduced away.

It also ensures that when the input array is too small (e.g. empty), the right thing happens (e.g. emit SmallSampleWarning, return NaN).
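
The three steps above can be sketched roughly as follows. This is an illustrative NumPy-only version, not the decorator's actual code: `consolidate_axes` and `add_reduced_axes` are hypothetical names, and the sketch ignores empty arrays and NaN handling.

```python
import numpy as np


def consolidate_axes(x, axis):
    """Ravel the axes in `axis` into a single trailing axis.

    Returns the reshaped array and the tuple of reduced (non-negative) axes.
    """
    x = np.asarray(x)
    axes = range(x.ndim) if axis is None else np.atleast_1d(axis)
    reduced = tuple(int(ax) % x.ndim for ax in axes)
    kept = [ax for ax in range(x.ndim) if ax not in reduced]

    # Move the reduced axes to the end, then collapse them into axis -1.
    x = np.transpose(x, kept + list(reduced))
    return np.reshape(x, x.shape[:len(kept)] + (-1,)), reduced


def add_reduced_axes(res, reduced, keepdims):
    """Re-insert length-1 axes where the reduced axes used to be."""
    if not keepdims:
        return res
    for ax in sorted(reduced):
        res = np.expand_dims(res, ax)
    return res


x = np.arange(24.0).reshape(2, 3, 4)
y, reduced = consolidate_axes(x, axis=(0, 2))   # y has shape (3, 8)
res = y.mean(axis=-1)                           # wrapped function sees axis=-1 only
res = add_reduced_axes(res, reduced, keepdims=True)  # shape (1, 3, 1)
```

The wrapped function only ever reduces along the last axis; the bookkeeping for tuples, `axis=None`, and `keepdims` lives outside it.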

Comment on lines 557 to 561
if not is_numpy(xp):
    res = hypotest_fun_out(*samples, axis=axis, **kwds)
    res = result_to_tuple(res, n_out)
    res = _add_reduced_axes(res, reduced_axes, keepdims, xp=xp)
    return tuple_to_result(*res)
Contributor Author

These are all pretty standard invocations; e.g., see 599-602.



# When `order` is array-like with size > 1, moment produces an *array*
# rather than a tuple, but the zeroth dimension is to be treated like
# separate outputs. It is important to make the distinction between
# separate outputs when adding the reduced axes back (`keepdims=True`).
def _moment_tuple(x, n_out):
-    return tuple(x) if n_out > 1 else (x,)
+    return tuple(x[i, ...] for i in range(x.shape[0])) if n_out > 1 else (x,)
Contributor Author

@mdhaber mdhaber Apr 18, 2025

See code comment. For better or for worse, this is supposed to iterate over the zeroth axis, and we have to do that manually for array_api_strict.
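
A small sketch of why the explicit indexing matters. `NoIterArray` is a hypothetical stand-in for an array type that supports `shape` and indexing but not iteration; it is not meant to reproduce array_api_strict's actual behavior, and `moment_tuple` here is a copy of the new one-liner for illustration.

```python
import numpy as np


class NoIterArray:
    """Hypothetical array wrapper that can be indexed but not iterated."""

    __iter__ = None  # makes iter(obj) and tuple(obj) raise TypeError

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def shape(self):
        return self._data.shape

    def __getitem__(self, key):
        return NoIterArray(self._data[key])


def moment_tuple(x, n_out):
    # Split along the zeroth axis using only `shape` and `__getitem__`.
    return tuple(x[i, ...] for i in range(x.shape[0])) if n_out > 1 else (x,)


x = NoIterArray(np.arange(6).reshape(2, 3))
outputs = moment_tuple(x, n_out=2)  # two outputs, each of shape (3,)
```

Calling `tuple(x)` on such an object fails, while the indexed version only needs the zeroth dimension's length.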

@mdhaber mdhaber marked this pull request as ready for review April 18, 2025 22:42
@rgommers
Member

> I got a bit carried away, and now this also overhauls _axis_nan_policy to work with non-numpy arrays, since mode needed it for many of the tests. LMK if this needs to get broken up.

That might be helpful. Now that I'm in the "review lots of array API PRs mode" I'm inclined to keep going, but it's easier to find time to review and merge in one go if they're smaller diffs and more self-contained.

This now has merge conflicts. Splitting it in two when updating may be nice.

@mdhaber
Contributor Author

mdhaber commented May 18, 2025

Will do.

@mdhaber mdhaber changed the title from "ENH: stats.mode: add array API support" to "ENH: stats: add array API support to some of _axis_nan_policy decorator" May 20, 2025
@@ -486,11 +486,12 @@ def _mode_result(mode, count):
# When a slice is empty, `_axis_nan_policy` automatically produces
# NaN for `mode` and `count`. This is a reasonable convention for `mode`,
# but `count` should not be NaN; it should be zero.
i = np.isnan(count)
xp = array_namespace(mode, count)
Contributor Author

These could be reverted
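
For context, the convention described in the code comment of this hunk (NaN is acceptable for `mode` on an empty slice, but `count` should be zero) can be sketched in a namespace-agnostic way. `zero_nan_counts` is a hypothetical helper, and passing the namespace as an `xp` argument defaulting to NumPy is an assumption made for illustration; it is not the code in `_mode_result`.

```python
import numpy as np


def zero_nan_counts(count, xp=np):
    """Replace NaN entries of `count` with zero, leaving other entries as-is."""
    count = xp.asarray(count)
    return xp.where(xp.isnan(count), xp.zeros_like(count), count)


counts = zero_nan_counts(np.array([np.nan, 2.0, 3.0]))
```

Using `where` rather than boolean-mask assignment keeps the operation free of in-place mutation, which some array backends do not support.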
