ENH: stats: add array API support to some of `_axis_nan_policy` decorator #22857

mdhaber · 2025-04-18T06:28:08Z

Reference issue

Toward gh-20544

What does this implement/fix?

Adds array API support to scipy.stats.mode.

Update: yeah... I got a bit carried away, and now this also overhauls _axis_nan_policy to work with non-numpy arrays, since mode needed it for many of the tests. LMK if this needs to get broken up.

Additional information

Dask complains about not being able to compute the chunk size. Haven't dealt with this before, so wisdom appreciated.
CuPy runs into data-apis/array-api-compat#312.
The shortcut when a.ndim == 1 used np.unique, which treats all NaNs as equal, but the standard says that NaNs are all distinct. There are a few options:

Add logic to make this work with unique_counts despite all NaNs being treated as distinct
Use xp.unique for all backends where it's defined, otherwise fall back to the generic, n-d array calculation
Always use the generic, n-d array calculation (which is slower when there are only a few long slices)
Thoughts?
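For illustration, here is a minimal NumPy-only sketch of the first option. The function name `mode_1d_nan_aware` is hypothetical and not part of this PR; it counts NaNs separately, so the result does not depend on whether the backend's unique function collapses NaNs. Breaking ties in favor of the non-NaN value is an assumption of the sketch.

```python
import numpy as np


def mode_1d_nan_aware(a):
    """Return (mode, count) of a 1-D float array, counting all NaNs as one value.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    nan_mask = np.isnan(a)
    n_nan = int(nan_mask.sum())
    # Count only the non-NaN values, so it does not matter whether the
    # unique function treats NaNs as equal or as distinct.
    vals, counts = np.unique(a[~nan_mask], return_counts=True)
    if vals.size == 0:
        # Empty or all-NaN input: the mode is NaN, the count is the NaN count.
        return np.nan, n_nan
    i = int(np.argmax(counts))  # first maximum, i.e. the smallest modal value
    if n_nan > counts[i]:
        # NaN occurs strictly more often than any other value.
        return np.nan, n_nan
    return float(vals[i]), int(counts[i])
```

The same idea should carry over to `unique_counts` from the standard, since only non-NaN values are ever passed to it.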

[skip ci]

mdhaber · 2025-04-18T22:35:53Z

scipy/stats/_axis_nan_policy.py

@@ -268,12 +271,12 @@ def _check_empty_inputs(samples, axis):
    return output


-def _add_reduced_axes(res, reduced_axes, keepdims):
+def _add_reduced_axes(res, reduced_axes, keepdims, xp=np):


The changes in this file are pretty straightforward xp-translations that allow non-numpy arrays to go through the first part of the decorator, which makes a few things much easier for wrapped functions.

The dimensions specified by axis=None and axis tuples are raveled (so functions only have to consider a scalar, integer axis)

The remaining axis is moved to -1, which tends to make indexing easier.

keepdims is supported automatically by adding back any axes that get reduced away.

It also ensures that when the input array is too small (e.g. empty), the right thing happens (e.g. emit SmallSampleWarning, return NaN).
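The axis handling described above can be sketched roughly as follows. This is a NumPy-only illustration, not the decorator's actual code; `reduce_with_axis_handling` and its parameters are hypothetical names.

```python
import math

import numpy as np


def reduce_with_axis_handling(func, x, axis=None, keepdims=False):
    """Call `func(x, axis=-1)` after raveling the reduced axes into one.

    Hypothetical helper; `func` only ever sees a scalar axis of -1.
    """
    x = np.asarray(x)
    if axis is None:
        reduced_axes = tuple(range(x.ndim))
    elif isinstance(axis, tuple):
        reduced_axes = tuple(ax % x.ndim for ax in axis)
    else:
        reduced_axes = (axis % x.ndim,)

    n = len(reduced_axes)
    # Move the axes to be reduced to the end...
    x = np.moveaxis(x, reduced_axes, tuple(range(-n, 0)))
    # ...and ravel them into a single trailing axis.
    kept_shape = x.shape[:x.ndim - n]
    x = x.reshape(kept_shape + (math.prod(x.shape[x.ndim - n:]),))

    res = func(x, axis=-1)

    if keepdims:
        # Add back the axes that were reduced away, as length-1 dimensions.
        res = np.expand_dims(res, sorted(reduced_axes))
    return res
```

For example, `reduce_with_axis_handling(np.mean, x, axis=(0, 2), keepdims=True)` should agree with `np.mean(x, axis=(0, 2), keepdims=True)`, even though `np.mean` is only called with `axis=-1`.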

mdhaber · 2025-04-18T22:36:59Z

scipy/stats/_axis_nan_policy.py

+            if not is_numpy(xp):
+                res = hypotest_fun_out(*samples, axis=axis, **kwds)
+                res = result_to_tuple(res, n_out)
+                res = _add_reduced_axes(res, reduced_axes, keepdims, xp=xp)
+                return tuple_to_result(*res)


These are all pretty standard invocations; e.g., see 599-602.
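As a rough picture of what that sequence does, here is a self-contained toy version. All names below (`Result`, `toy_stat`, `call_and_restore_dims`, and the simplified helpers) are hypothetical stand-ins, not the SciPy internals: the result object is unpacked into a tuple, each output gets the reduced axes added back, and the tuple is repacked.

```python
from collections import namedtuple

import numpy as np

# Hypothetical result type with two outputs.
Result = namedtuple("Result", ["mean", "var"])


def result_to_tuple(res, n_out):
    # Unpack the result object into a plain tuple of outputs.
    return tuple(res)


def tuple_to_result(*outputs):
    # Repack the outputs into the result object.
    return Result(*outputs)


def add_reduced_axes(res, reduced_axes, keepdims):
    # With keepdims=True, re-insert each reduced axis with length 1.
    if not keepdims:
        return res
    return tuple(np.expand_dims(out, reduced_axes) for out in res)


def toy_stat(x, axis):
    # Hypothetical wrapped function returning two outputs.
    return Result(np.mean(x, axis=axis), np.var(x, axis=axis))


def call_and_restore_dims(x, axis, keepdims):
    res = toy_stat(x, axis=axis)
    res = result_to_tuple(res, 2)
    res = add_reduced_axes(res, (axis,), keepdims)
    return tuple_to_result(*res)
```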

mdhaber · 2025-04-18T22:41:13Z

scipy/stats/_stats_py.py



 # When `order` is array-like with size > 1, moment produces an *array*
 # rather than a tuple, but the zeroth dimension is to be treated like
 # separate outputs. It is important to make the distinction between
 # separate outputs when adding the reduced axes back (`keepdims=True`).
 def _moment_tuple(x, n_out):
-    return tuple(x) if n_out > 1 else (x,)
+    return tuple(x[i, ...] for i in range(x.shape[0])) if n_out > 1 else (x,)


See code comment. For better or for worse, this is supposed to iterate over the zeroth axis, and we have to do that manually for array_api_strict.
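A small NumPy sketch of the difference: `tuple(x)` relies on the array being iterable, while explicit indexing along axis 0 only needs indexing and `.shape`. The function name `split_leading_axis` is hypothetical, and the claim that array_api_strict arrays cannot be iterated this way is taken from the comment above, not verified here.

```python
import numpy as np


def split_leading_axis(x, n_out):
    """Treat the zeroth dimension of `x` as separate outputs.

    Hypothetical stand-in for the `_moment_tuple` change in the diff above.
    """
    if n_out > 1:
        # Index each leading slice explicitly instead of iterating over `x`.
        return tuple(x[i, ...] for i in range(x.shape[0]))
    return (x,)


x = np.arange(6).reshape(2, 3)
outputs = split_leading_axis(x, n_out=2)
```

With NumPy, the result matches `tuple(x)` element for element, so the change should only matter for backends that do not support iteration.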

rgommers · 2025-05-18T18:12:45Z

> I got a bit carried away, and now this also overhauls _axis_nan_policy to work with non-numpy arrays, since mode needed it for many of the tests. LMK if this needs to get broken up.

That might be helpful. Now that I'm in the "review lots of array API PRs mode" I'm inclined to keep going, but it's easier to find time to review and merge in one go if they're smaller diffs and more self-contained.

This now has merge conflicts. Splitting it in two when updating may be nice.

mdhaber · 2025-05-18T18:35:38Z

Will do.

mdhaber · 2025-05-20T19:59:27Z

scipy/stats/_stats_py.py

@@ -486,11 +486,12 @@ def _mode_result(mode, count):
    # When a slice is empty, `_axis_nan_policy` automatically produces
    # NaN for `mode` and `count`. This is a reasonable convention for `mode`,
    # but `count` should not be NaN; it should be zero.
-    i = np.isnan(count)
+    xp = array_namespace(mode, count)


These could be reverted
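For context, the convention in the code comment of this hunk (NaN `mode` is fine for an empty slice, but `count` should be zero) could be written in a namespace-agnostic way roughly like this. The function name and the `xp=np` default are assumptions for the sketch; only `asarray`, `isnan`, `zeros_like`, and `where` are used.

```python
import numpy as np


def zero_count_for_empty(mode, count, xp=np):
    """Replace NaN counts (from empty slices) with zero; leave `mode` as is.

    Hypothetical helper; `xp` is any namespace providing the functions below.
    """
    count = xp.asarray(count)
    is_empty = xp.isnan(count)
    count = xp.where(is_empty, xp.zeros_like(count), count)
    return mode, count
```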

ENH: stats.mode: add array API support

b05d733

[skip ci]

mdhaber added scipy.stats enhancement A new feature or improvement array types Items related to array API support and input array validation (see gh-18286) labels Apr 18, 2025

mdhaber mentioned this pull request Apr 18, 2025

ENH: stats: add array API-support #20544

Open

MAINT: add axis tuple/keepdims support to other backends

dcf9979

mdhaber force-pushed the xp_mode branch from 3bfc7de to dcf9979 Compare April 18, 2025 17:18

mdhaber added 2 commits April 18, 2025 13:12

MAINT: adjust decorator to correctly handle non-numpy backends

e599d3e

TST: stats: skip tests for Dask

bf000aa

mdhaber commented Apr 18, 2025

mdhaber marked this pull request as ready for review April 18, 2025 22:42

mdhaber added 7 commits April 18, 2025 21:41

MAINT: stats: consistent xp behavior for small samples

600ed2c

MAINT: stats: fix lint issues and other easy failures

4f57b89

[skip ci]

MAINT: stats: compensate for dask

cd019a1

Merge remote-tracking branch 'upstream/main' into xp_mode

f99f33c

MAINT: stats/special: fix tests, avoid warnings

7cf72f2

STY: stats/_lib: fix lint

88a66b4

MAINT: stats.mode: fix CI failure due to unique_counts not producing …

c17023b

…default dtype?

mdhaber requested review from person142 and steppi as code owners April 20, 2025 00:31

mdhaber removed request for steppi and person142 April 20, 2025 01:07

MAINT: stats: adjust tests to use default dtype

700380c

mdhaber marked this pull request as draft April 20, 2025 06:56

mdhaber mentioned this pull request Apr 20, 2025

TST: reintroduce eager_warns and fix free-threading test failures #22863

Merged

mdhaber added 3 commits May 20, 2025 12:06

Merge remote-tracking branch 'upstream/main' into xp_mode

421d088

MAINT: merge fixup

5586b9b

MAINT: stats: revert changes to mode

81ddf56

mdhaber changed the title ~~ENH: stats.mode: add array API support~~ ENH: stats: add array API support to some of _axis_nan_policy decorator May 20, 2025

mdhaber commented May 20, 2025
