ENH: stats.pearsonr: add support for `axis` argument #20137

mdhaber · 2024-02-22T17:34:07Z

Reference issue

What does this implement/fix?

This adds native support for an axis argument to pearsonr, greatly improving the speed of vectorized calls (e.g. pairwise comparisons).
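To illustrate what native `axis` support buys, here is a NumPy-only sketch of Pearson's r computed along an axis with broadcasting, so all pairwise comparisons come from one call instead of a Python loop. The helper name `pearson_r_along_axis` is hypothetical. This is not SciPy's implementation. With this PR, the corresponding public call is `scipy.stats.pearsonr(x, y, axis=...)`.

```python
import numpy as np

def pearson_r_along_axis(x, y, axis=-1):
    """Pearson correlation of x and y along `axis` (hypothetical helper).

    x and y must be broadcastable; the reduction is taken along `axis`
    after broadcasting.
    """
    x, y = np.broadcast_arrays(np.asarray(x, dtype=float),
                               np.asarray(y, dtype=float))
    # Move the reduction axis to the end, as described in the review.
    x = np.moveaxis(x, axis, -1)
    y = np.moveaxis(y, axis, -1)
    xm = x - x.mean(axis=-1, keepdims=True)
    ym = y - y.mean(axis=-1, keepdims=True)
    # Scale each centered slice to unit length; r is then their dot product.
    xm = xm / np.linalg.norm(xm, axis=-1, keepdims=True)
    ym = ym / np.linalg.norm(ym, axis=-1, keepdims=True)
    return np.sum(xm * ym, axis=-1)

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 100))  # 5 variables, 100 observations each

# Shapes (5, 1, 100) and (1, 5, 100) broadcast to (5, 5, 100) and are
# reduced along the last axis, giving every pairwise r in one call.
r_pairwise = pearson_r_along_axis(data[:, np.newaxis, :], data[np.newaxis, :, :])
print(r_pairwise.shape)  # (5, 5)
```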

Additional information

~~Failing existing tests pending resolution of gh-20136.~~ Worked around it.
~~Needs new tests of axis.~~ Added.

mdhaber · 2024-02-22T17:38:02Z

scipy/stats/_stats_py.py

-        r = dtype(np.sign(x[1] - x[0])*np.sign(y[1] - y[0]))
-        result = PearsonRResult(statistic=r, pvalue=1.0, n=n,
-                                alternative=alternative, x=x, y=y)
+        r = dtype(np.sign(x[..., 1] - x[..., 0])*np.sign(y[..., 1] - y[..., 0]))


`axis=axis` elsewhere may be misleading, because we really rely on `axis=-1`. (I suppose we could use `take_along_axis` if desired, but it's simpler to just move the axis, and working along the last axis might be faster if we take the time to copy.)
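A minimal sketch of the move-the-axis approach for the two-observation special case in the diff, assuming the reduction axis is first moved to the end with `np.moveaxis` so that the `[..., 1] - [..., 0]` indexing addresses the right axis for any input `axis`. With only two observations, r is exactly +1 or -1, so it reduces to a product of signs. The function name is hypothetical, and a tie yields 0 here, so a real implementation would need to handle constant input separately.

```python
import numpy as np

def r_two_observations(x, y, axis=-1):
    """Pearson r for samples of length 2 along `axis` (hypothetical helper)."""
    x = np.moveaxis(np.asarray(x), axis, -1)
    y = np.moveaxis(np.asarray(y), axis, -1)
    if x.shape[-1] != 2 or y.shape[-1] != 2:
        raise ValueError("expected exactly two observations along `axis`")
    # After moveaxis, ellipsis indexing always refers to the reduction axis.
    return np.sign(x[..., 1] - x[..., 0]) * np.sign(y[..., 1] - y[..., 0])

# Three independent pairs of samples, with the two observations along axis 0.
x = np.array([[1.0, 3.0, 5.0],
              [2.0, 1.0, 0.0]])
y = np.array([[0.0, 4.0, 1.0],
              [1.0, 5.0, -1.0]])
r = r_two_observations(x, y, axis=0)  # values: 1, -1, 1
```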

Also, `dtype(...)` seems like a weird way to initialize the array. Why not `(...).astype(dtype)` or `np.asarray(..., dtype=dtype)`?
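For comparison, a small sketch of the two suggested alternatives applied to the expression from the diff. Both pin the dtype explicitly and keep the array's shape, including for extended precision such as `np.longdouble`. The difference is copy behavior: `astype` copies by default, while `np.asarray` copies only when a conversion is needed. The sample arrays are illustrative.

```python
import numpy as np

x = np.array([[1.0, 4.0], [3.0, 2.0]])
y = np.array([[0.5, 0.7], [2.0, 1.0]])
# The expression from the diff, evaluated in the default float64.
expr = np.sign(x[..., 1] - x[..., 0]) * np.sign(y[..., 1] - y[..., 0])

for dtype in (np.float32, np.float64, np.longdouble):
    # Option 1: cast the computed array (a new array by default).
    r1 = expr.astype(dtype)
    # Option 2: convert, copying only if the dtype actually changes.
    r2 = np.asarray(expr, dtype=dtype)
    assert r1.dtype == r2.dtype == np.dtype(dtype)
    assert r1.shape == r2.shape == expr.shape
```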

tylerjereddy · 2024-03-01T04:52:03Z

scipy/stats/_stats_py.py

-        def statistic(y):
-            statistic, _ = pearsonr(x, y, alternative=alternative)
+        def statistic(y, axis):
+            statistic, _ = pearsonr(x, y, axis=axis, alternative=alternative)


Looks like this and a few similar code blocks aren't hit by the regular test suite. Not sure how important that is, but I ran a local check just now.

mdhaber · 2024-03-01

I think they're hit by `test_resampling_pvalue`, which is marked `xslow`. It's probably fast enough to run every time now that `pearsonr` is natively vectorized. Would you prefer that the `xslow` marker be removed?
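Why the `statistic(y, axis)` signature matters for resampling: a vectorized statistic can evaluate many resamples in one call instead of looping in Python. Below is a NumPy-only sketch of a one-sided permutation test for Pearson's r under that assumption. It is not the code in `test_resampling_pvalue` or in `scipy.stats.permutation_test`; the function names, the `alternative='greater'` choice, and the number of resamples are illustrative.

```python
import numpy as np

def pearson_r(x, y, axis=-1):
    # Pearson r along `axis`; x and y broadcast against each other.
    xm = x - x.mean(axis=axis, keepdims=True)
    ym = y - y.mean(axis=axis, keepdims=True)
    num = np.sum(xm * ym, axis=axis)
    den = np.sqrt(np.sum(xm**2, axis=axis) * np.sum(ym**2, axis=axis))
    return num / den

def permutation_pvalue_greater(x, y, n_resamples=9999, seed=None):
    """One-sided permutation p-value for r > 0 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    observed = pearson_r(x, y)
    # Each row is an independent permutation of y: shape (n_resamples, n).
    y_perm = rng.permuted(np.tile(y, (n_resamples, 1)), axis=-1)
    # One vectorized call computes the statistic for every resample;
    # x with shape (n,) broadcasts against y_perm with shape (n_resamples, n).
    null = pearson_r(x, y_perm, axis=-1)
    # Count the observed statistic as one of the resamples, a common
    # convention that keeps the p-value strictly positive.
    return (np.sum(null >= observed) + 1) / (n_resamples + 1)

rng = np.random.default_rng(1)
x = rng.standard_normal(30)
y = 0.5 * x + rng.standard_normal(30)
p = permutation_pvalue_greater(x, y, seed=2)
```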

mdhaber · 2024-03-17T01:58:21Z

@tirthasheshpatel Any chance we can get this into 1.13.0?

Sorry for the late milestone addition, but the feature has been requested many times on Stack Overflow (in addition to the linked issue).

tylerjereddy · 2024-03-17T22:11:32Z

Branching is imminent, so I'll bump the milestone, apologies.

tirthasheshpatel · 2024-03-20T17:50:10Z

scipy/stats/_stats_py.py

@@ -4771,42 +4815,52 @@ def statistic(x, y):
    # dtype is the data type for the calculations.  This expression ensures
    # that the data type is at least 64 bit floating point.  It might have
    # more precision if the input is, for example, np.longdouble.
-    dtype = type(1.0 + x[0] + y[0])
+    dtype = type(1.0 + x.ravel()[0] + y.ravel()[0])


Is there a better way to do this?

mdhaber · 2024-03-20

I was in no-think-just-translate mode. We discussed correcting the calculation to respect the user-provided dtype.
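One possible way to choose a calculation dtype that respects floating-point inputs while promoting integers to float64, without depending on how Python-float scalars are promoted (which changed under NEP 50 in NumPy 2). This is a sketch under that assumption, not necessarily what commit 88a27ff does; the helper name is hypothetical, and complex inputs are outside its scope.

```python
import numpy as np

def calculation_dtype(x, y):
    """Hypothetical helper: common floating dtype for x and y.

    Floating inputs keep their precision (float32 stays float32,
    longdouble stays longdouble); integer or boolean inputs use float64.
    """
    dtype = np.result_type(np.asarray(x).dtype, np.asarray(y).dtype)
    if not np.issubdtype(dtype, np.floating):
        dtype = np.dtype(np.float64)
    return dtype

print(calculation_dtype(np.ones(3, np.float32), np.ones(3, np.float32)))  # float32
print(calculation_dtype(np.arange(3), np.arange(3)))  # float64
```

Unlike `type(1.0 + x.ravel()[0] + y.ravel()[0])`, this works only with dtypes and never indexes into the data, so it also handles empty arrays.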

tirthasheshpatel

LGTM! Just a couple of stylistic comments.

scipy/stats/_stats_py.py

mdhaber · 2024-03-20T19:00:15Z

I know we discussed letting NumPy 2.0 take care of the scalar dtype issue, but I'm going to push one more commit that fixes it for the statistic. We'll let NumPy 2.0 / NEP 50 fix the scalar dtype conversion of the confidence interval.
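A sketch of how a confidence interval computation could keep the statistic's dtype explicitly instead of waiting for NEP 50. It assumes an interval based on Fisher's z-transformation, which is an assumption about the method. The arithmetic mixes arrays with NumPy/Python float scalars, whose promotion differs between NumPy 1.x and 2.x, and the final cast pins the result dtype either way. The function name is hypothetical, and the 95% critical value 1.959963984540054 is hard-coded to keep the example NumPy-only.

```python
import numpy as np

def fisher_ci(r, n, z_crit=1.959963984540054):
    """Hypothetical helper: two-sided CI for Pearson r via Fisher's z.

    Requires n > 3 and |r| < 1. Results are cast back to r's dtype.
    """
    r = np.asarray(r)
    dtype = r.dtype
    z = np.arctanh(r)
    se = 1.0 / np.sqrt(n - 3)
    low = np.tanh(z - z_crit * se)
    high = np.tanh(z + z_crit * se)
    # Explicit cast: the output dtype no longer depends on whether the
    # float64 scalar `z_crit * se` upcasts a float32 array (it does under
    # NEP 50 in NumPy 2, but not under NumPy 1.x value-based casting).
    return np.asarray(low, dtype=dtype), np.asarray(high, dtype=dtype)

r = np.array([0.2, -0.5, 0.9], dtype=np.float32)
low, high = fisher_ci(r, n=50)
print(low.dtype, high.dtype)  # float32 float32
```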

mdhaber · 2024-03-21T03:01:36Z

I think this is all set. Thanks @tirthasheshpatel!

tirthasheshpatel

Everything looks good now, thanks @mdhaber!

WIP: stats.pearsonr: add support for axis argument

81491c2

mdhaber added the scipy.stats and enhancement labels Feb 22, 2024

mdhaber requested a review from tirthasheshpatel February 22, 2024 17:34

mdhaber commented Feb 22, 2024

mdhaber added 2 commits February 22, 2024 23:51

TST: stats.pearsonr: add multidimensional tests, pass existing tests

16de34e

MAINT: stats.pearsonr: fix CI failures

2215dc4

mdhaber changed the title ~~WIP: stats.pearsonr: add support for axis argument~~ W: stats.pearsonr: add support for axis argument Feb 23, 2024

mdhaber changed the title ~~W: stats.pearsonr: add support for axis argument~~ ENH: stats.pearsonr: add support for axis argument Feb 23, 2024

mdhaber marked this pull request as ready for review February 23, 2024 23:05

DOC: stats.pearsonr: fix doctest

3bfc1c0

[skip actions] [skip cirrus]

tylerjereddy reviewed Mar 1, 2024

mdhaber added this to the 1.13.0 milestone Mar 17, 2024

tylerjereddy modified the milestones: 1.13.0, 1.14.0 Mar 17, 2024

Merge remote-tracking branch 'upstream/main' into gh9307

d05645b

mdhaber mentioned this pull request Mar 19, 2024

ENH: stats.pearsonr: add array API support #20284

Merged

tirthasheshpatel reviewed Mar 20, 2024

MAINT: stats.pearsonr: respect dtype

88a27ff

mdhaber commented Mar 20, 2024

scipy/stats/_stats_py.py (outdated, resolved)

MAINT: stats.pearsonr: fixup

06c3758

MAINT: stats.pearsonr: scalar dtype fixup for statistic

5cbe4ab

mdhaber requested a review from tirthasheshpatel March 21, 2024 03:01

tirthasheshpatel approved these changes Mar 21, 2024

tirthasheshpatel merged commit ceea3e3 into scipy:main Mar 21, 2024

rgommers pushed a commit to rgommers/scipy that referenced this pull request Mar 24, 2024

ENH: stats.pearsonr: add support for axis argument (scipy#20137)

e670272

vnmabus mentioned this pull request Dec 17, 2024

Improve performance of pairwise distances computation vnmabus/dcor#40

Open

j-bowhay mentioned this pull request Apr 11, 2025

DOC: Fix incorrect versionadded tag for axis param in stats.pearsonr #22824

Closed
