-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
ENH: stats.pearsonr: add support for axis
argument
#20137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
scipy/stats/_stats_py.py
Outdated
r = dtype(np.sign(x[1] - x[0])*np.sign(y[1] - y[0])) | ||
result = PearsonRResult(statistic=r, pvalue=1.0, n=n, | ||
alternative=alternative, x=x, y=y) | ||
r = dtype(np.sign(x[..., 1] - x[..., 0])*np.sign(y[..., 1] - y[..., 0])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis=axis
elsewhere may be misleading, because really we rely on axis=-1
. (I suppose we could take_along_axis
if desired, but it's simpler to just move the axis, and might be faster to work along last axis if we take the time to copy.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, dtype(...)
seems like a weird way to initialize the array. Why not (...).astype(dtype)
or np.asarray(..., dtype=dtype)
axis
argumentaxis
argument
axis
argumentaxis
argument
[skip actions] [skip cirrus]
def statistic(y): | ||
statistic, _ = pearsonr(x, y, alternative=alternative) | ||
def statistic(y, axis): | ||
statistic, _ = pearsonr(x, y, axis=axis, alternative=alternative) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this and a few similar code blocks don't get hit by the regular test suite. Not sure how important that is, but ran a local check just now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think they're hit by test_resampling_pvalue
, which is xslow
ed. Probably fast enough to run every time now that this is natively vectorized. Would you prefer that the xslow
be removed?
@tirthasheshpatel Any chance we can get this into 1.13.0? Sorry for the late milestone addition, but the feature has been requested many times on S.O. (in addition to the linked issue). |
Branching is imminent, so I'll bump the milestone, apologies. |
scipy/stats/_stats_py.py
Outdated
@@ -4771,42 +4815,52 @@ def statistic(x, y): | |||
# dtype is the data type for the calculations. This expression ensures | |||
# that the data type is at least 64 bit floating point. It might have | |||
# more precision if the input is, for example, np.longdouble. | |||
dtype = type(1.0 + x[0] + y[0]) | |||
dtype = type(1.0 + x.ravel()[0] + y.ravel()[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was in no-think-just-translate mode. We discussed correcting the calculation to respect user-provided dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a couple of stylistic comments.
I know we discussed letting NumPy 2.0 take care of the scalar dtype issue, but I'm going to push one more commit that fixes it for the statistic. We'll let NumPy 2.0 / NEP 50 fix the scalar dtype conversion of the confidence interval. |
I think this is all set. Thanks @tirthasheshpatel! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good now, thanks @mdhaber!
Reference issue
Closes gh-9307
What does this implement/fix?
This adds native support for an
axis
argument topearsonr
, greatly improving the speed of vectorized calls (e.g. pairwise comparisons).Additional information
Failing existing tests pending resolution of gh-20136.Worked around it.Needs new tests ofAdded.axis
.