
Conversation

mortonjt
Contributor

@mortonjt mortonjt commented May 7, 2015

This is a new PR continuing from #4440

@argriffing argriffing added the scipy.stats and enhancement (A new feature or improvement) labels May 7, 2015
@rgommers
Member

rgommers commented May 7, 2015

@mortonjt it looks like you lost something along the way. There are missing blank lines in the docstring that were fixed in the other PR, and I'm not sure you have the fix for unequal-length inputs here.

Do you have your other branch still with the fixes you made on top of my branch? If you point me to it I'll try to clean things up.

@mortonjt
Contributor Author

mortonjt commented May 7, 2015

I added the whitespace back in - I didn't notice those changes getting overwritten when I merged your branch.

Perhaps I'm not completely understanding the unequal lengths problem. In the last PR, you pointed out the following scenario

>>> from scipy import stats
>>> import numpy as np
>>> np.random.seed(0)
>>> rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
>>> rvs5 = stats.norm.rvs(loc=8, scale=20, size=100)
>>> stats.ttest_ind(rvs1.reshape((100, 5)), rvs5, permutations=1000)
Ttest_indResult(statistic=array([ 0.01273325,  0.39392577,  0.20826074,  0.05052847,  1.1114823 ]), pvalue=array([ 0.98985343,  0.69405975,  0.83523946,  0.9597522 ,  0.26770861]))

>>> stats.ttest_ind(rvs1.reshape((100, 5)), rvs5, permutations=1000)
Ttest_indResult(statistic=array([ 0.01273325,  0.39392577,  0.20826074,  0.05052847,  1.1114823 ]), pvalue=array([ 0.98801199,  0.69430569,  0.84015984,  0.97002997,  0.27972028]))

Now, in the above scenario, it makes sense to tile the second argument rvs5 5 times so that each column of rvs1 can be tested against rvs5, correct? This is what the new solution does.
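A quick numpy sketch of what "tiling" means here (the pooled_t helper is hypothetical, written to mirror the pooled-variance formula in scipy's _equal_var_ttest_denom): replicating the 1-D sample into one column per column of the first argument gives the same per-column statistics as broadcasting it.

```python
import numpy as np

def pooled_t(a, b, axis=0):
    # Classic equal-variance two-sample t statistic, following the
    # same pooled-variance formula as _equal_var_ttest_denom.
    n1, n2 = a.shape[axis], b.shape[axis]
    v1 = a.var(axis=axis, ddof=1)
    v2 = b.var(axis=axis, ddof=1)
    df = n1 + n2 - 2
    svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    denom = np.sqrt(svar * (1.0 / n1 + 1.0 / n2))
    return (a.mean(axis=axis) - b.mean(axis=axis)) / denom

rng = np.random.default_rng(0)
a = rng.normal(5, 10, size=(100, 5))  # five columns of observations
b = rng.normal(8, 20, size=100)       # one 1-D sample

# Tiling b into one copy per column of a...
b_tiled = np.tile(b[:, None], (1, a.shape[1]))
# ...is equivalent to broadcasting b against each column:
assert np.allclose(pooled_t(a, b_tiled), pooled_t(a, b[:, None]))
```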

It wouldn't make sense if rvs5 had more than one column - in fact it would break the parametric t-test.

In other words

>>> from scipy import stats
>>> import numpy as np
>>> np.random.seed(0)
>>> rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
>>> rvs5 = stats.norm.rvs(loc=8, scale=20, size=200)
>>> np_t_stats, pvalues = stats.ttest_ind(rvs1.reshape((100, 5)), rvs5.reshape((100, 2)))
ValueError                                Traceback (most recent call last)
<ipython-input-6-cf6799185f54> in <module>()
----> 1 np_t_stats, pvalues = stats.ttest_ind(rvs1.reshape((100, 5)), rvs5.reshape((100, 2)))

/Users/mortonjt/.virtualenvs/scipy/lib/python2.7/site-packages/scipy/stats/stats.pyc in ttest_ind(a, b, axis, equal_var, permutations, random_state)
   3700 
   3701         if equal_var:
-> 3702             df, denom = _equal_var_ttest_denom(v1, n1, v2, n2)
   3703         else:
   3704             df, denom = _unequal_var_ttest_denom(v1, n1, v2, n2)

/Users/mortonjt/.virtualenvs/scipy/lib/python2.7/site-packages/scipy/stats/stats.pyc in _equal_var_ttest_denom(v1, n1, v2, n2)
   3383 def _equal_var_ttest_denom(v1, n1, v2, n2):
   3384     df = n1 + n2 - 2
-> 3385     svar = ((n1 - 1) * v1 + (n2 - 1) * v2) / float(df)
   3386     denom = np.sqrt(svar * (1.0 / n1 + 1.0 / n2))
   3387     return df, denom

ValueError: operands could not be broadcast together with shapes (5,) (2,) 

Now, when I first saw your comment back in March, I jumped the gun. I was thinking that this permutation test would have to consider testing the following scenario

>>> import numpy as np
>>> from scipy import stats
>>> a = np.array([[1,2,3], [2,3]])
>>> b = np.array([[1,2], [2,3]])
>>> stats.ttest_ind(a,b)

TypeError                                 Traceback (most recent call last)
<ipython-input-15-4d314839bf04> in <module>()
----> 1 stats.ttest_ind(a,b)

/Users/mortonjt/.virtualenvs/scipy/lib/python2.7/site-packages/scipy/stats/stats.pyc in ttest_ind(a, b, axis, equal_var, permutations, random_state)
   3694 
   3695     else:
-> 3696         v1 = np.var(a, axis, ddof=1)
   3697         v2 = np.var(b, axis, ddof=1)
   3698         n1 = a.shape[axis]

/Users/mortonjt/.virtualenvs/scipy/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims)
   2936 
   2937     return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
-> 2938                                 keepdims=keepdims)

/Users/mortonjt/.virtualenvs/scipy/lib/python2.7/site-packages/numpy/core/_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims)
     93     if isinstance(arrmean, mu.ndarray):
     94         arrmean = um.true_divide(
---> 95                 arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
     96     else:
     97         arrmean = arrmean.dtype.type(arrmean / rcount)

TypeError: unsupported operand type(s) for /: 'list' and 'int'

But the parametric t-test will break in this scenario, since a is now an array of objects, rather than an array of ints. So I think this scenario can be disregarded, unless we want to revamp the parametric t-test.

Does this make sense?
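As a quick illustration of that last point - ragged inputs never form a numeric 2-D array in the first place, so there is nothing for the parametric machinery to work on:

```python
import numpy as np

# Rows of unequal length only form an object array of lists (newer
# NumPy even requires dtype=object explicitly); var/mean then fail
# because the elements are Python lists, not numbers.
ragged = np.array([[1, 2, 3], [2, 3]], dtype=object)
even = np.array([[1, 2, 3], [2, 3, 4]])

print(ragged.shape, ragged.dtype)  # (2,) object
print(even.shape, even.dtype)      # (2, 3) with a numeric dtype
```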

STY: Refactoring uneven lengths implementation

STY: Cleaning up some pep8 issues
@mortonjt
Contributor Author

@rgommers , I double checked the diffs - all of the changes that you committed should be here now.

I'm still a bit confused about the uneven lengths problem. Would you mind taking another look at this PR when you have time? Thanks!

@ev-br
Member

ev-br commented Nov 9, 2015

What's the status of this?

@ev-br ev-br added this to the 0.17.0 milestone Nov 16, 2015
@rgommers rgommers modified the milestones: 0.18.0, 0.17.0 Nov 30, 2015
@rgommers
Member

@ev-br the status of this is that I'm being the bottleneck here and should look at the history in detail again. Realistically this is only going to happen after the release branch is cut. I'll set the milestone to 0.18.0 and self-assign it.

@rgommers rgommers self-assigned this Nov 30, 2015
@ev-br ev-br modified the milestones: 0.19.0, 0.18.0 Jun 19, 2016
@ev-br
Member

ev-br commented Jun 19, 2016

I'm bumping the milestone. If it does get into 0.18.0, all the better :-).

@sturlamolden
Contributor

For the record, the performance problems with numpy.random.RandomState.shuffle have been fixed upstream. We no longer need special Cython shuffling code or workarounds to make fast permutation tests.
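For context, a bare-bones randomized two-sample permutation test built directly on NumPy's shuffling - a sketch of the idea, not this PR's implementation (the function name and defaults here are made up):

```python
import numpy as np

def perm_test_mean_diff(x, y, n_perm=1000, seed=0):
    """Two-sided permutation test on the difference of means (illustrative)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = x.mean() - y.mean()
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # in-place relabelling of the pooled sample
        diff = pooled[:len(x)].mean() - pooled[len(x):].mean()
        count += abs(diff) >= abs(observed)
    return count / n_perm

rng = np.random.default_rng(0)
p = perm_test_mean_diff(rng.normal(0, 1, 50), rng.normal(2, 1, 50))
# well-separated groups -> a very small p-value
```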

@mortonjt
Contributor Author

mortonjt commented Jul 7, 2016

That's very good news! I'll try to redo the benchmarks - I'd be curious to see how much of an improvement it will yield.

@ev-br ev-br modified the milestones: 1.0, 0.19.0 Feb 12, 2017
@jrubinstein

What's the status of this change? I'm looking for the equivalent of R's perm_ts() in Python. Has this been incorporated into scipy.stats? I tried to run np_t_stats, pvalues = stats.ttest_ind(rvs1.reshape((100, 5)), rvs5, permutations=1000), but got TypeError: ttest_ind() got an unexpected keyword argument 'permutations'.

@rgommers
Member

@jrubinstein it hasn't been incorporated, due to lack of reviewer time (mine in particular). If you are interested, testing the code in this PR and providing feedback on that would be useful.

@jrubinstein

@rgommers I would love to test this. Could probably do so today, even. But, alas, I have no clue how to install from a PR. I can see some instructions on installing from a branch, but that's as far as my google-fu takes me. Is there a good guide for installing from a PR?
Thanks!

@rgommers
Member

At the top of this page it says:

mortonjt wants to merge 2 commits into scipy:master from mortonjt:ttest 

That branch is https://github.com/mortonjt/scipy/tree/ttest. So it'd be (in your local repo):

git remote add mortonjt https://github.com/mortonjt/scipy.git    
git fetch mortonjt
git checkout -b ttest mortonjt/ttest   # now you have a branch with the same content
# follow the normal instructions to build from source
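If you'd rather not add the contributor's fork as a remote: GitHub also exposes every PR under a refs/pull/&lt;ID&gt;/head ref, so the branch for this PR (gh-4824) can be fetched directly. The remote name upstream is an assumption - substitute whatever your clone calls the scipy/scipy remote.

```shell
git fetch upstream pull/4824/head:ttest   # upstream -> scipy/scipy
git checkout ttest
# follow the normal instructions to build from source
```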

@rgommers rgommers removed this from the 1.0.0 milestone Sep 9, 2017
@hoechenberger

Any chances of this still getting merged? (after the merge conflicts are resolved) I'd be willing to help test the code.

@rgommers
Member

@hoechenberger yes, we could get this merged. Help with testing/reviewing would be much appreciated.

Member

@rlucas7 rlucas7 left a comment


@mdhaber thanks for the change - it might be minor, but it's more intuitive to read (to my eyes anyway). The codecov output adds a lot of noise to the PR but is unrelated. AFAICS the PR is mostly ready (sans the _validate_int method). Once #12603 is merged we can use the function.

One thought though: given that PR (#12603) is in draft mode ATM, could we cherry-pick the code for the _validate_int method here, use it and then rebase #12603 off master?

If that seems more complicated, we can wait for this one to get merged - just proposing alternative approaches to see if they stick.

@mdhaber
Contributor

mdhaber commented Dec 25, 2020

Actually @rlucas7 I don't want to use _validate_int anymore. It's too strict for my taste. I added my own input validation (one line) and added a test for it. So if it looks good to you, it's all set from my perspective.

indices = np.array([random_state.permutation(m) for i in range(n)]).T
else:
n = n_max
indices = np.array(list(permutations(range(m)))).T
Contributor


I think it's OK to merge as is, but afterwards (at some point) I will submit a PR that makes this much more efficient. We don't need all of these permutations because the t-statistic doesn't care about the order of the numbers within the groups, just which group the numbers are in. Since I added the exact test while thinking about the permutation test, I didn't think that through. Anyway, this should give the same result; it's just that taking combinations is way more efficient.
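The efficiency point can be sketched like this: an order-insensitive statistic only depends on which indices land in the first group, so itertools.combinations enumerates C(m, n1) distinct relabellings instead of m! orderings (the numbers here are illustrative):

```python
from itertools import combinations
from math import comb, factorial

m, n1 = 6, 3  # pooled sample size and size of the first group

# Which n1 of the m pooled indices form group one -- order inside a
# group never changes the t statistic, so these are the only distinct
# relabellings an exact test needs.
assignments = list(combinations(range(m), n1))
assert len(assignments) == comb(m, n1)   # 20 relabellings

# Enumerating full permutations instead takes factorial(m) = 720 draws,
# i.e. 36x redundant work even for this tiny example.
assert factorial(m) == 720
```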

Member

@rlucas7 rlucas7 left a comment


@mdhaber Thanks for clarifying the change w.r.t. the _validate_int() function.

I took another pass at this tonight. It looks fine from my perspective. I'll leave it open for the rest of week and merge on Sunday unless further comments.

@rlucas7
Member

rlucas7 commented Jan 11, 2021

@mdhaber was planning to merge tonight but will wait for the build to complete with this last tiny patch you suggested, I'll plan to merge tomorrow evening.

@mdhaber
Contributor

mdhaber commented Jan 11, 2021

Ok, thanks! Anything you want me to take a look at?
If you're interested in other resurrected PRs, gh-4933 addresses about a dozen mannwhitneyu issues.

@rlucas7
Member

rlucas7 commented Jan 11, 2021

> @mdhaber was planning to merge tonight but will wait for the build to complete with this last tiny patch you suggested, I'll plan to merge tomorrow evening.

Actually, I'm up (much) earlier than usual and there weren't any surprises in the latest patch. So squashing and merging, thanks @mdhaber and @mortonjt for the PR and to @rgommers and @sturlamolden for the reviews.

I'll add something to release notes too.

@rlucas7 rlucas7 merged commit 255d49c into scipy:master Jan 11, 2021
@rlucas7
Member

rlucas7 commented Jan 11, 2021

> Ok, thanks! Anything you want me to take a look at?
> If you're interested in other resurrected PRs, gh-4933 addresses about a dozen mannwhitneyu issues.

If I have some time next weekend I'll take a look - it would be good to finish up, or if necessary close out, some of the oldest PRs.

@rgommers rgommers added this to the 1.7.0 milestone Jan 11, 2021
@rgommers
Member

This must be just about a record - just shy of 6 years to get this merged. Happy to finally see that happen! Thanks @mortonjt, and @rlucas7, @mdhaber and all other reviewers!

@dbolser

dbolser commented Oct 2, 2024

How does the permutations flag compare to the 'permutation test proper' within scipy.stats? Are they the same under the hood? Sorry for the naive questions 😅

@mdhaber
Contributor

mdhaber commented Oct 2, 2024

It's just history - the permutation t-test was requested first, so it was added; then it became obvious that a general function would be useful.

I think the exact test (with small enough samples) will give the same results. Randomized tests might give slightly different results even with the same random state. The only conceptual difference I remember is the two-sided p-value. The permutation t-test, IIRC, counts the elements of the permutation distribution with absolute value greater than the observed statistic, whereas permutation_test doubles the minimum of the two one-sided p-values. These are two (of the three) common conventions for two-sided p-values. The absolute-value convention only works when the null distribution is expected to be symmetric, so it couldn't be used for the more general permutation_test. @steppi pointed out that doubling the smaller p-value can be interpreted as a Bonferroni correction for testing multiple hypotheses (it's equivalent to performing both one-sided tests and halving your threshold for significance).
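A small sketch of the two two-sided conventions described above, using a stand-in symmetric null distribution (all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
null = rng.normal(size=9999)   # stand-in permutation null distribution
obs = 1.5                      # stand-in observed statistic

# Convention 1 (permutation t-test style): count |null| >= |obs|.
p_abs = np.mean(np.abs(null) >= abs(obs))

# Convention 2 (permutation_test style): double the smaller one-sided p.
p_greater = np.mean(null >= obs)
p_less = np.mean(null <= obs)
p_double = min(1.0, 2 * min(p_greater, p_less))

# For a symmetric null the two conventions approximately agree.
```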

@dbolser

dbolser commented Oct 2, 2024 via email
