Permutation Ttest (new PR) #4824
Conversation
@mortonjt it looks like you lost something along the way. There are missing blank lines in the docstring that were fixed in the other PR, and I'm not sure you have the fix for unequal-length inputs here. Do you still have your other branch with the fixes you made on top of my branch? If you point me to it, I'll try to clean things up.
I added in the whitespace again - I didn't notice these changes getting overwritten when I merged your branch... Perhaps I'm not completely understanding the unequal-lengths problem. In the last PR, you pointed out the following scenario: …

Now, in the above scenario, it makes sense to tile the second argument. It wouldn't make sense if … In other words: …

Now, when I first saw your comment back in March, I jumped the gun. I was thinking that this permutation test would have to consider testing the following scenario: …

But the parametric t-test will break in this scenario, since … Does this make sense?
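For reference, here is a minimal sketch (my illustration, not this PR's code) of the usual way a permutation approach handles unequal sample sizes: pool both samples, shuffle, and re-split at the original lengths.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=8)  # first sample, length 8
y = rng.normal(0.5, 1.0, size=5)  # second sample, length 5

# Pool both samples, shuffle, and re-split at the original length of x;
# nothing here requires the two samples to have equal length.
pooled = np.concatenate([x, y])
perm = rng.permutation(len(pooled))
x_perm, y_perm = pooled[perm[:len(x)]], pooled[perm[len(x):]]
```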
STY: Refactoring uneven lengths implementation
STY: Cleaning up some pep8 issues
@rgommers, I double-checked the diffs - all of the changes that you committed should be here now. I'm still a bit confused about the uneven-lengths problem. Would you mind taking another look over this PR when you have time? Thanks!
What's the status of this?
@ev-br the status of this is that I'm the bottleneck here and should look at the history in detail again. Realistically this is only going to happen after the release branch is cut. I'll set the milestone to 0.18.0 and self-assign it.
I'm bumping the milestone. If it does get into 0.18.0, all the better :-).
For the record, the performance problems with …

That's very good news! I'll try to redo the benchmarks - I'd be curious to …
What's the status of this change? I'm looking for the Python equivalent of perm_ts() in R. Has this been incorporated into scipy.stats? I tried to run this …
@jrubinstein it hasn't been incorporated, due to lack of reviewer time (mine in particular). If you are interested, testing the code in this PR and providing feedback on that would be useful.
@rgommers I would love to test this. Could probably do so today, even. But, alas, I have no clue how to install from a PR. I can see some instructions on installing from a branch, but that's as far as my google-fu takes me. Is there a good guide for installing from a PR?
At the top of this page it says: …

That branch is https://github.com/mortonjt/scipy/tree/ttest. So it'd be (in your local repo):
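Something along these lines should work (a sketch, not the exact commands from the original comment; the remote name and the install step are assumptions):

```
git remote add mortonjt https://github.com/mortonjt/scipy.git
git fetch mortonjt
git checkout -b ttest mortonjt/ttest
pip install .
```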
Any chance of this still getting merged (after the merge conflicts are resolved)? I'd be willing to help test the code.
@hoechenberger yes, we could get this merged. Help with testing/reviewing would be much appreciated.
@mdhaber thanks for the change; it might be minor but it's more intuitive to read (for my eyes anyway). The codecov adds a lot of noise to the PR but is unrelated. AFAICS the PR is mostly ready (sans the _validate_int method). Once #12603 is merged we can use the function.

One thought though: given that PR (#12603) is in draft mode ATM, could we cherry-pick the code for the _validate_int method here, use it, and then rebase #12603 off master?

If that seems more complicated then we can wait on this one to get merged; just proposing alternative approaches to see if they stick.
Actually @rlucas7 I don't want to use _validate_int anymore. It's too strict for my taste. I added my own input validation (one line) and added a test for it. So if it looks good to you, it's all set from my perspective.
    # randomized test: draw n random permutations of the m pooled observations
    indices = np.array([random_state.permutation(m) for i in range(n)]).T
else:
    # exact test: enumerate all m! orderings of the pooled observations
    n = n_max
    indices = np.array(list(permutations(range(m)))).T
I think it's OK to merge as is, but afterwards (at some point) I will submit a PR that makes this much more efficient. We don't need all of these permutations because the t-statistic doesn't care about the order of the numbers within the groups, just which group the numbers are in. Since I added the exact test while thinking about the permutation test, I didn't think that through. Anyway, this should give the same result; it's just that taking combinations is way more efficient.
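To make that concrete, here's a minimal sketch (mine, not this PR's code) of an exact test that iterates over combinations rather than permutations; it assumes the two-sided convention of counting |t| values at least as extreme as the observed one:

```python
from itertools import combinations

import numpy as np
from scipy import stats

def exact_ttest_pvalue(x, y):
    # Iterate over C(n+m, n) group assignments instead of (n+m)! orderings;
    # the t-statistic ignores the order of observations within a group.
    pooled = np.concatenate([x, y])
    n, m = len(x), len(pooled)
    t_obs = stats.ttest_ind(x, y).statistic
    hits = total = 0
    for idx in combinations(range(m), n):
        mask = np.zeros(m, dtype=bool)
        mask[list(idx)] = True
        t = stats.ttest_ind(pooled[mask], pooled[~mask]).statistic
        hits += abs(t) >= abs(t_obs)
        total += 1
    return hits / total
```

With two samples of size 6, that's C(12, 6) = 924 group assignments instead of 12! ≈ 4.8e8 orderings.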
@mdhaber Thanks for clarifying the change w.r.t. the _validate_int() function.

I took another pass at this tonight. It looks fine from my perspective. I'll leave it open for the rest of the week and merge on Sunday unless there are further comments.
@mdhaber I was planning to merge tonight but will wait for the build to complete with this last tiny patch you suggested; I'll plan to merge tomorrow evening.
Ok, thanks! Anything you want me to take a look at?
Actually, I'm up (much) earlier than usual and there weren't any surprises in the latest patch. So squashing and merging - thanks @mdhaber and @mortonjt for the PR, and @rgommers and @sturlamolden for the reviews. I'll add something to the release notes too.
If I have some time next weekend I'll take a look - it would be good to finish up, or if necessary close out, some of the oldest PRs.
How does the permutation flag compare to the 'permutation test proper' within scipy.stats? Are they the same under the hood? Sorry for the naive questions 😅
It's just history - the permutation t-test was requested first, so it was added, then it was obvious that a general function would be useful. |
Thanks, that clarifies a bit. I was struggling to understand the difference between the sum and the mean permutation statistic when the cases and controls are not 50/50. Now it makes sense.
On Wed, 2 Oct 2024, 3:53 pm Matt Haberland wrote:

It's just history - the permutation t-test was requested first, so it was added, then it was obvious that a general function would be useful.

I think the exact test (with small enough samples) will give the same results. Randomized tests might give slightly different results even with the same random state. The main conceptual difference is the two-sided p-value. The permutation t-test, IIRC, counts the elements of the permutation distribution with absolute value greater than the observed statistic. permutation_test doubles the minimum of the two one-sided p-values. Both are common conventions for two-sided p-values. The absolute value only works when the null distribution is expected to be symmetric, so it couldn't be used for the more general permutation_test. @steppi pointed out that doubling the smaller p-value can be interpreted as a Bonferroni correction when testing multiple hypotheses (it's equivalent to performing both one-sided tests and halving your threshold for significance).
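As an aside, a small sketch (an illustration with a made-up null distribution, not scipy internals) contrasting the two two-sided conventions described above:

```python
import numpy as np

rng = np.random.default_rng(0)
null = rng.standard_normal(9999)  # stand-in for a permutation null distribution
t_obs = 1.9                       # stand-in for the observed statistic

# Convention 1 (symmetric null, as in the permutation t-test):
# count |T| at least as extreme as |t_obs|.
p_abs = np.mean(np.abs(null) >= abs(t_obs))

# Convention 2 (as in permutation_test): double the smaller one-sided p-value.
p_double = 2 * min(np.mean(null >= t_obs), np.mean(null <= t_obs))
```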
This is a new PR, continuing from #4440.