Skip to content

Conversation

WarrenWeckesser
Copy link
Member

@WarrenWeckesser WarrenWeckesser commented Jul 22, 2020

Add a new function, binomtest, similar to binom_test but that returns an object with both the estimated proportion and the p-value as attributes, and that has a method to compute the confidence interval of the estimated proportion.

Also "doc-deprecate" binom_test.

@WarrenWeckesser WarrenWeckesser marked this pull request as draft July 22, 2020 20:55
@WarrenWeckesser
Copy link
Member Author

binom_test in master returns a single floating point value, so we can't replace it with the bunch-like object that we recently implemented. So we have to figure out an API that allows the addition of the confidence interval calculation to the binomial test.

Instead of trying to modify the return value of binom_test, we could create a new function, say binomial_test, that returns an object with whatever attributes we want it to have. This is probably the simplest approach. It lets us implement the API of binomial_test without having to deal with backwards compatibility.

Alternatively, we could do what I've implemented in this PR, which is to change the return value of binom_test based on whether the argument confidence_level is None or not. (I'm sure the folks working on typing will love that!) If confidence_level is None (the default), the return value is just the p-value, for backwards compatibility. Otherwise, an instance of BinomTestResult is returned.

I'm not sure about this API. Ultimately, it might be better to create the new function binomial_test, and "doc-deprecate" binom_test. Then binomial_test could have the same default for confidence_interval (0.95) that we'll probably use in the other statistical test functions.

@josef-pkt
Copy link
Member

There is no real code sharing between binom test and confint.

Alternative is to add confint as separate function (that's what I'm doing in statsmodels for now. All in one functions and classes are currently only for a few special cases.)

@WarrenWeckesser WarrenWeckesser added the enhancement A new feature or improvement label Jul 23, 2020
@mdhaber
Copy link
Contributor

mdhaber commented Jul 24, 2020

Making a new function to avoid tip-toeing around the old API makes sense to me.
Consider also binomtest as the name, without the underscore. The lack of underscore would be more consistent with the names of other tests (skewtest, kurtosistest, and normaltest), and binom is consistent with the abbreviation used elsewhere.

@WarrenWeckesser WarrenWeckesser force-pushed the binom-conf-int branch 4 times, most recently from cc77ea9 to 2a06a31 Compare October 26, 2020 07:06
@WarrenWeckesser WarrenWeckesser marked this pull request as ready for review October 26, 2020 09:43
@WarrenWeckesser WarrenWeckesser changed the title WIP/ENH: stats: Add calculation of confidence interval for binom_test. ENH: stats: Add calculation of confidence interval for binom_test. Oct 26, 2020
The new function, binomtest, returns an object with more information, and
with a method to compute the confidence interval for the estimated proportion.
The old version, binom_test, is deprecated.
@WarrenWeckesser
Copy link
Member Author

I didn't comment when I updated this a couple weeks ago. I followed @mdhaber's suggestion to create a new function called binomtest.

The new function returns an object with more information, and with a method to compute the confidence interval for the estimated proportion. The old version, binom_test, is deprecated.

E.g.

In [63]: result = binomtest(4, n=31, p=0.25)                                                                                                          

In [64]: result                                                                                                                                       
Out[64]: BinomTestResult(k=4, n=31, alternative='two-sided', proportion_estimate=0.12903225806451613, pvalue=0.1471902718162708)

In [65]: result.pvalue                                                                                                                                
Out[65]: 0.1471902718162708

In [66]: result.proportion_estimate                                                                                                                   
Out[66]: 0.12903225806451613

In [67]: result.proportion_ci()                                                                                                                       
Out[67]: ConfidenceInterval(low=0.03630166197920805, high=0.29833582900779726)

The API probably needs a few tweaks, e.g. @mdhaber suggested using pi instead of p, to avoid confusion with the p in pvalue.

Copy link
Member

@rlucas7 rlucas7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me, I'll merge tomorrow unless @mdhaber merges beforehand.

@@ -2441,6 +2441,8 @@ def levene(*args, center='median', proportiontocut=0.05):
return LeveneResult(W, pval)


@np.deprecate(new_name='binomtest',
message='`binom_test` will be removed from SciPy 1.8.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rlucas7 @WarrenWeckesser Should this still be 1.8, given that this didn't make it into 1.6?
Also, did we send a message to the mailing list? I'm just thinking that we should do that before putting this in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdhaber, I'll change that to 1.9. And I agree that this should be brought up on the mailing list before we merge this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I'll hold off merging then :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code was updated to say 1.9, and I just sent an email about this to the mailing list.

@WarrenWeckesser WarrenWeckesser marked this pull request as draft December 13, 2020 14:17
* Add `versionadded` markup to the Notes section of `binomtest`.
* Bump removal version of `binom_test`.
* Copyedit `_validate_int` docstring.
@rlucas7 rlucas7 added this to the 1.7.0 milestone Dec 13, 2020
@rlucas7
Copy link
Member

rlucas7 commented Dec 13, 2020

@mdhaber I put a 1.7.0 milestone on this so that we don't forget

@mdhaber
Copy link
Contributor

mdhaber commented Dec 13, 2020

@mdhaber I put a 1.7.0 milestone on this so that we don't forget

I've been told to wait until merge.

@WarrenWeckesser
Copy link
Member Author

As I noted in the mailing list, over in gh-12323, Ralf suggested that we just "doc-deprecate" binom_test. In my most recent commit I removed the deprecation of binom_test, and added a line to the docstring saying it is deprecated.

@WarrenWeckesser WarrenWeckesser changed the title ENH: stats: Add calculation of confidence interval for binom_test. ENH: stats: Add binomtest to replace binom_test. Dec 13, 2020
Comment on lines 253 to 255
def test_validate_int(self):
n = _validate_int(4, 'n')
assert n == 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pseudo expected operator.index(Fraction(2, 1)) to work maybe also test operator.index(np.array(1))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test looks at the type of the object, not the value. In general, not all Fraction objects are integers, just like not all float objects are integers. I added a few more tests, including one to test that a scalar array is acceptable, and another that a Fraction is not (even if the value of the Fraction is actually an integer).

assert n == 4

def test_validate_int_bad1(self):
@pytest.mark.parametrize('n', [4.0, np.array([4]), Fraction(4, 1)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I didn't notice that it was this strict. Some SciPy functions accept everything under the sun; this seems like a bit of a departure from the norm. Thoughts on why converting to an int internally and checking for equality with the original is not good enough? (This is Python, right?)

@mdhaber
Copy link
Contributor

mdhaber commented Dec 31, 2020

@WarrenWeckesser Time to merge this, no?

@WarrenWeckesser WarrenWeckesser marked this pull request as ready for review January 4, 2021 03:10
@WarrenWeckesser
Copy link
Member Author

I've moved this from "draft" to "ready to review". The one test failure on TravisCI is the old test_symmetric_modes ARPACK failure, and is not related to this PR.

I sent an email to the mailing list about this PR on Dec. 13, which triggered no discussion (and therefore no objections!). So I think this is ready.

@mdhaber mdhaber merged commit 8f38281 into scipy:master Jan 4, 2021
@mdhaber
Copy link
Contributor

mdhaber commented Jan 4, 2021

Thanks @WarrenWeckesser, @rlucas7!

@WarrenWeckesser WarrenWeckesser deleted the binom-conf-int branch January 4, 2021 04:52
@WarrenWeckesser
Copy link
Member Author

Thanks all. I added a note to the SciPy 1.7.0 release notes on the wiki.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants