-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
ENH: stats.nct: replace with boost implementation #16591
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Matt! Just need to clarify something. (Clicked approved instead of comment, but does not really matter)
scipy/stats/_continuous_distns.py
Outdated
/ np.asarray(np.sqrt(fac1)*sc.gamma(n/2+1))) | ||
Px *= trm1+trm2 | ||
return Px | ||
return np.clip(_boost._nct_pdf(x, df, nc), 0, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I missing something? I thought you were saying we should keep SciPy's version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am. I'm going to remove the changes to pdf except for the clip
. I didn't decide that until after I pushed the code and was writing the issue, and I didn't change it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fixed @tupui.
@mckib2 can you check whether the Boost's noncentral t For example, according to Mathematica (and SciPy main), the excess kurtosis of NCT with 10 degrees of freedom and noncentrality parameter 3 is 2.44. The kurtosis (3 + excess kurtosis) is 5.44. See But |
What I see right away in the Boost source is that If everything is 3 too big, then you can try just subtracting 3 from the result? I'm not sure if they're using a different definition of kurtosis or not. I'll write a quick example now and report back what the results are |
My guess is that they read a formula for (not-excess/Pearson) kurtosis and implemented it in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Merging, thanks @mdhaber
@mckib2 The test case I'd like to show is df = 8, nc = 8.26915191978. Here are the results from Wolfram Alpha evaluated at And the plots: PDF[NoncentralStudentTDistribution[8, 8.26915191978]] Fortunately it was nice enough to zoom in to the region of interest between -1 and 1: Compare against the PDF plot from Boost above, which has some weird kinks. |
Documenting for posterity: removing the |
@@ -66,6 +66,15 @@ ncx2_ufunc = py3.extension_module('ncx2_ufunc', | |||
subdir: 'scipy/stats/_boost' | |||
) | |||
|
|||
ncf_ufunc = py3.extension_module('nct_ufunc', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mckib2 Just found a typo here. ncf_ufunc
didn't get replaced with nct_ufunc
. Are these names important? It doesn't appear to have made our ncf
distribution the same as our nct
distribution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meson will use that name for the name of the build target, I don't believe it will be used for anything else. It just needs to be unique, so as long as nothing else in the project is named "nct_ufunc", then we're fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nct_ufunc is a few lines above. Please take a look to see if you would have expected it to cause a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mckib2 I wasn't sure whether this was understood - the question is more about the name of the variable ncf_ufunc
than the build target name string 'nct_ufunc'
. Here, we accidentally redefined what the variable ncf_ufunc
refers to - now it refers to the extension for NCT instead of NCF.
It doesn't look like the variable names are used elsewhere in this module, but perhaps they are imported elsewhere? Does this affect anything? Do we even need the variables to have names at all?
The distributions seem to be working correctly, but I thought I should double-check as we prepare for 1.9.1.
@mborland we noticed here that Boost's non-central t PDF has some unexpected behavior in the left tail. Is the SciPy code in #19348 (comment) enough for you to go on, or should I find someone who can open the issue in C++? Thanks for taking a look. |
That python snippet should be enough to reproduce. I'll report back if it's not. |
Reference issue
Closes gh-7104 (very few distributions are accurate to machine precision in all cases, so no need to track this enhancement request)
What does this implement/fix?
This replaces SciPy's implementation of the noncentral t distribution with the one from Boost. It also clips PDF below (at 0) and CDF/SF between 0 and 1 to ensure that the negative values observed in gh-7104 don't occur.
I don't have a reference implementation of the CDF to compare against (Wolfram Alpha times out), but one advantage is that it smoothes out the CDF near 0, at least for the parameters in gh-7104. In this PR:

vs main:

Additional information
We should probably leave in SciPy's PDF for now, and maybe file an issue with Boost. Boost's PDF doesn't look quite as good.

vs main:

and checking out the numerical value at
x = 1
, Wolfram Alpha gives8.51798811×10^-12
. SciPy's implementation agrees; Boost's is half that. It looks like they are stitching together two implementations for the PDF near there, so the transition is not very smooth or accurate.