Skip to content

Conversation

mdhaber
Copy link
Contributor

@mdhaber mdhaber commented Jul 13, 2022

Reference issue

Closes gh-7104 (very few distributions are accurate to machine precision in all cases, so no need to track this enhancement request)

What does this implement/fix?

This replaces SciPy's implementation of the noncentral t distribution with the one from Boost. It also clips PDF below (at 0) and CDF/SF between 0 and 1 to ensure that the negative values observed in gh-7104 don't occur.

I don't have a reference implementation of the CDF to compare against (Wolfram Alpha times out), but one advantage is that it smoothes out the CDF near 0, at least for the parameters in gh-7104. In this PR:
image

vs main:
image

Additional information

We should probably leave in SciPy's PDF for now, and maybe file an issue with Boost. Boost's PDF doesn't look quite as good.
image

vs main:
image

and checking out the numerical value at x = 1, Wolfram Alpha gives 8.51798811×10^-12. SciPy's implementation agrees; Boost's is half that. It looks like they are stitching together two implementations for the PDF near there, so the transition is not very smooth or accurate.

@mdhaber mdhaber added scipy.stats enhancement A new feature or improvement labels Jul 13, 2022
@mdhaber mdhaber requested a review from mckib2 July 13, 2022 05:17
Copy link
Member

@tupui tupui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Matt! Just need to clarify something. (Clicked approved instead of comment, but does not really matter)

/ np.asarray(np.sqrt(fac1)*sc.gamma(n/2+1)))
Px *= trm1+trm2
return Px
return np.clip(_boost._nct_pdf(x, df, nc), 0, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something? I thought you were saying we should keep SciPy's version.

Copy link
Contributor Author

@mdhaber mdhaber Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am. I'm going to remove the changes to pdf except for the clip. I didn't decide that until after I pushed the code and was writing the issue, and I didn't change it yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed @tupui.

@mdhaber
Copy link
Contributor Author

mdhaber commented Jul 14, 2022

@mckib2 can you check whether the Boost's noncentral t kurtosis and kurtosis_excess return the same thing?

For example, according to Mathematica (and SciPy main), the excess kurtosis of NCT with 10 degrees of freedom and noncentrality parameter 3 is 2.44. The kurtosis (3 + excess kurtosis) is 5.44.

See ExcessKurtosis[NoncentralStudentTDistribution[10, 3]] vs Kurtosis[NoncentralStudentTDistribution[10, 3]].

But test_moment in this PR is failing because it seems like _boost._nct_kurtosis_excess is 3 too big.

@mckib2
Copy link
Contributor

mckib2 commented Jul 14, 2022

But test_moment in this PR is failing because it seems like _boost._nct_kurtosis_excess is 3 too big.

What I see right away in the Boost source is that kurtosis is returning kurtosis_excess + 3:

image

If everything is 3 too big, then you can try just subtracting 3 from the result? I'm not sure if they're using a different definition of kurtosis or not.

I'll write a quick example now and report back what the results are

@mdhaber
Copy link
Contributor Author

mdhaber commented Jul 14, 2022

My guess is that they read a formula for (not-excess/Pearson) kurtosis and implemented it in kurtosis_excess (thinking it was excess/Fisher kurtosis).
And yes, I subtracted 3, hence the passing tests! Looks like this got @tupui's approval, but could you double-check that I followed your example correctly and merge if it looks good?

Copy link
Contributor

@mckib2 mckib2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Merging, thanks @mdhaber

@mckib2 mckib2 merged commit 698cfe1 into scipy:main Jul 14, 2022
@tylerjereddy tylerjereddy added this to the 1.10.0 milestone Jul 14, 2022
@mdhaber
Copy link
Contributor Author

mdhaber commented Jul 14, 2022

@mckib2 The test case I'd like to show is df = 8, nc = 8.26915191978. Here are the results from Wolfram Alpha evaluated at x=1:
PDF[NoncentralStudentTDistribution[8, 8.26915191978], 1] 8.51798811×10^-12.
Boost's value was half that.

And the plots: PDF[NoncentralStudentTDistribution[8, 8.26915191978]]
image

Fortunately it was nice enough to zoom in to the region of interest between -1 and 1:
image

Compare against the PDF plot from Boost above, which has some weird kinks.

@mckib2
Copy link
Contributor

mckib2 commented Jul 16, 2022

Documenting for posterity: removing the -DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false (i.e., allowing double -> long double promotion) corrects the pdf(1.0) issue, but the "horn" in the PDF still exists in the Cython-wrapped NCT distribution and not in any C++ Boost test I have yet devised

@@ -66,6 +66,15 @@ ncx2_ufunc = py3.extension_module('ncx2_ufunc',
subdir: 'scipy/stats/_boost'
)

ncf_ufunc = py3.extension_module('nct_ufunc',
Copy link
Contributor Author

@mdhaber mdhaber Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mckib2 Just found a typo here. ncf_ufunc didn't get replaced with nct_ufunc. Are these names important? It doesn't appear to have made our ncf distribution the same as our nct distribution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meson will use that name for the name of the build target, I don't believe it will be used for anything else. It just needs to be unique, so as long as nothing else in the project is named "nct_ufunc", then we're fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nct_ufunc is a few lines above. Please take a look to see if you would have expected it to cause a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mckib2 I wasn't sure whether this was understood - the question is more about the name of the variable ncf_ufunc than the build target name string 'nct_ufunc'. Here, we accidentally redefined what the variable ncf_ufunc refers to - now it refers to the extension for NCT instead of NCF.

It doesn't look like the variable names are used elsewhere in this module, but perhaps they are imported elsewhere? Does this affect anything? Do we even need the variables to have names at all?

The distributions seem to be working correctly, but I thought I should double-check as we prepare for 1.9.1.

@mdhaber
Copy link
Contributor Author

mdhaber commented Oct 6, 2023

@mborland we noticed here that Boost's non-central t PDF has some unexpected behavior in the left tail. Is the SciPy code in #19348 (comment) enough for you to go on, or should I find someone who can open the issue in C++? Thanks for taking a look.

@mborland
Copy link
Contributor

mborland commented Oct 6, 2023

@mborland we noticed here that Boost's non-central t PDF has some unexpected behavior in the left tail. Is the SciPy code in #19348 (comment) enough for you to go on, or should I find someone who can open the issue in C++? Thanks for taking a look.

That python snippet should be enough to reproduce. I'll report back if it's not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scipy.stats.nct - wrong values in tails
5 participants