ENH: stats.nct: replace with boost implementation #16591

mdhaber · 2022-07-13T05:17:31Z

Reference issue

Closes gh-7104 (very few distributions are accurate to machine precision in all cases, so no need to track this enhancement request)

What does this implement/fix?

This replaces SciPy's implementation of the noncentral t distribution with the one from Boost. It also clips PDF below (at 0) and CDF/SF between 0 and 1 to ensure that the negative values observed in gh-7104 don't occur.

I don't have a reference implementation of the CDF to compare against (Wolfram Alpha times out), but one advantage is that it smoothes out the CDF near 0, at least for the parameters in gh-7104. In this PR:

vs main:

Additional information

We should probably leave in SciPy's PDF for now, and maybe file an issue with Boost. Boost's PDF doesn't look quite as good.

vs main:

and checking out the numerical value at x = 1, Wolfram Alpha gives 8.51798811×10^-12. SciPy's implementation agrees; Boost's is half that. It looks like they are stitching together two implementations for the PDF near there, so the transition is not very smooth or accurate.

tupui

Thanks Matt! Just need to clarify something. (Clicked approved instead of comment, but does not really matter)

tupui · 2022-07-13T07:36:59Z

scipy/stats/_continuous_distns.py

-                / np.asarray(np.sqrt(fac1)*sc.gamma(n/2+1)))
-        Px *= trm1+trm2
-        return Px
+        return np.clip(_boost._nct_pdf(x, df, nc), 0, None)


Am I missing something? I thought you were saying we should keep SciPy's version.

I am. I'm going to remove the changes to pdf except for the clip. I didn't decide that until after I pushed the code and was writing the issue, and I didn't change it yet.

Ok thanks 👍

Should be fixed @tupui.

mdhaber · 2022-07-14T05:11:31Z

@mckib2 can you check whether the Boost's noncentral t kurtosis and kurtosis_excess return the same thing?

For example, according to Mathematica (and SciPy main), the excess kurtosis of NCT with 10 degrees of freedom and noncentrality parameter 3 is 2.44. The kurtosis (3 + excess kurtosis) is 5.44.

See ExcessKurtosis[NoncentralStudentTDistribution[10, 3]] vs Kurtosis[NoncentralStudentTDistribution[10, 3]].

But test_moment in this PR is failing because it seems like _boost._nct_kurtosis_excess is 3 too big.

mckib2 · 2022-07-14T16:14:53Z

But test_moment in this PR is failing because it seems like _boost._nct_kurtosis_excess is 3 too big.

What I see right away in the Boost source is that kurtosis is returning kurtosis_excess + 3:

If everything is 3 too big, then you can try just subtracting 3 from the result? I'm not sure if they're using a different definition of kurtosis or not.

I'll write a quick example now and report back what the results are

mdhaber · 2022-07-14T16:25:56Z

My guess is that they read a formula for (not-excess/Pearson) kurtosis and implemented it in kurtosis_excess (thinking it was excess/Fisher kurtosis).
And yes, I subtracted 3, hence the passing tests! Looks like this got @tupui's approval, but could you double-check that I followed your example correctly and merge if it looks good?

mckib2

LGTM! Merging, thanks @mdhaber

mdhaber · 2022-07-14T23:47:52Z

@mckib2 The test case I'd like to show is df = 8, nc = 8.26915191978. Here are the results from Wolfram Alpha evaluated at x=1:
PDF[NoncentralStudentTDistribution[8, 8.26915191978], 1] 8.51798811×10^-12.
Boost's value was half that.

And the plots: PDF[NoncentralStudentTDistribution[8, 8.26915191978]]

Fortunately it was nice enough to zoom in to the region of interest between -1 and 1:

Compare against the PDF plot from Boost above, which has some weird kinks.

mckib2 · 2022-07-16T03:21:08Z

Documenting for posterity: removing the -DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false (i.e., allowing double -> long double promotion) corrects the pdf(1.0) issue, but the "horn" in the PDF still exists in the Cython-wrapped NCT distribution and not in any C++ Boost test I have yet devised

mdhaber · 2022-08-03T19:03:41Z

scipy/stats/_boost/meson.build

@@ -66,6 +66,15 @@ ncx2_ufunc = py3.extension_module('ncx2_ufunc',
  subdir: 'scipy/stats/_boost'
 )

+ncf_ufunc = py3.extension_module('nct_ufunc',


@mckib2 Just found a typo here. ncf_ufunc didn't get replaced with nct_ufunc. Are these names important? It doesn't appear to have made our ncf distribution the same as our nct distribution.

meson will use that name for the name of the build target, I don't believe it will be used for anything else. It just needs to be unique, so as long as nothing else in the project is named "nct_ufunc", then we're fine

nct_ufunc is a few lines above. Please take a look to see if you would have expected it to cause a problem.

@mckib2 I wasn't sure whether this was understood - the question is more about the name of the variable ncf_ufunc than the build target name string 'nct_ufunc'. Here, we accidentally redefined what the variable ncf_ufunc refers to - now it refers to the extension for NCT instead of NCF.

It doesn't look like the variable names are used elsewhere in this module, but perhaps they are imported elsewhere? Does this affect anything? Do we even need the variables to have names at all?

The distributions seem to be working correctly, but I thought I should double-check as we prepare for 1.9.1.

mdhaber · 2023-10-06T02:10:47Z

@mborland we noticed here that Boost's non-central t PDF has some unexpected behavior in the left tail. Is the SciPy code in #19348 (comment) enough for you to go on, or should I find someone who can open the issue in C++? Thanks for taking a look.

mborland · 2023-10-06T12:57:23Z

@mborland we noticed here that Boost's non-central t PDF has some unexpected behavior in the left tail. Is the SciPy code in #19348 (comment) enough for you to go on, or should I find someone who can open the issue in C++? Thanks for taking a look.

That python snippet should be enough to reproduce. I'll report back if it's not.

ENH: stats.nct: replace with boost implementation

cf243b2

mdhaber added scipy.stats enhancement A new feature or improvement labels Jul 13, 2022

mdhaber requested a review from mckib2 July 13, 2022 05:17

Merge branch 'main' into gh7104

7d81181

tupui approved these changes Jul 13, 2022

View reviewed changes

MAINT: stats.nct: revert to SciPy PDF; fix excess kurtosis

ee43fb4

mckib2 approved these changes Jul 14, 2022

View reviewed changes

mckib2 merged commit 698cfe1 into scipy:main Jul 14, 2022

tylerjereddy added this to the 1.10.0 milestone Jul 14, 2022

mckib2 mentioned this pull request Jul 14, 2022

non_central_t distribution reports kurtosis + 3 boostorg/math#800

Closed

mdhaber commented Aug 3, 2022

View reviewed changes

mckib2 mentioned this pull request Nov 27, 2022

MAINT: stats: Unconditionally disable boost double promotion. #17482

Merged

josef-pkt mentioned this pull request Feb 8, 2023

ENH/BUG: add FTestPowerF2 as corrected version of FTestPower statsmodels/statsmodels#8656

Merged

mdhaber mentioned this pull request Oct 5, 2023

BUG: stats.nct.pdf inconsistent behavior when compared to MATLAB and R #19348

Closed

mdhaber mentioned this pull request May 11, 2024

BUG: stats.noncentral_t: incorrect pdf values #20693

Closed

nickodell mentioned this pull request May 22, 2024

BUG: ImportError: DLL load failed while importing nct_ufunc #20758

Closed

Carreau mentioned this pull request Jul 16, 2024

BUG: special.nctdtr: incorrect results #8344

Closed

Uh oh!

ENH: stats.nct: replace with boost implementation #16591

ENH: stats.nct: replace with boost implementation #16591

Uh oh!

Conversation

mdhaber commented Jul 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference issue

What does this implement/fix?

Additional information

Uh oh!

tupui left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tupui Jul 13, 2022

Choose a reason for hiding this comment

Uh oh!

mdhaber Jul 13, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tupui Jul 13, 2022

Choose a reason for hiding this comment

Uh oh!

mdhaber Jul 14, 2022

Choose a reason for hiding this comment

Uh oh!

mdhaber commented Jul 14, 2022

Uh oh!

mckib2 commented Jul 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdhaber commented Jul 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mckib2 left a comment

Choose a reason for hiding this comment

Uh oh!

mdhaber commented Jul 14, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mckib2 commented Jul 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdhaber Aug 3, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mckib2 Aug 10, 2022

Choose a reason for hiding this comment

Uh oh!

mdhaber Aug 10, 2022

Choose a reason for hiding this comment

Uh oh!

mdhaber Aug 26, 2022

Choose a reason for hiding this comment

Uh oh!

mdhaber commented Oct 6, 2023

Uh oh!

mborland commented Oct 6, 2023

Uh oh!

Uh oh!

mdhaber commented Jul 13, 2022 •

edited

Loading

tupui left a comment •

edited

Loading

mdhaber Jul 13, 2022 •

edited

Loading

mckib2 commented Jul 14, 2022 •

edited

Loading

mdhaber commented Jul 14, 2022 •

edited

Loading

mdhaber commented Jul 14, 2022 •

edited

Loading

mckib2 commented Jul 16, 2022 •

edited

Loading

mdhaber Aug 3, 2022 •

edited

Loading