ENH: stats.multivariate_t: add cdf method #17410
Conversation
Ah, I should have made my translation public earlier. I've been sitting on it for a while: https://gist.github.com/rkern/8518cb3e3f57fa81d21dbd9eabcfaa87

Yeah, lifting the drawing from [...]. The CBC lattice in Genz's later version is quite a bit superior to the Richtmyer lattice. @tupui I don't know if you want to consider adding the CBC lattice as a generic QMC engine. It's finicky (more so than Sobol', even) in that you have to draw exactly the prime number of samples that it was configured to produce. The core algorithm is here to use, if you want.

I have an implementation of the generic multivariate normal integrand that you can integrate yourself with any [...]. If you want a test problem: integrating a multivariate normal with the covariance matrix with diagonals all 1 and off-diagonals all 0.5 over the negative orthant has an exact true answer [...].
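To make that test problem concrete, here is a minimal sketch (my own, not taken from the gist) that checks the equicorrelated case against scipy's existing `multivariate_normal.cdf`; for off-diagonal correlation 0.5, the negative-orthant probability works out analytically to 1/(dim + 1).

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch of the test problem described above: covariance with 1 on the
# diagonal and 0.5 off-diagonal, integrated over the negative orthant.
# For correlation 0.5 the orthant probability is exactly 1/(dim + 1).
dim = 5
cov = 0.5 * (np.eye(dim) + np.ones((dim, dim)))
exact = 1.0 / (dim + 1)
approx = multivariate_normal(mean=np.zeros(dim), cov=cov).cdf(np.zeros(dim))
print(exact, approx)  # the two values should agree to several digits
```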
Thanks! Yeah, I knew you had already translated it, but I figured one could be tested against the other. Or maybe we'll end up going for Fortran anyway. @serge-sans-paille any chance we can get [...]?

I believe that the CBC QMC scheme used in the version that I translated is superior in performance to the scheme in the Fortran code. I think Pythranization is the way to go. We should probably wrap the Fortran code for the bivariate and trivariate cases, which can be done more accurately.
OK. I like that if we can get the special functions. If not, there's always Cython.
Good idea. But let's leave that as a followup PR. This will get big enough as it is. I had planned to wait until @tupui gets back next month before doing too much more here, but I suppose I should ask: would you like to review this, or would you like to submit your code as a PR and I will review? In either case, I'd like to suggest that given the same input (including random numbers), the translation should produce the same output up to machine precision. As long as we hit all the code paths with tests, this would make it very easy to check for accuracy against the Matlab originals. Is that still true in your translation?

Not quite. The CBC lattice construction involves doing some convolutions via FFTs and [...]
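For reference, here is a tiny self-contained sketch (my own illustration, not the gist's code) of the FFT-based circular convolution that the fast CBC construction leans on, i.e. the numerical operation that would need `np.fft` support under Pythran:

```python
import numpy as np

# Circular convolution via the FFT, checked against the direct O(n^2) sum.
# The real CBC search applies this to permuted index orderings, but the
# core numerical operation is the same.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0.5, 0.25, 0.125, 0.0625])
fft_conv = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(y), n=len(x))
direct = np.array([sum(x[j] * y[(k - j) % len(x)] for j in range(len(x)))
                   for k in range(len(x))])
print(np.allclose(fft_conv, direct))  # True
```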
I can give it a try. Can you open an issue in the Pythran bug tracker to keep note of it?

@mdhaber I didn't plan to do a Pythran release soon, but once you have that one ready, please tell me and I'll do a minor version release.

Thanks @serge-sans-paille!
Welcome back @tupui! What would you prefer to do here?
Thanks 😃 I still need to have a look and make a decision. I was busy with other reviews so far. I will try to have an answer tomorrow.
Sorry for the delay.
How about we use @rkern's version with CBC and use your current version as a validation test?
OK @tupui, I updated per your recommendation. Those Cirrus failures seem unrelated. Other failures seem to be due to an fftpack import; I'll see if I can switch that out. @rkern how do you want to be credited? Before merging, we could do an interactive rebase in which you would author the first commit (maybe just adding the file [...]).

Whatever's easiest.
Sorry for the delay.

I did a first pass. I am not sure about the methods we need in the end in _qmvnt.py. There is a lot of unused code, so before I dive into any particular function it would help to clarify what is to stay.
A = rng.random(size=(dim, dim))
cov = A @ A.T
mean = rng.random(dim)
a = -rng.random(dim)
b = rng.random(dim)
df = rng.random() * 5
If this is not too slow, what do you think about testing more variations of all the random parts? Without going full Hypothesis, we could make a loop of 5-10? (Of course, real known corner cases would be better.)
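A minimal sketch of what such a loop could look like, using pytest parametrization over seeds. It mirrors the random construction in the diff above and assumes the `cdf` method added by this PR; the comparison against a reference implementation is elided (only a basic sanity check is shown), so the test name, the seed count, and the final assertion are placeholders:

```python
import numpy as np
import pytest
from scipy import stats

# Repeat the randomized construction for several seeds instead of one.
@pytest.mark.parametrize("seed", range(5))
def test_cdf_random_inputs(seed):
    rng = np.random.default_rng(seed)
    dim = 3
    A = rng.random(size=(dim, dim))
    cov = A @ A.T
    mean = rng.random(dim)
    b = rng.random(dim)
    df = rng.random() * 5
    res = stats.multivariate_t(loc=mean, shape=cov, df=df).cdf(b)
    # placeholder check; the real test would compare against a reference
    assert np.isfinite(res) and 0.0 <= res <= 1.0
```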
scipy/stats/_qmvnt.py
return prob, est_error, n_samples

def _bounds(m, t, alternative):
Provision?
It's for the Dunnett test. Should be removed from here because I don't think this file will be where that gets implemented.
I'm getting warnings when the covariance matrix is singular. The MATLAB code's extra masking stops the NaN from leaking out, replacing it with a 0, but the actual value that comes out is not quite correct at the cusp (which does seem to have a finite, nonzero limiting value). So should we warn the user about a loss of accuracy, then?

For singular covariance, I'm also getting a lot of substantial discrepancies between this and _qsimvtv, without much improvement when I increase the number of points in _qsimvtv. Would you expect _qsimvtv to be very inaccurate in this case? I tried using Matlab as a reference, but its mvtcdf doesn't allow a singular matrix.
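For what it's worth, a minimal sketch (my own construction, not the PR's tests) of building the kind of singular covariance matrix in question, in case it helps reproduce the warnings:

```python
import numpy as np

# A rank-deficient Gram matrix: rank 2 in 4 dimensions, hence singular.
rng = np.random.default_rng(1234)
dim, rank = 4, 2
A = rng.random(size=(dim, rank))
cov = A @ A.T
print(np.linalg.matrix_rank(cov))   # 2
print(np.linalg.eigvalsh(cov))      # the two smallest eigenvalues are ~0
```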
What I was talking about is not a case that arises with the CBC lattice implemented here. It would arise if we did use [...].

I don't expect any implementation to be particularly accurate in the singular case.
I've updated my gist with a notebook with some more test problems. It turns out that the analytically-known case that I used in my benchmark is a special case of a slightly wider family of a structured covariance matrix where the multivariate normal integral can be reduced down to a univariate integral that is straightforward to numerically integrate with quad(). A related special-case family creates a singular matrix, and it too can be integrated numerically. And finally, it does turn out that one similar special-case family can turn the multivariate t into a double integral that [...]
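As an illustration of the kind of reduction being described (my own sketch under the standard equicorrelation decomposition, not code from the notebook): writing X_i = sqrt(rho)*Z + sqrt(1-rho)*eps_i turns the orthant probability into a one-dimensional integral that quad() handles easily.

```python
import numpy as np
from scipy import integrate, stats

def equicorrelated_orthant(dim, rho, b=0.0):
    # P(X_1 <= b, ..., X_dim <= b) for a standard multivariate normal with
    # constant correlation rho, reduced to a 1-D integral over the shared
    # factor Z.
    def integrand(z):
        return (stats.norm.pdf(z)
                * stats.norm.cdf((b - np.sqrt(rho) * z)
                                 / np.sqrt(1 - rho)) ** dim)
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return val

print(equicorrelated_orthant(5, 0.5))  # ~1/6 for rho = 0.5, dim = 5
```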
Co-authored-by: Robert Kern <robert.kern@gmail.com>
I've updated my gist with a notebook with some more test problems. It turns out that the analytically-known case that I used in my benchmark is a special case of a slightly wider family of a structured covariance matrix where the multivariate normal integral can be reduced down to a univariate integral that is straightforward to numerically integrate with quad().
I am not sure I understand what conclusion you want to draw.
I am not going to block this. It's in relatively OK shape and the main points have been discussed. Let's finish up the last ones and we can merge this soon.

That notebook has no particular conclusion. I am offering it as an additional source of test problems to be used in our test suite, rather than testing by comparing two ports of only slightly different versions of basically the same MATLAB code. It can also be used to do more thorough comparisons of the relative accuracy between the CBC lattice and Sobol' on a wider variety of problems. I ran those benchmarks last night. My findings from the more restricted problem hold up. If there is interest, I can clean up the visualizations of those results.
Just one: we can drop the Richtmyer lattice; it's always worse.

On the whole, I have not questioned the legitimacy of your questions. I have simply answered them. I anticipated the question about using [...].

The only thing I did question was the relevancy of one complaint about the fairness of how the comparison was done with respect to the boundary issue, most particularly because there was no mechanism by which restoring "fairness" on that particular issue could work in Sobol''s favor in terms of accuracy or time. It was a secondary issue that could not affect the primary conclusion. In the event, it didn't matter. None of the combinations (doing the fastest thing for both, doing the safest thing for both even if it was unnecessary, doing the optimal thing for both) change the relative orderings of accuracy and time.
I am very confident of this, particularly with the new benchmark problems. Yes, I will maintain this code. The inline generation of the QMC points is, by a long stretch, the easiest part of this code to understand and maintain.
For sure! They are not slow, by any means. Their API, though it is a very good general-purpose API, is just less well-adapted to this particular problem. The integrand is arranged such that it can deal with each dimension one by one, vectorizing across all of the samples. I don't need to generate whole blocks of QMC samples, just one column at a time. Lattice rules work really well for that. But even if we did have lattice [...]
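To illustrate the column-at-a-time point above (a sketch with a made-up generating vector, not the PR's actual lattice parameters): for a rank-1 lattice, every coordinate of all n points is a single vectorized expression, so each dimension's column can be produced independently as the integration recursion proceeds.

```python
import numpy as np

n = 31                        # prime number of lattice points
z = np.array([1, 12, 20])     # hypothetical generating vector for 3 dims
i = np.arange(n)
col_1 = ((i * z[1]) % n) / n  # all n samples of dimension 1 at once
print(col_1[:5])
```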
This is something I have in the back of my mind: adding a dimension to a sample/engine. A lot of QMC methods, Sobol' to start with, allow adding a dimension (and are built like that, as here). I am mostly waiting for more people to ask for this before really putting some thought into it. I also have a use case from some people doing DoE, but that might be too niche (and also arguable) for now.

I fixed the PEP8 issues in the docstrings and removed the unused code or added comments about it. I also added tests against generic integrators, Matlab, and the analytical result for the singular case mentioned in #17410 (comment).
LGTM and CI is happy. Let's get this in. Thanks all for working on this and for the discussions. (Unrelated failures on Cirrus, as usual.)
ref = _qsimvtv(20000, df, cov, a - mean, b - mean, rng)[0]
assert_allclose(res, ref, atol=1e-4, rtol=1e-3)

def test_cdf_against_generic_integrators(self):
+1, I had very similar code locally when doing some tests. I also had a simple 1D case with an empirical CDF, but this should be more robust and cover it.
Just for the record, this case is not singular. Also, I presented it as an analytic result for the multivariate normal. Fortunately, it also seems to be true for the multivariate t, regardless of [...]
Ah, I do have a derivation. For the negative orthant bounds, all of the [...]
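A quick numerical check of that observation (a sketch assuming the cdf method added in this PR): for the equicorrelated shape matrix with rho = 0.5, the negative-orthant probability of the multivariate t comes out as 1/(dim + 1) for any df, since scaling the underlying normal vector by a positive random factor does not change the signs of its components.

```python
import numpy as np
from scipy import stats

dim = 4
shape = 0.5 * (np.eye(dim) + np.ones((dim, dim)))
for df in (1, 3, 30):
    dist = stats.multivariate_t(loc=np.zeros(dim), shape=shape, df=df)
    p = dist.cdf(np.zeros(dim))
    print(df, p, 1 / (dim + 1))  # all approximately 0.2
```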
@rkern there is an effort to remove very old Fortran 77 code (gh-18566), and one of the items on the list is [...]. Would this functionality be difficult to incorporate into [...]?
Reference issue
gh-16241
What does this implement/fix?
This adds a CDF method to multivariate_t.

Additional information
Rather than wrapping Genz's Fortran code, I translated his (permissively-licensed) Matlab code so that it would be easier to use NumPy's PRNG (and avoid the equivalent of gh-16142).
That said, the number of random variates needed is always 10*dim. My plan was to compile this with Pythran in a follow-up PR, and to do that I would have to pass in those random variates because I don't think we can pass a Generator or RandomState into a Pythran function (correct me if I'm wrong). In that case, I suppose we could apply the same strategy with the Fortran code, slightly modified to accept all the random variates as an additional input. The advantage is that we wouldn't really need to review the Fortran source; the disadvantage is that we'd need to wrap it.

I still need to look for rough edges and polish them. I suspect that the output shapes are not perfect, I don't know what happens when x is empty, etc. Update: it handles edge cases like multivariate_t.pdf does.

I went ahead and wrote my own translation because comparing existing translations may be a useful part of review.
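A toy sketch of the "pass the variates in" strategy described above (the kernel, its signature, and the estimator are stand-ins, not the PR's actual code): all randomness is drawn from a NumPy Generator outside the function, and the function itself only sees plain arrays, which is the form a Pythran-compiled kernel would need.

```python
import numpy as np
from scipy.special import ndtri

def _toy_kernel(uniform_shifts, a, b):
    # Randomized-QMC-style estimate of P(a < Z < b) for a standard normal,
    # using only the externally supplied uniform shifts for randomness.
    n = 128
    grid = (np.arange(n) + 0.5) / n
    estimates = []
    for shift in uniform_shifts:
        u = (grid + shift) % 1.0
        z = ndtri(u)
        estimates.append(np.mean((z > a) & (z < b)))
    return np.mean(estimates), np.std(estimates)

rng = np.random.default_rng(0)
shifts = rng.random(10)                # all random draws happen out here
print(_toy_kernel(shifts, -1.0, 1.0))  # ~(0.6827, small spread)
```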