MAINT: stats: rewrite `gaussian_kde.integrate_box`, remove `_mvn` F77 extension #22611

ev-br · 2025-03-01T22:45:01Z

Reference issue

A follow-up to #22298

What does this implement/fix?

Use the new _qmvn multivariate normal integrator in gaussian_kde. Previously, gh-22298 replaced the F77 integrator, _mvn.mvnun, with an improved version. This PR now replaces _mvn.mvnun_weighted, which completes the removal of the F77 original.

Additional information

The first commit only adds tests of the "old" implementation vs multivariate_normal; the second commit changes the implementation; and the third commit removes the Fortran code.

See scipy/stats/_qmvnt.py for the replacement. Note that

- mvndst.f was written by Alan Genz
- the _qmvnt.py code is a Python translation of a(n improved) MATLAB implementation, also by Alan Genz

mdhaber

Didn't look at the nitty gritty of the tests yet. Before I do, just wanted to check what prompted the addition of PDF tests - is there nothing like them there already?

scipy/stats/_kde.py

scipy/stats/tests/test_kdeoth.py

rkern

LGTM!

rgommers

Looks good! One question on the API.

scipy/stats/_kde.py

mdhaber

LGTM, too; just one question about whether there's a reason to use atol instead of rtol. It may be fine here, but it's not immediately obvious because MVN pdf/cdf can easily get quite small, so I'd default to rtol unless there's a reason to use atol.

mdhaber · 2025-03-02T21:15:38Z

scipy/stats/_kde.py

-            warnings.warn(msg, stacklevel=2)
-
-        return value
+        low, high = low_bounds - self.dataset.T, high_bounds - self.dataset.T


Confirmed that dataset is atleast_2d. Also, it looks like low_bounds and high_bounds are only supposed to work with a single point at a time, and that will always be aligned along the -1 axis (which is why we transpose dataset).
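The broadcasting described here can be checked in isolation. The following is a minimal NumPy sketch, not code from the PR: `dataset` stands in for `self.dataset` (assumed to have shape `(d, n)`, as `gaussian_kde` stores it), and the bounds are made-up values for a single box.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 5
dataset = rng.normal(size=(d, n))    # stand-in for self.dataset, shape (d, n)

low_bounds = np.array([-1.0, -1.0])  # one d-dimensional lower corner
high_bounds = np.array([1.0, 1.0])   # one d-dimensional upper corner

# A (d,) vector broadcasts against the transposed (n, d) dataset, giving one
# shifted box per data point, with the dimension along the last axis.
low = low_bounds - dataset.T
high = high_bounds - dataset.T

print(low.shape, high.shape)
```

Each row of `low` and `high` is the same box expressed relative to one data point, which is what lets a zero-mean normal CDF be evaluated once per kernel.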

mdhaber · 2025-03-02T21:24:24Z

scipy/stats/_kde.py

+            high, lower_limit=low, cov=self.covariance, maxpts=maxpts
+        )
+        # XXX: xp.linalg.vecdot is much faster than (a*b).sum(axis=-1)
+        return (values * self.weights).sum(axis=-1)


Also checked that weights is already normalized. 👍
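As a rough illustration of what the new code path computes, the box integral can be reproduced from public API as a weighted sum of zero-mean multivariate normal box probabilities. This is a sketch under assumptions: the data and bounds are invented, it relies on the `lower_limit` keyword of `multivariate_normal.cdf` (available in recent SciPy versions), and it is not the PR's internal implementation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1234)
data = rng.normal(size=(2, 50))          # illustrative 2-D sample, shape (d, n)
kde = stats.gaussian_kde(data)

low_bounds = np.array([-1.0, -0.5])
high_bounds = np.array([1.0, 1.5])

# Shift the box so that each kernel is centered at the origin.
low = low_bounds - kde.dataset.T
high = high_bounds - kde.dataset.T

# Probability of the shifted box under each zero-mean kernel, shape (n,).
values = stats.multivariate_normal.cdf(
    high, lower_limit=low, cov=kde.covariance
)

# The weights sum to one, so no further normalization is needed.
manual = float(np.sum(values * kde.weights))
builtin = float(kde.integrate_box(low_bounds, high_bounds))
print(manual, builtin)
```

The two numbers should agree up to the tolerance of the numerical integrator, not exactly, because the multivariate normal CDF is itself computed by numerical integration.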

mdhaber · 2025-03-02T21:28:37Z

scipy/stats/tests/test_kdeoth.py

+    assert_allclose(
+        gkde(xx), 
+        stats.norm.pdf(xx[:, None], loc=loc, scale=scale).sum(axis=-1) / gkde.n,
+        atol=1e-14


Is there a reason to use atol throughout? If not, I'd default to rtol. Especially when looking at multivariate normal later, it's not immediately obvious that assertions about the PDF with atol=1e-14 really test anything.

¯\_(ツ)_/¯ this is just my default to "nearly equals in double precision", with an order of magnitude of a wiggle room.

OK, then please consider changing that default to rtol=1e-14, at least when working on stats and special functions.
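To make the difference concrete, here is a small sketch using `numpy.testing.assert_allclose`. The reference value is of the same magnitude as a far-tail PDF value; the 10% perturbation is an invented example.

```python
import numpy as np
from numpy.testing import assert_allclose

reference = 6.68839968e-14     # magnitude of a PDF value far in the tail
wrong = 0.9 * reference        # off by 10% in relative terms

# With atol=1e-14 the absolute difference (about 6.7e-15) is below the
# tolerance, so the assertion passes despite the 10% relative error.
assert_allclose(wrong, reference, atol=1e-14)

# With rtol=1e-14 (atol defaults to 0) the same comparison fails.
try:
    assert_allclose(wrong, reference, rtol=1e-14)
    rtol_passed = True
except AssertionError:
    rtol_passed = False

print("rtol check passed:", rtol_passed)
```

For values of order one, the two tolerances behave similarly; they diverge only when the expected values are much smaller or larger than one.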

mdhaber · 2025-03-02T22:30:55Z

scipy/stats/tests/test_kdeoth.py

+    arg = xx[:, None, :] - gkde.dataset.T
+    pdf = stats.multivariate_normal.pdf
+    assert_allclose(
+        gkde(xx.T),


For instance,

gkde(xx.T) # [5.35496592e-02 3.93564062e-04 6.68839968e-14]

so with atol=1e-14, the test at point [5, 6] is not doing much, and it's allowing quite different relative precision for the other values. Seeing [5, 6] as the argument to MVN is actually what prompted the comment.

rtol=1e-14 typically aligns better with the intent "nearly equals in double precision"¹.

Footnotes

1. assert_array_max_ulp is even better (slightly), except for the name. It might be worth redefining xp_assert_close and such in terms of ULPs, though, or at least allowing a ulp option.

scipy/stats/tests/test_kdeoth.py

mdhaber · 2025-03-03T01:29:27Z

Alright, thanks @ev-br and reviewers!

Oops, looks like there is some value in the history here, so I'll squash the last three commits into one before merging.

[skip ci]

ev-br added 3 commits March 1, 2025 22:36

TST: stats: test kde vs multivariate_normal

8b67bbb

MAINT: stats: use multivariate_normal.cdf in gaussian_kde.integrate_box

154b974

MAINT: stats: remove the F77 _mvn extension sources

4c9cb1f

See scipy/stats/_qmvnt.py for the replacement. Note that

- mvndst.f was written by Alan Genz
- the _qmvnt.py code is a Python translation of a(n improved) MATLAB implementation, also by Alan Genz

ev-br added scipy.stats maintenance Items related to regular maintenance tasks labels Mar 1, 2025

ev-br requested a review from rgommers as a code owner March 1, 2025 22:45

github-actions bot added Fortran Items related to the internal Fortran code base Meson Items related to the introduction of Meson as the new build system for SciPy labels Mar 1, 2025

ev-br changed the title ~~Kde mvn~~ Rewrite gaussian_kde.integrate_box, remove _mvn F77 extension Mar 1, 2025

ev-br requested review from rkern and mdhaber March 1, 2025 22:45

lucascolley changed the title ~~Rewrite gaussian_kde.integrate_box, remove _mvn F77 extension~~ MAINT: stats: rewrite gaussian_kde.integrate_box, remove _mvn F77 extension Mar 2, 2025

mdhaber reviewed Mar 2, 2025

rkern approved these changes Mar 2, 2025

rgommers reviewed Mar 2, 2025

mdhaber mentioned this pull request Mar 2, 2025

ENH: stats: use vecdot and nonzero where appropriate #22616

Merged

mdhaber approved these changes Mar 2, 2025

mdhaber reviewed Mar 2, 2025

MAINT: stats: address review comments

a61cc13

[skip ci]

mdhaber force-pushed the kde_mvn branch from c700cdb to a61cc13 Compare March 3, 2025 01:38

mdhaber merged commit 6fa9763 into scipy:main Mar 3, 2025

rgommers mentioned this pull request Mar 4, 2025

TST: stats: ensure tests are thread-safe #22125

Merged

ev-br mentioned this pull request Mar 4, 2025

META: FORTRAN Code inventory #18566

Open

37 tasks

rgommers mentioned this pull request Mar 4, 2025

BUG: stats: kde.integrate_box was missing an rng parameter #22624

Merged

mdhaber added this to the 1.16.0 milestone May 16, 2025
