-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
MAINT: stats: rewrite gaussian_kde.integrate_box
, remove _mvn
F77 extension
#22611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
See scipy/stats/_qmvnt.py for the replacement. Note that - mvndst.f was code by Alan Genz - _qmvnt.py code is a python translation of a(n improved) Matlab implementation, also by Alan Genz
gaussian_kde.integrate_box
, remove _mvn
F77 extension
gaussian_kde.integrate_box
, remove _mvn
F77 extensiongaussian_kde.integrate_box
, remove _mvn
F77 extension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't look at the nitty gritty of the tests yet. Before I do, just wanted to check what prompted the addition of PDF tests - is there nothing like them there already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! One question on the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, too; just one question about whether there's a reason to use atol
instead of rtol
. It may be fine here, but it's not immediately obvious because MVN pdf/cdf can easily get quite small, so I'd default to rtol
unless there's a reason to use atol
.
warnings.warn(msg, stacklevel=2) | ||
|
||
return value | ||
low, high = low_bounds - self.dataset.T, high_bounds - self.dataset.T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed that dataset
is atleast_2d
. Also, it looks like low_bounds
and high_bounds
are only supposed to work with a single point at a time, and that will always be aligned along the -1
axis (which is why we transpose dataset
).
high, lower_limit=low, cov=self.covariance, maxpts=maxpts | ||
) | ||
# XXX: xp.linalg.vecdot is much faster than (a*b).sum(axis=-1) | ||
return (values * self.weights).sum(axis=-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also checked that weights is already normalized. 👍
scipy/stats/tests/test_kdeoth.py
Outdated
assert_allclose( | ||
gkde(xx), | ||
stats.norm.pdf(xx[:, None], loc=loc, scale=scale).sum(axis=-1) / gkde.n, | ||
atol=1e-14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use atol
throughout? If not, I'd default to rtol
. Especially when looking at multivariate normal later, it's not immediately obvious that assertions about the PDF with atol=1e-14
really test anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
¯\_(ツ)_/¯
this is just my default to "nearly equals in double precision", with an order of magnitude of a wiggle room.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, then please consider changing that default to rtol=1e-14
, at least when working on stats and special functions.
arg = xx[:, None, :] - gkde.dataset.T | ||
pdf = stats.multivariate_normal.pdf | ||
assert_allclose( | ||
gkde(xx.T), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance,
gkde(xx.T)
# [5.35496592e-02 3.93564062e-04 6.68839968e-14]
so with atol=1e-14
, the test at point [5, 6]
is not doing much, and it's allowing quite different relative precision for the other values. Seeing [5, 6] as the argument to MVN is actually what prompted the comment.
rtol=1e-14
typically aligns better with the intent "nearly equals in double precision"1.
Footnotes
-
assert_array_max_ulp
is even better (slightly), except for the name. It might be worth redefiningxp_assert_close
and such in terms of ULPs, though, or at least allowing aulp
option. ↩
Alright, thanks @ev-br and reviewers! Oops looks like there is some value in the history here, so I'll squash the last three commits into one before merging. |
[skip ci]
Reference issue
A follow-up to #22298
What does this implement/fix?
Use a new
_qmvn
multivariate normal integrator ingaussian_kde
. Previously, gh-22298 replaced the F77 integrator,_mvn.mvnun
, with a improved version. Now this PR replaces_mvn.mvnun_weighted
, which completes the removal of the F77 original.Additional information
The first commits only adds tests of the "old" implementation vs multivariate_normal; the second commit changes the implementation, and the third commit removes the Fortran code.