-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Description
```scipy/stats/_distn_infrastructure.py::rv_generic._argcheck()is overridden by some distributions, which not only check the arguments passed in, but also set
a` and/or `b` endpoints based on the those args. Examples include `genextreme(c)`, `genhalflogistic(c)`, `kappa4(h, k)`, `truncexpon(b)`, `genpareto(c)`.
`reciprocal(a, b)` and `truncnorm(a, b)` take this a step further, also setting some instance variables to be used later.
This causes a few problems.
- The args passed to the distribution's
_cdf/_pdf/_sf/...
methods may not match the original args. E.g. Request the CDF forstats.reciprocal(a,b)
at one fixed value (7), but for 6 different parameterizations. Thea
andb
are passed in as arrays. But only 4 of the 6 are legal.
import numpy as np
import scipy.stats
a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([8, 16, 1, 32, 1, 48])
scipy.stats.reciprocal.cdf(7, a, b)
Error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/Paul/src/python/scipy/scipy/stats/_distn_infrastructure.py", line 1752, in cdf
place(output, cond, self._cdf(*goodargs))
File "/Users/Paul/src/python/scipy/scipy/stats/_continuous_distns.py", line 5928, in _cdf
return (np.log(x)-np.log(a)) / self.d
ValueError: operands could not be broadcast together with shapes (4,) (6,)
The call to cdf(7, a,b)
, generated a call to _argcheck(a, b)
scipy/scipy/stats/_continuous_distns.py
Lines 5914 to 5918 in f07bfba
def _argcheck(self, a, b): | |
self.a = a | |
self.b = b | |
self.d = np.log(b*1.0 / a) | |
return (a > 0) & (b > a) |
which set
self.d
to be an array with 6 elements.But when the distribution's
_cdf(self, x, a, b)
was called,scipy/scipy/stats/_continuous_distns.py
Lines 5927 to 5928 in f07bfba
def _cdf(self, x, a, b): | |
return (np.log(x)-np.log(a)) / self.d |
the
a
and b
arrays only had 4 elements, as the illegal ones had been removed by rv_continuous.cdf
- These distributions are not usable in multi-threaded situations. Any call to
<dist>.cdf(x, a, b)
is relying oncdf
to call_argcheck(self, a, b)
to set some members. Here's an example of how a problem can arise.
import numpy as np
import scipy
import scipy.stats
tn = scipy.stats.truncnorm
tn.cdf(1, 0, np.inf) # Use the right-half truncated normal
# 0.6826894921370859
tn._cdf(1, 0, np.inf)
# 0.6826894921370859 # cdf and _cdf return the same result. That is good.
tn.cdf(-1, -np.inf, 0) # Now use the left-half truncated normal
0.31731050786291415 # Yes, correct value
tn._cdf(1, 0, np.inf) # Asking the right-half with _cdf returns an incorrect prob
# 1.6826894921370859
tn.cdf(1, 0, np.inf) # But asking the right-half with cdf returns the correct prob
# 0.6826894921370859
tn._cdf(1, 0, np.inf) # Now asking the right-half with _cdf returns the correct prob
# 0.6826894921370859
>>> tn._cdf(1, -np.inf, 0) # Asking the left-half with _cdf returns an incorrect prob
# 0.6826894921370859
I.e. The call to tn.cdf(x, a, b)
updates the global scipy.stats.truncnorm
object. Any interleaved calls to the low-level _cdf(x, a, b)
are subject to returning incorrect probabilities, as the particular a
and b
values are not used, only the ones passed to (and stored by) _argcheck
via the higher-level cdf
call, and these are subject to change.
- I came across @ev-br's BUG/ENH: stats: make frozen distributions hold separate instances #3245, which observed the problem when freezing distributions (notice the "Ouch")
scipy/scipy/stats/_distn_infrastructure.py
Lines 429 to 432 in f07bfba
# a, b may be set in _argcheck, depending on *args, **kwds. Ouch. shapes, _, _ = self.dist._parse_args(*args, **kwds) self.dist._argcheck(*shapes) self.a, self.b = self.dist.a, self.dist.b
Should _argcheck()
be permitted to store members for later use? Restricted to just the endpoints a
and b
? Or allow anything to be stored?
Scipy/Numpy/Python version information:
1.3.0.dev0+99efed1 1.16.0 sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0)