Skip to content

scipy.stats._argcheck sets (not just checks) values #9896

@pvanmulbregt

Description

@pvanmulbregt

```scipy/stats/_distn_infrastructure.py::rv_generic._argcheck()is overridden by some distributions, which not only check the arguments passed in, but also seta` and/or `b` endpoints based on the those args. Examples include `genextreme(c)`, `genhalflogistic(c)`, `kappa4(h, k)`, `truncexpon(b)`, `genpareto(c)`.
`reciprocal(a, b)` and `truncnorm(a, b)` take this a step further, also setting some instance variables to be used later.

This causes a few problems.

  1. The args passed to the distribution's _cdf/_pdf/_sf/... methods may not match the original args. E.g. Request the CDF for stats.reciprocal(a,b) at one fixed value (7), but for 6 different parameterizations. The a and b are passed in as arrays. But only 4 of the 6 are legal.
import numpy as np
import scipy.stats
a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([8, 16, 1, 32, 1, 48])
scipy.stats.reciprocal.cdf(7, a, b)

Error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Paul/src/python/scipy/scipy/stats/_distn_infrastructure.py", line 1752, in cdf
    place(output, cond, self._cdf(*goodargs))
  File "/Users/Paul/src/python/scipy/scipy/stats/_continuous_distns.py", line 5928, in _cdf
    return (np.log(x)-np.log(a)) / self.d
ValueError: operands could not be broadcast together with shapes (4,) (6,) 

The call to cdf(7, a,b), generated a call to _argcheck(a, b)

def _argcheck(self, a, b):
self.a = a
self.b = b
self.d = np.log(b*1.0 / a)
return (a > 0) & (b > a)

which set self.d to be an array with 6 elements.
But when the distribution's _cdf(self, x, a, b) was called,
def _cdf(self, x, a, b):
return (np.log(x)-np.log(a)) / self.d

the a and b arrays only had 4 elements, as the illegal ones had been removed by rv_continuous.cdf

  1. These distributions are not usable in multi-threaded situations. Any call to <dist>.cdf(x, a, b) is relying on cdf to call _argcheck(self, a, b) to set some members. Here's an example of how a problem can arise.
import numpy as np
import scipy 
import scipy.stats
tn = scipy.stats.truncnorm
tn.cdf(1, 0, np.inf)  # Use the right-half truncated normal
# 0.6826894921370859
tn._cdf(1, 0, np.inf)
# 0.6826894921370859 # cdf and _cdf return the same result. That is good.
tn.cdf(-1, -np.inf, 0)  # Now use the left-half truncated normal
0.31731050786291415   # Yes, correct value
tn._cdf(1, 0, np.inf)  # Asking the right-half with _cdf returns an incorrect prob 
# 1.6826894921370859
tn.cdf(1, 0, np.inf)  # But asking the right-half with cdf returns the correct prob 
# 0.6826894921370859
tn._cdf(1, 0, np.inf)  # Now asking the right-half with _cdf returns the correct prob 
# 0.6826894921370859
>>> tn._cdf(1, -np.inf, 0) # Asking the left-half with _cdf returns an incorrect prob 
# 0.6826894921370859

I.e. The call to tn.cdf(x, a, b) updates the global scipy.stats.truncnorm object. Any interleaved calls to the low-level _cdf(x, a, b) are subject to returning incorrect probabilities, as the particular a and b values are not used, only the ones passed to (and stored by) _argcheck via the higher-level cdf call, and these are subject to change.

  1. I came across @ev-br's BUG/ENH: stats: make frozen distributions hold separate instances #3245, which observed the problem when freezing distributions (notice the "Ouch")
    # a, b may be set in _argcheck, depending on *args, **kwds. Ouch.
    shapes, _, _ = self.dist._parse_args(*args, **kwds)
    self.dist._argcheck(*shapes)
    self.a, self.b = self.dist.a, self.dist.b

Should _argcheck() be permitted to store members for later use? Restricted to just the endpoints a and b? Or allow anything to be stored?

Scipy/Numpy/Python version information:

1.3.0.dev0+99efed1 1.16.0 sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions