BUG: stats: Fix broadcasting in the rvs() method of the distributions. #5526

WarrenWeckesser · 2015-11-23T13:38:06Z

This is a fix for #2069

I haven't added any new tests yet, but the existing stats test suite passes on my system (Mac OS X 10.9.5, python 2.7.10, numpy 1.10.1). Here are some examples using the updated rvs method:

In [1]: from scipy.stats import norm

In [2]: norm.rvs(loc=np.zeros(5))
Out[2]: array([-0.98069527, -0.75010438,  0.22639067, -0.49124641,  1.29047549])

In [3]: norm.rvs(loc=[1, 10, 100], scale=[0.1, 0.01, 1])
Out[3]: array([  1.06500076,   9.99321314,  99.47675459])

In [4]: norm.rvs(loc=[1, 10, 100], scale=[0.1, 0.01, 1], size=3)
Out[4]: array([  1.04240551,   9.9775307 ,  99.0535618 ])

In [5]: norm.rvs(loc=[1, 10, 100], scale=[0.1, 0.01, 1], size=(3,))
Out[5]: array([   1.16531601,    9.99863233,  100.38931409])

In [6]: norm.rvs(loc=[1, 10, 100], scale=[0.1, 0.01, 1], size=(2, 3))
Out[6]: 
array([[   1.01646612,    9.98911665,  100.43066246],
       [   1.11037045,   10.01307627,  102.03790766]])

Argument with incompatible shapes (e.g. norm.rvs([1, 2, 3], [4, 5]), gamma.rvs(array([2.0, 5.0, 10.0, 15.0]), size=(2,2))) will raise a ValueError. The compatibility test is the same as that used by the random variate generators in numpy.random. The rule is that, if size is given, it determines the shape of the output--broadcasting can not change the output size.

WarrenWeckesser · 2015-11-23T13:45:13Z

Side note: I would really like to eliminate the use of the _size attribute of the distribution classes. It is set only in the rvs method, and accessed only in the _rvs method, so it should simply be an argument to _rvs. Making it an attribute is pointless. However, the docstring for rv_continuous includes _rvs in the methods that subclasses can override, so we're stuck with the _size attribute for backwards compatibility.

rgommers · 2015-11-24T20:03:16Z

Yep, sub-optimal design but no good way to get rid of self._size. It shouldn't get in the way of this fix though right?

rgommers · 2015-11-24T20:12:28Z

Change is nicely local, called from only one place. So should be able to get it reviewed/merged without too large risks right before a release. Existing test coverage for non-scalar loc/scale/size is very thin at the moment though, needs expanding.

WarrenWeckesser · 2015-11-24T22:37:32Z

Yep, sub-optimal design but no good way to get rid of self._size. It shouldn't get in the way of this fix though right?

Right, the code in this PR leaves the use of ._size as it is. In an early draft, I got rid of it, but then realized _rvs is effectively a public part of the API, so I undid those changes (grumble, grumble).

Existing test coverage for non-scalar loc/scale/size is very thin at the moment though, needs expanding.

Working on it...

WarrenWeckesser · 2015-11-27T06:43:39Z

Updated with some tests. I'm still thinking about some more thorough tests. Suggestions are welcome!

This is still WIP. The pearson3 implementation needs some work. The Pearson Type III distribution can be expressed as a shifted gamma distribution, so I'm looking into rewriting pearson3 in terms of gamma (instead of trying to fix pearson3.rvs()).

ev-br · 2015-11-27T14:13:44Z

I quickly drafted a test based on the stackoverflow thread, which is sensitive to the order of broadcasting and drawing variates: https://github.com/ev-br/scipy/tree/pr/5526
The relevant commit is ev-br@8c07855, the second commit tries to cover discrete distributions as well, but it likely doesn't do it.

In master, this fails all over. With this PR, I only see a couple of failures in truncnorm which does if self.a > 0 somewhere.
[and that is quite a Pandora's box].

This test is just a suggestion.

EDIT: Also array-valued shapes: ev-br@fda4518

WarrenWeckesser · 2015-11-27T14:58:21Z

Thanks! It occurred to me that another simple test would be to use np.vectorize to create a function that only calls the rvs method with scalars. Then the test would compare the results of the vectorize'd method with the actual method; if the random seed is set before each call, they should return the same array.

WarrenWeckesser · 2015-11-28T13:04:15Z

Then the test would compare the results of the vectorize'd method with the actual method; if the random seed is set before each call, they should return the same array.

Actually, this isn't true, because some of the rvs methods use two random numbers from the random number generator to generate one random variate of the distribution. For example, betaprime._rvs is

def _rvs(self, a, b):
    sz, rndm = self._size, self._random_state
    u1 = gamma.rvs(a, size=sz, random_state=rndm)
    u2 = gamma.rvs(b, size=sz, random_state=rndm)
    return (u1 / u2)

Calling that function several times with scalars will not produce the same variates as calling it once with broadcast arguments or with size != 1. That's not a bug in betaprime. It's just an implementation detail, but it means the idea of comparing np.vectorize'd results to actual broadcast results won't work in general. The good news is that all the continuous distributions in the dists list in test_distributions.py passed the test except for betaprime, dgamma, exponnorm, nct, dweibull, and rice (all of which use more than one random number per variate) and for pearson3 (which I am still skipping).

WarrenWeckesser · 2015-11-28T13:56:06Z

With this PR, I only see a couple of failures in truncnorm which does if self.a > 0 somewhere.

truncnorm uses if self.a > 0 twice. That can be fixed by using np.where or _lazywhere. I'll include a fix for truncnorm when I update the PR.

ev-br · 2015-11-28T17:32:10Z

It would also be helpful to have a test explicitly comparing the output of rvs and a corresponding numpy generator for several distributions, especially with those which take non-empty shape parameters. Eg. beta or binom.

ev-br · 2015-11-28T17:37:01Z

Regarding the size handling, I guess we could (i) add size parameter to the signature of _rvs method of distributions we have in scipy.stats, (ii) keep the old code path and deprecate it in the docs and release notes with the intent of removing it in a couple of releases, (iii) in code, inspect the signature of self._rvs and use either the old or the new code paths (this can be similar to how _stats_has_moments is handled).

This can be done in a separate PR, of course :-). I just think it would be nice to change both things in one release if possible.

ev-br · 2015-11-28T23:54:27Z

scipy/stats/tests/test_distributions.py

+
+def test_all_rvs_broadcast_shape():
+    # This test only checks the *shape* of the value returned by rvs().
+    for dist in dists:


Note that dists is a hard coded list which does not include all distributions IIRC. (I actually wonder why this is so, but that's off-topic)

I'm moving the tests over to test_continuous_basic.py and test_discrete_basic.py, where they will use the lists distcont and distdiscrete.

(I actually wonder why this is so [...])

I do that a lot when working with the distributions code. :)

WarrenWeckesser · 2015-11-30T20:59:33Z

Even without the python 3 test failures, this is still WIP. I haven't touched pearson3 yet.

@ev-br I didn't use your commit, but putting the test "kernel" in common_tests.py was definitely the way to go. I think the tests now cover what you had in your commits, and more.

WarrenWeckesser · 2015-11-30T21:32:03Z

@rgommers: In case it isn't obvious: this PR won't be ready in time for 0.17. Making the rvs() arguments work correctly might end up requiring that all the distribution methods properly broadcast. In theory, that's supposed to be the case already (see, for example, http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html#broadcasting). In practice, not so much.

rgommers · 2015-11-30T22:08:47Z

So we can remove the 0.17.0 label from gh-2069?

WarrenWeckesser · 2015-11-30T22:19:51Z

So we can remove the 0.17.0 label from gh-2069?

Yup, I just did.

pwaller · 2016-03-03T18:23:41Z

Any time scale for completing this? Just hit the bug too.

WarrenWeckesser · 2016-03-04T01:02:59Z

@pwaller I've been on a bit of a hiatus from scipy development, but I plan to get this into 0.18.

pwaller · 2016-03-04T09:44:27Z

In the meantime, what is the best workaround which won't break once this has landed? Is there any? I'm trying to generate many rvs with many values of parameters, e.g: beta.rvs([1, 2], [1, 2], size=(10, 2)) (I want 10 rvs each for 2 parameters). But I can't find any series of parameters which make that work. I guess I have to live with a non-vectorized implementation for now?

Edit: in the end I settled with array(list(beta(a, b).rvs(size=10000) for a, b in zip(As, Bs)))which is moderately fast, taking ~500ms for ~100 sets of parameters.

WarrenWeckesser · 2016-06-03T16:09:40Z

Despite my hiatus in scipy contributions, I haven't forgotten about this.

I think the tentative plan is to release 0.18 this month. Is there a more concrete timeline for 0.18?

larsoner · 2016-06-03T16:11:11Z

The milestone page lists June 14, but judging based on the number of PRs marked for 0.18, I'm skeptical :) I don't know of anything more concrete than that, at least.

WarrenWeckesser · 2016-06-03T16:14:12Z

Thanks Eric. Time to dive back into this...

WarrenWeckesser · 2016-06-18T01:47:10Z

Updated.

ev-br · 2016-06-18T11:48:54Z

This would need a mention in the release notes, in big bold letters.

WarrenWeckesser · 2016-06-18T17:25:06Z

@ev-br I added a note to the release notes about the bug fix. It is in the "Backwards incompatible changes" section, mainly because there are cases with arguments that are incompatible with a given size where the function used to return values and now it raises an error. My guess is that case will be pretty rare in the wild (but I admit it is just a guess).

I refrained from using "big bold letters". 😄

ev-br · 2016-06-18T19:45:11Z

That's big enough and bold enough for me :-). This PR looks good to me now. I'm going to merge it before branching unless there are further comments.

ev-br · 2016-06-19T12:02:30Z

Needs a rebase (again).

Here's something I don't understand:

In [22]: scipy.__version__
Out[22]: '0.18.0.dev0+36683cf'

In [23]:  stats.norm.rvs(size=4, random_state=np.random.RandomState(1234))
Out[23]: array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 ])

In [24]: stats.levy_stable.rvs(alpha=1, beta=0, size=4, random_state=np.random.RandomState(1234))
Out[24]: array([-0.12827986, -0.73108696, -2.39290307, -0.17495177])

In [25]: stats.levy_stable.rvs(alpha=1.5, beta=0.3, size=4, random_state=np.random.RandomState(1234))
Out[25]: array([-0.40182381,  1.65060746, -1.64155074,  0.49261358])

In scipy 0.17:

In [16]: scipy.__version__
Out[16]: '0.17.0'

In [17]: stats.norm.rvs(size=4, random_state=np.random.RandomState(1234))
Out[17]: array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 ])

In [18]:  stats.levy_stable.rvs(alpha=1, beta=0, size=4, random_state=np.random.RandomState(1234))
Out[18]: array([-1.13381723,  0.56711265,  3.39976097,  0.03294947])

In [19]:  stats.levy_stable.rvs(alpha=1.5, beta=0.3, size=4, random_state=np.random.RandomState(1234))
Out[19]: array([ 0.67579527, -0.76668594,  0.82701716,  1.36710987])

Notice that the random draws are the same for norm (I also spot-checked gamma and rice), as they should. For levy_stable however, they differ.

The class rv_generic was updated to handle broadcasting of the arguments to the rvs() method. The following distributions required additional custom changes to support broadcasting in rvs(): levy_stable, pearson3, rice, truncnorm, hypergeom, planck, randint. The function `numpy.broadcast_to` is used in several places. This function was added to numpy in version 1.10, so a copy was added to _lib/_numpy_compat.py to support older version of numpy. Closes scipygh-2069.

…e notes. Also added subheadings to the section of the release notes on backwards incompatible changes.

WarrenWeckesser · 2016-06-19T14:13:27Z

Looks like levy_stable._rvs() wasn't updated along with the rest of the distributions when the _random_state attribute was added. In scipy 0.17.1:

In [4]: stats.levy_stable.rvs(alpha=1, beta=0, size=4, random_state=np.random.RandomState(1234))
Out[4]: array([ 1.84349104, -2.86730347, -3.2989345 ,  0.44281769])

In [5]: stats.levy_stable.rvs(alpha=1, beta=0, size=4, random_state=np.random.RandomState(1234))
Out[5]: array([-0.07054039,  1.90011002, -3.46018551, -4.57797445])

Those outputs should be the same.

I'll fix that now.

ev-br · 2016-06-19T15:25:24Z

Ouch. levy_stable is skipped in the loop over all distributions which, among other things, calls check_random_state_property, so it was not noticed.
This actually was my mess, thank you for cleaning it up!

WarrenWeckesser · 2016-06-19T16:03:03Z

Updated with fix for levy_stable.

ev-br · 2016-06-19T19:19:52Z

LGTM, merging. Thanks Warren!

WarrenWeckesser added the scipy.stats label Nov 23, 2015

WarrenWeckesser mentioned this pull request Nov 23, 2015

stats broadcasting in rvs (Trac #1544) #2069

Closed

WarrenWeckesser force-pushed the stats-rvs-broadcasting branch from fb488bb to 4a5bd6a Compare November 24, 2015 01:22

rgommers added the maintenance Items related to regular maintenance tasks label Nov 24, 2015

WarrenWeckesser force-pushed the stats-rvs-broadcasting branch from 4a5bd6a to 435d658 Compare November 27, 2015 05:40

ev-br reviewed Nov 28, 2015
View reviewed changes

WarrenWeckesser force-pushed the stats-rvs-broadcasting branch from 435d658 to d56fe59 Compare November 30, 2015 20:31

WarrenWeckesser mentioned this pull request Dec 4, 2015

MAINT: stats/tests: A bit of clean up. #5583

Merged

WarrenWeckesser mentioned this pull request Jan 4, 2016

The arguments of randint() and random_integers() do not broadcast. numpy/numpy#6745

Closed

pv added the needs-work Items that are pending response from the author label Feb 16, 2016

WarrenWeckesser force-pushed the stats-rvs-broadcasting branch from d56fe59 to beb7a7a Compare May 18, 2016 18:14

WarrenWeckesser force-pushed the stats-rvs-broadcasting branch 2 times, most recently from e7bd655 to e7a41ca Compare June 17, 2016 23:09

WarrenWeckesser changed the title ~~WIP: Fix handling of size and broadcasting in the rvs method.~~ BUG: stats: Fix broadcasting in the rvs() method of the distributions. Jun 17, 2016

WarrenWeckesser force-pushed the stats-rvs-broadcasting branch from e7a41ca to ffed817 Compare June 18, 2016 00:56

WarrenWeckesser added this to the 0.18.0 milestone Jun 18, 2016

rgommers removed the needs-work Items that are pending response from the author label Jun 18, 2016

WarrenWeckesser added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Jun 19, 2016

WarrenWeckesser added 2 commits June 19, 2016 09:54

DOC: Note the fixed rvs() method of stats distributions in the releas…

8d00608

…e notes. Also added subheadings to the section of the release notes on backwards incompatible changes.

BUG: stats: Use the _random_state attribute in levy_stable.

761ac90

WarrenWeckesser force-pushed the stats-rvs-broadcasting branch from ac350eb to 761ac90 Compare June 19, 2016 14:49

ev-br merged commit ebc92f1 into scipy:master Jun 19, 2016

WarrenWeckesser deleted the stats-rvs-broadcasting branch June 19, 2016 19:25

pvanmulbregt mentioned this pull request Jan 12, 2020

stats.truncnorm.rvs 100x slower in v1.4.x than v1.3.3 #11299

Closed

WarrenWeckesser mentioned this pull request Jan 27, 2020

MAINT: Remove self._size (and self._random_state) from stats distributions. #11394

Merged

Uh oh!

BUG: stats: Fix broadcasting in the rvs() method of the distributions. #5526

BUG: stats: Fix broadcasting in the rvs() method of the distributions. #5526

Uh oh!

Conversation

WarrenWeckesser commented Nov 23, 2015

Uh oh!

WarrenWeckesser commented Nov 23, 2015

Uh oh!

rgommers commented Nov 24, 2015

Uh oh!

rgommers commented Nov 24, 2015

Uh oh!

WarrenWeckesser commented Nov 24, 2015

Uh oh!

WarrenWeckesser commented Nov 27, 2015

Uh oh!

ev-br commented Nov 27, 2015

Uh oh!

WarrenWeckesser commented Nov 27, 2015

Uh oh!

WarrenWeckesser commented Nov 28, 2015

Uh oh!

WarrenWeckesser commented Nov 28, 2015

Uh oh!

ev-br commented Nov 28, 2015

Uh oh!

ev-br commented Nov 28, 2015

Uh oh!

ev-br Nov 28, 2015

Choose a reason for hiding this comment

Uh oh!

WarrenWeckesser Nov 29, 2015

Choose a reason for hiding this comment

Uh oh!

WarrenWeckesser commented Nov 30, 2015

Uh oh!

WarrenWeckesser commented Nov 30, 2015

Uh oh!

rgommers commented Nov 30, 2015

Uh oh!

WarrenWeckesser commented Nov 30, 2015

Uh oh!

pwaller commented Mar 3, 2016

Uh oh!

WarrenWeckesser commented Mar 4, 2016

Uh oh!

pwaller commented Mar 4, 2016

Uh oh!

WarrenWeckesser commented Jun 3, 2016

Uh oh!

larsoner commented Jun 3, 2016

Uh oh!

WarrenWeckesser commented Jun 3, 2016

Uh oh!

WarrenWeckesser commented Jun 18, 2016

Uh oh!

ev-br commented Jun 18, 2016

Uh oh!

WarrenWeckesser commented Jun 18, 2016

Uh oh!

ev-br commented Jun 18, 2016

Uh oh!

ev-br commented Jun 19, 2016

Uh oh!

WarrenWeckesser commented Jun 19, 2016

Uh oh!

ev-br commented Jun 19, 2016

Uh oh!

WarrenWeckesser commented Jun 19, 2016

Uh oh!

ev-br commented Jun 19, 2016

Uh oh!

Uh oh!