
Conversation

@tupui (Member) commented Apr 14, 2021

This is a follow-up to #13778.

Some examples were not updated correctly to remove the seeding.

@tupui tupui added the "Documentation" label (Issues related to the SciPy documentation; also check https://github.com/scipy/scipy.org) on Apr 14, 2021
@tupui (Member, Author) commented Apr 14, 2021

Already fixing the issues. PyCharm's Find is having trouble because the repo is huge, so it doesn't display everything all the time...

[EDIT:] I think it's OK now. I searched using the find window, which appeared to search everywhere this time. Also, CI is green, so this is ready for review 😄

@tupui tupui force-pushed the random_seed_examples branch from 951e90d to 3a6b503 on April 14, 2021 13:47
@tupui tupui added this to the 1.7.0 milestone Apr 14, 2021
@tirthasheshpatel (Member) commented:

Are these all? I remember seeing a lot of docs using seed in stats.

@ilayn (Member) commented Apr 15, 2021

Wait. It's fine to remove the seed if we have a better way of getting reproducible results. I'm -1 on leaving things dangling randomly, regardless of how simple they are. If we are going to touch 50 files, we should at least make sure that we only do it once.

@tirthasheshpatel (Member) commented Apr 15, 2021

Wait. It's fine to remove the seed if we have a better way of getting reproducible results. I'm -1 on leaving things dangling randomly.

I had this thought too, but after #13778 got merged, I thought this was the direction we were heading in. np.random.default_rng allows setting the seed, though: rng = np.random.default_rng(12345). What do you think about doing that, @tupui?

@ilayn (Member) commented Apr 15, 2021

I thought that was just the RandomState instances. I missed the seed removal in that PR, then.

@tupui (Member, Author) commented Apr 15, 2021

Wait. It's fine to remove the seed if we have a better way of getting reproducible results. I'm -1 on leaving things dangling randomly.

I had this thought too, but after #13778 got merged, I thought this was the direction we were heading in. np.random.default_rng allows setting the seed, though: rng = np.random.default_rng(12345). What do you think about doing that, @tupui?

Following the discussion in the other PR, we actually tried not to have any explicit seeding like 12345. As noted in NumPy's docs, and now in the stats tutorial of our docs, doing this initializes the generator poorly. Also, everyone uses common values like 0 or 12345, which leads to bias. Some people just copy-paste code (seen in production and research code all the time, regardless of the language/tech...), so how we write our examples is paramount.
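
For illustration, here is a minimal sketch (an editor's illustration, not code from this PR) of the seeding practice referred to above: draw a high-entropy seed once via SeedSequence and record it, instead of reusing small literals such as 0 or 12345.

import numpy as np

seed = np.random.SeedSequence().entropy  # large integer drawn from OS entropy
rng = np.random.default_rng(seed)        # reproducible once `seed` is recorded
print(seed)
print(rng.standard_normal(3))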

@tupui (Member, Author) commented Apr 15, 2021

Are these all? I remember seeing a lot of docs using seed in stats.

I think so. I did several passes of searching, trying to catch everything.

@tirthasheshpatel (Member) commented Apr 15, 2021

I think so. I did several passes of searching, trying to catch everything.

Thanks! Sorry for the tedious # random to # may vary change. I don't have any major comments other than that for now.

@tupui (Member, Author) commented Apr 15, 2021

I think so. I did several passes of searching, trying to catch everything.

Thanks! Sorry for the tedious # random to # may vary change. I don't have any major comments other than that for now.

No worries. It's easier to change than it was to add ;) I will just wait for a go from another maintainer to make sure we are all on the same page.

@ilayn (Member) commented Apr 15, 2021

I get the idea for production code. But as we discussed there, I don't care about the bias in the examples. If we cannot set seeds, or whatever the new tech calls it, I'm against any change, since that would be a big regression.

@ilayn (Member) commented Apr 15, 2021

In fact, if there is no simple alternative to numpy.random.seed(), that is an unfortunate change for all of numpy/scipy. Besides, doctests are very important for catching regressions, so this PR effectively removes tests, which is probably not what we want in the first place.

How we write tests is indeed paramount, but not checking what our tests are doing is not acceptable either.

@@ -94,10 +94,10 @@ simplicity, we'll construct a symmetric, positive-definite matrix.
>>> from scipy.linalg import eig, eigh
>>> from scipy.sparse.linalg import eigs, eigsh
>>> np.set_printoptions(suppress=True)
>>> rng = np.random.default_rng()
Member:

Can we get reproducible results by seeding the default_rng?

Member:

Yes, but see #13863 (comment).

Member (Author):

And I just found #13631 (comment) again.

Member:

I don't buy the idea that someone would copy code with random inputs. Surely they would use their own inputs if they're actually trying to achieve something other than learning the scipy function (in which case random quality is irrelevant).

However, if it's really an issue, you could always use default_rng(seed=SEED), where SEED is defined in the doctest runner and so is not part of the example.

# the namespace to run examples in
DEFAULT_NAMESPACE = {'np': np}

Member:

However, if it's really an issue, you could always use default_rng(seed=SEED), where SEED is defined in the doctest runner and so is not part of the example.

Hmm, that would be a pleasing middle ground.

Member:

Ideally, I think if you ran make clean; make html twice you'd get the same output, and with this PR you won't (better for reproducibility, smaller diffs with doc uploads to version-controlled things, etc.). At the same time, I see the point about wanting to set a good example (which I agree is more important).

Another option is for us to hack the doc building in some way to give us reproducible outputs. For example, in conf.py we could temporarily monkeypatch numpy.random.default_rng to be lambda seed=0: original_default_rng(seed). That way our examples stay static, but readers just see us use rng = numpy.random.default_rng() in examples. I can look into this and open a PR to do it separately if people agree it solves our problems.
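
A rough sketch of that monkeypatch idea, assuming it would live in the Sphinx conf.py (a hypothetical illustration of the suggestion above, not the change that was merged):

import numpy as np

_original_default_rng = np.random.default_rng

def _seeded_default_rng(seed=0):
    # Rendered examples still show `rng = np.random.default_rng()`, but the
    # doc build silently supplies a fixed seed so the output is reproducible.
    return _original_default_rng(seed)

np.random.default_rng = _seeded_default_rng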

@tupui (Member, Author) commented Apr 16, 2021

Argh, just seeing this. I started to add the SEED 😅. Well, it's not for nothing if we take your approach, @larsoner, as I would basically already have the new values for the tests... If we do this, you could commit to my branch directly. Or would we first use # random and then introduce this hack?

I will wait a bit for the decision then (+1 for @rgommers's # random suggestion; in the end, I believe it could mean less technical debt, as we would not need to maintain changing values there, plus a new hack).

Just let me know 😃

Member:

If we do this, you could commit to my branch directly.

This seems like the easiest approach. Let's wait to see if others agree it's the best compromise.

Also I do agree with @rgommers that we will cause copy-paste problems for people with the SEED approach, so I prefer the hack-the-doc-build approach.

Member:

+1 for hack-the-doc-build, good idea

Member (Author):

OK, great 😅 @larsoner, I will push what I did with SEED very soon. Basically, I added this to refguide_check.py:

# SEED is generated with
# from numpy.random import SeedSequence
# print(SeedSequence().entropy)
DEFAULT_NAMESPACE = {'np': np, 'SEED': 1638083107694713882823079058616272161}

Then, if I understand your trick correctly, please use this value when replacing >>> rng = np.random.default_rng().
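
For illustration, under that scheme an example would read roughly as follows, with SEED injected by refguide_check.py rather than written out in the docstring (a sketch of the proposal being discussed here, not the approach that was ultimately merged):

>>> rng = np.random.default_rng(SEED)  # SEED comes from DEFAULT_NAMESPACE
>>> sample = rng.random(2)             # deterministic across doc builds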

@peterbell10 (Member) left a comment

Spotted a couple more

@tupui tupui force-pushed the random_seed_examples branch 2 times, most recently from 2111fca to 3ec3192 on April 17, 2021 13:11
Commits added:
Apply suggestions from code review (Co-authored-by: peterbell10 <peterbell10@live.co.uk>)
DOC: more seeding
@tupui tupui force-pushed the random_seed_examples branch from 3ec3192 to 16e7988 on April 17, 2021 13:33
@tupui (Member, Author) commented Apr 17, 2021

Thanks @peterbell10. Originally I did not look much at the plots or at things which did not cause doctests to fail. I have fixed these now. I also went through np.random.seed/rand/randint/randn/random/uniform again.

I had to put a # may vary for multivariate_normal, as seeding is not sufficient there, as described in multiple issues (numpy/numpy#13597, numpy/numpy#8041).

[PS: removing a reviewer was not intentional; I am on the iOS app, and adding a reviewer removed some 🤷‍♂️]

@tupui tupui requested review from rgommers and peterbell10 April 17, 2021 15:28
@tupui (Member, Author) commented Apr 18, 2021

@rgommers do you want to have a final look?

@rgommers (Member) left a comment

LGTM too and CI is all green. Let's give this a go, thanks @tupui! And thanks all reviewers!

@rgommers rgommers merged commit 9492369 into scipy:master Apr 18, 2021
@tupui (Member, Author) commented Apr 18, 2021

Thanks Ralf 😃 and thanks everyone for the review and the discussion!

@tupui tupui deleted the random_seed_examples branch April 18, 2021 19:43
@mdhaber (Contributor) commented Apr 19, 2021

Anyone else on Windows having trouble running refguide_check locally? I'm consistently getting a MemoryError in the middle of the stats refguide_check.
Is there a verbose output option so I can tell which test is failing? python runtests.py -g --refguide-check -v doesn't do it.
Is there a way of running the check for a single file so I can avoid the error? python runtests.py -g --refguide-check -t scipy.stats._entropy doesn't do it.

@rgommers (Member) commented:

That doesn't sound familiar. It would be better to open a separate issue for it.

@ev-br (Member) commented Apr 19, 2021

python runtests.py -g --refguide-check -t scipy.stats._entropy doesn't do it.

Sadly, no; refguide-check only works per-submodule, so -s stats should work, but -t scipy.stats._entropy won't.

@sturlamolden (Contributor) commented:

Status

Since NumPy 1.17, the canonical way to create random numbers is:

import numpy as np
from numpy.random import default_rng
rng = default_rng()

The NumPy documentation says that the default generator can change in a future version.

“The Generator is the user-facing object that is nearly identical to the legacy RandomState. It accepts a bit generator instance as an argument. The default is currently PCG64 but this may change in future versions.”

https://numpy.org/doc/stable/reference/random/index.html

This “canonical” way of producing random variates can therefore break the test suite and the doctests. The generator and seed should be specified explicitly.
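
For illustration, one way to do what is suggested here is to pin both the bit generator and the seed explicitly (an editor's sketch, not code from this PR), so results stay stable even if NumPy changes the default generator:

import numpy as np

# Explicitly choose the bit generator (PCG64) and a recorded seed, so results
# do not depend on NumPy's choice of default generator.
rng = np.random.Generator(np.random.PCG64(1638083107694713882823079058616272161))
print(rng.standard_normal(2))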

@rkern (Member) commented May 10, 2021

We've already discussed this in this PR. I don't intend to rehash the whole conversation (you can read it at your leisure), but in brief: We don't use doctests in our test suite to test the functionality of our code. We use explicitly-written unit tests for that purpose. Our examples are written to document how to use scipy, and thus are written according to the current recommended practice. If an upstream change changes the output of the examples meaningfully, we'll just update the examples.
