-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
DOC: remove references to RandomState #13778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Well it's not wrong but just the old and not optimal way of doing it. Including the majority of our test suite, there is a wild amount of |
Using On the test side, yes we have to keep it as this is the only way, for now, which is safely providing a deterministic stream of bits accros platforms and versions. Also noted in NumPy's doc. IMO, we should only show good practices. From my experience, practitioners take our doc for a gold standard and in that regards, we have to be careful about what we show in examples. A lot of people just copy paste code and don't touch anything if it works. So if in examples we showcase something using a bad, or even just old, way it is not good. As for taking a common seed and using a global seed. It is definitely an issue. |
I agree with all but we have to be careful by swinging from one side to the other. A gentler approach will go a long way for convincing users. Not changing every version to another conclusion;"1.17 was wrong 1.20 was correct now also wrong 1.34 is definitely correct now!!!!1!!" is not emitting the right message here. Showing the shortcomings and letting them convince themselves is a more effective strategy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the examples to use default_rng()
instead of np.random
or RandomState
are very welcome. Some of the docstring changes for random_state
arguments need to be reverted to accurately describe the behavior of check_random_state()
.
doc/source/tutorial/stats.rst
Outdated
And fixing the seed can be done like this: | ||
|
||
>>> from numpy.random import SeedSequence | ||
>>> seed = SeedSequence().entropy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example doesn't fix the seed. Rerunning the whole example gets fresh entropy every time. Instead, I would put in an explicit number literal, like the old example. However, to improve on the old example, use a quality 128-bit number and in the .. warning::
below show how you can use SeedSequence().entropy
to get such a number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks. I was really how to deal with this one. I should have said here that you could get a number by running the code and then use it as you pleased.
Ideally I did not want to put any number so that users would not copy past the same number all the time. I also thought about generating a fix number which would change every time you reload the page.
scipy/sparse/construct.py
Outdated
|
||
>>> class CustomRandomState(np.random.RandomState): | ||
>>> class CustomRandomState(np.random.Generator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How necessary is this subclassing to the example? It's not commented on, or described as an intended API elsewhere in the docstring. Changing the semantics of a method like this is not a practice that I would recommend. If one needs to be able to specify to this method to sample even integers, as in this example, that needs to be specified by some other means.
Also, just swapping Generator
for RandomState
won't work in this example. This function uses the rng_integers()
compatibility shim to draw integers. If random_state
is a Generator
, it will call the integers()
method directly instead of randint()
, so this overridden method gets ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I don't think this class is needed in the example. I will remove it.
Thanks @rkern for the review. As you see I wanted to get rid of all The idea here was to do the same everywhere. Keep this hidden behaviour while encouraging user to use RandomState. If we keep RandomState as options in the arguments, the linters will be fine. Which is not what I wanted. |
I would prefer an accurate description of the |
I'm in agreement with @rkern here. I'd prefer the docstring for the parameters to reflect what's actually accepted by the function. For the examples, changing them all to use Half of me does wonder if a phased deprecation away from RandomState would be worthwhile, but that would need to be part of a wider conversation. It would make internal code easier. |
Omitting the But we're not using that version of |
I see now that |
Well, funny you should ask this question, there is functionality for this here. This was introduced in #11680. It might be better to write a wrapper:
|
Thanks @andyfaff! I think I'll be ok with the function as-is. Do you happen to know if this was an oversight (i.e. that people might want to use |
OK then I will see to add back @andyfaff I am really in favour of planning a deprecation cycle for Also, everyone, feel free to commit directly on the branch if you see something. |
@tupui Can you point us to the TLDR of why it matters? I wonder if I'm the only maintainer who doesn't know. I'm not arguing; I believe that it's the "right" thing to do now, but I haven't come across any answers that resonate with me as to why it matters in everyday code. The documentation says:
but this doesn't actually indicate the difference is or why it's good; it only states what |
TBH, in terms of statistical properties of the RNG, I don't know as I did not dive into it. But on the practical side, there are new things like support of But @rkern would answer better here as he wrote NEP19 https://numpy.org/neps/nep-0019-rng-policy.html |
Thanks, I'll read that. +10 for |
The NEP is a nice read. That definitely suggests to me the worth the deprecation cycle to remove the technical debt. |
If we would go with this, I could also change the PR toward deprecating it. Let me all know 😄 |
Please don't include it here. I don't think we want to do that deprecation - that's a large amount of churn for users, for limited benefits for SciPy itself. But if we want to do it, it's best to discuss on a separate issue/PR. |
There are a couple of things that might people think might be meant by "deprecating it". The one I think some are arguing against is to noisily warn about and eventually reject However, I do think at some point we will want to change the behavior of the general |
But yeah, not in this branch. The example cleanups (and mentioning |
@rkern can you just confirm that I can use this docstring for seed using the current """
seed : {None, int, `numpy.random.Generator`,
`numpy.random.RandomState`}, optional
If `seed` is None (or `np.random`), the `numpy.random.RandomState`
singleton is used.
If `seed` is an int, a new ``RandomState`` instance is used,
seeded with `seed`.
If `seed` is already a ``Generator`` or ``RandomState`` instance then
that instance is used.
""" This is just a slightly modified version of the one from QMC. I would use this also to update the current docstrings of I would prefer not to go through everything 10 times 😅 [EDIT: I started like this, hope its fine... 🤞 ] |
One possible enhancement (in a separate PR!) could be to make |
I can do this afterwards sure 😉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more very minor suggestions to consider.
Co-authored-by: Robert Kern <robert.kern@gmail.com>
Thank you for the review @rkern! Shall we merge then? |
The |
@tupui This final form LGTM. |
Thanks. I propose to merge this tomorrow then. |
@rkern thanks. I hadn't noticed the API difference yet. |
NumPy documentation says that the default generator can change in future version. “The Generator is the user-facing object that is nearly identical to the legacy RandomState. It accepts a bit generator instance as an argument. The default is currently PCG64 but this may change in future versions.” https://numpy.org/doc/stable/reference/random/index.html This “canonical” way of producing random variates can therefore break the test suite and the doctests. Generator and seed should be explicitly specified. |
Robert already replied in the PR. On my side, I stand by this change. It would be totally up to me that there would be absolutely no seeding. It can give you a fake sense of things working although they are not. Anyway, here the examples are user facing and yes NEP19 is explicitly advocating for using what we proposed here. It's a very little prize to pay to fix a few numbers in the unlikely even that the generator changes. FYI, it took me at most 1h to replace them all once I had the logic in place. So not really any discussions to have here. Furthermore, the tests are still using Good hygiene around RNG is paramount and as a pilar of the statistical environment in Python, SciPy must be exemplary. |
Status
Since NumPy 1.17, the canonical way to create random numbers is:
And If a seed is to be used. User should not use a common value such as 0 or 123456. This is to prevent bias (everyone is using the same values) and ensure sufficient entropy for the generator. More details. The good practice (from NumPy's doc) is:
But in our documentation, we present the old way of dealing with RNG and show bad things like global states of RNG.
Proposal
I propose here to remove from the documentation any reference to
RandomState
and wrong (not best practice) usage of random number generators. It implies changing docstringsrandom_state,seed
and using correct RNG in examples and tutorial. The main user is stats and the tutorial has been updated with a dedicated section for random number generation.For reference, a discussion we recently had about RNG and seeding here: #13631 (comment)