-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
Description
When we add new .py
files, we are careful to add underscores to the file names to ensure everyone sees at a glance that this is not a new public namespace. However, we have lots of files that are missing that underscore. This is only because of historical reasons; in the early days of SciPy no one paid attention to this. We had a discussion about it once, probably about 9-10 years ago, and decided to not clean things up back then because the concrete benefits were unclear.
The public namespaces are documented in http://scipy.github.io/devdocs/reference/index.html#api-definition. No one reads the docs though. This has become much more of a problem recently, because authors of other libraries have started to reimplement parts of the SciPy API, for example:
- CuPy, see https://docs.cupy.dev/en/stable/reference/scipy.html
- JAX, see https://jax.readthedocs.io/en/latest/jax.scipy.html
dask_image
, see http://image.dask.org/en/latest/api.html- PyTorch is adding a
torch.special
namespace that mirrorsscipy.special
: https://pytorch.org/docs/master/special.html
Those libraries can get it wrong by accident, or they just mirror what our actual filenames are even though we tell them it's not meant to be a public namespace. An example of each:
- Leveraging NEP 18 for image processing and signal processing #10204 discusses
dask_image
restructuring its namespace to matchscipy.ndimage
- Rename cupyx.scipy.stats.distributions to _distributions cupy/cupy#5047 is about
scipy.stats.distributions
Long story short: we should clean this up. There's two ways we can go about this:
- don't deprecate anything. Just move
somefile.py
to_somefile.py
, and then add a newsomefile.py
which reimports the public functions from_somefile.py
and adds a big comment with warnings about this not being a public namespace at the top of the file. - do (1), and then deprecate the functions in
somefile
I'd suggest to do (2) by default, and only make exceptions in case something is quite heavily used and adding deprecations would be too disruptive.
We should prioritize modules that are being duplicated in CuPy, JAX, PyTorch et al.
EDIT: a test should also be added that no new private-but-public-looking namespaces are added. The approach can be taken from https://github.com/numpy/numpy/blob/main/numpy/tests/test_public_api.py
EDIT 2: a basic API design principle is: a public object should only be available from one namespace. Having any function in two or more places is just extra technical debt, and with things like dispatching on an API or another library implementing a mirror API, the cost goes up.
>>> from scipy import ndimage
>>> ndimage.filters.gaussian_filter is ndimage.gaussian_filter # :(
True
EDIT 3: now that it's decided we're going ahead with this, here is a tracker of which modules are done:
- cluster
- constants
- fft
- fftpack
- integrate
- interpolate
- io
- linalg
- misc
- ndimage
- odr
- optimize
- signal
- sparse
- sparse.csgraph
- sparse.linalg
- spatial
- special
- stats
Also, we should take over the public API test from NumPy:
EDIT 4:
- right before closing this issue, we should compare master with 1.7.0 and use some version of
sorted([x for x in dir(fitpack) if not (x.startswith('_') or x in ('np', 'warnings'))]
to compare for each file/namespace we gave an underscore that everything that looks like a regular SciPy function/object is present in the listings from which we raise deprecation warnings. That way, even if people used clearly private things that were not in__all__
in that file, they will still get the warnings rather than a hard break.