Description
Below is a list of TODO tasks and follow-ups compiled after the recent addition of the `scipy.datasets` submodule.
- Add a public `download_all` utility method to fetch and cache all the datasets at once. (PR ENH: Add `download_all` utility method & script #17163)
- Add a `clear_cache` function that automatically cleans the `scipy.datasets` cache directory based on the user's platform. This spares users the tedious manual work of emptying the cache when required. (PR ENH: Add clear_cache utility for `scipy.datasets` #17478)
- When running the complete SciPy test suite offline, we should intelligently skip tests when there is no network connection. Also, when `pooch` is not installed, all dataset tests can be skipped. Maybe they also need an `XSLOW` marker. (BLD/TST: datasets: Allow disabling network requiring tests #17965)
- Move other dataset files (e.g. `scipy.stats` has its own test datasets, like nist, within the repo) to their respective new repos (like https://github.com/scipy/dataset-ascent) and use the functionality from `scipy.datasets` and `pooch` to reduce the wheel size by moving these datasets to `scipy.datasets`. Since the data files are only used in testing, this will NOT introduce any new dependency (like pooch) on the stats module.
- As mentioned in the comment about the restrictions and a packager needing to adhere to Debian rules, add a separate script or package (e.g. `pip install scipy-datasets`) to download everything at once before SciPy is built or tested. (Handled in PR ENH: Add `download_all` utility method & script #17163)
- Remove filterwarnings for pooch import after the new `certifi` release. `certifi` had issues with `py3.11` which are now fixed in master. Pending release of Fix deprecation warning on Python 3.11 certifi/python-certifi#199. (PR MAINT: remove certifi py3.11 warning filter #17149)
- Add more datasets to the `scipy.datasets` submodule:
  - Iris Dataset
  - Mauna Loa CO2 Dataset (https://gml.noaa.gov/ccgg/trends/data.html)
  - Discuss on the mailing list about other datasets that could be useful in SciPy apart from the ones that will be moved (see 4th point).
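
For the test-skipping item above, here is a minimal sketch of how the "should the dataset tests run?" decision could be made. It is only an illustration, not what the linked PR implements: the environment variable name `SCIPY_DATASETS_OFFLINE` and the helper names `module_available` and `skip_reason` are made up for this example.

```python
import importlib.util
import os

# Hypothetical environment variable name, chosen for this sketch only.
OFFLINE_ENV_VAR = "SCIPY_DATASETS_OFFLINE"


def module_available(name):
    """Return True if the top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def skip_reason(environ=None, required_module="pooch"):
    """Return a reason string if dataset tests should be skipped, else None."""
    environ = os.environ if environ is None else environ
    # Without the download helper there is nothing the dataset tests can do.
    if not module_available(required_module):
        return f"{required_module} is not installed"
    # Let packagers and offline CI runs opt out of network access explicitly.
    if environ.get(OFFLINE_ENV_VAR, "0") == "1":
        return "network access disabled via " + OFFLINE_ENV_VAR
    return None
```

In a test module, the result could then feed `pytest.mark.skipif(skip_reason() is not None, reason=...)`, so that a missing `pooch` or an explicit offline flag skips the tests instead of failing them.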
I'll start working on the tasks above. Feel free to pick up one of them if it interests you (please comment below before you start working).
cc @rgommers