Skip to content

Conversation

mdhaber
Copy link
Contributor

@mdhaber mdhaber commented Jun 27, 2024

Reference issue

Toward gh-20544

What does this implement/fix?

Adds array API support to zmap, zscore, and gzscore.

@mdhaber mdhaber added scipy.stats enhancement A new feature or improvement array types Items related to array API support and input array validation (see gh-18286) labels Jun 27, 2024
@mdhaber
Copy link
Contributor Author

mdhaber commented Jun 28, 2024

Array API strict failures seem to be unrelated (gh-21073).


def _convert_common_float(*arrays, xp=None):
xp = array_namespace(*arrays) if xp is None else xp
arrays = [xp.asarray(array) for array in arrays]
Copy link
Member

@j-bowhay j-bowhay Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might need to be

Suggested change
arrays = [xp.asarray(array) for array in arrays]
arrays = [_asarray(array, subok=True) for array in arrays]

otherwise line 3158 will break subclasses for gzscore.

Also with this change _zmap_array_api will preserve subclasses so could we only keep the new shorter implementation as it seems much cleaner and by the test comment seems to fix a bug or is it backwards incompatible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm maybe. I'll investigate. But in the meantime I've followed the approach of zmap and just branched here depending on SCIPY_ARRAY_API to avoid any possible unintended interactions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@j-bowhay the second to last commit removes the old implementation. Besides using _asarrray, I needed to replace a use of xp.where, which doesn't preserve array subclasses.

Note that _xp_mean and _xp_var "just worked" as far as the existing unit tests are concerned, but note that they weren't written with masked arrays in mind, and the unit tests are very weak. I don't have a preference if you're willing to test/review more carefully to check that removing the old implementation is safe (including for other subclasses, like matrix), but personally, I'd just as soon keep both implementations and remove the old one when we no longer have to support subclasses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks I will look more closely over the next couple of days

z = stats.gzscore(mx)
desired = ([-1.526072095151, -0.194700599824, np.inf, 0.584101799472,
1.136670895503])
assert_allclose(desired, z)
Copy link
Contributor Author

@mdhaber mdhaber Jun 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test had the actual and desired values mixed up, and it was checking for an inf where the element should be masked. (I can't see why inf would be the right answer anyway - probably was just a fill_value that got put in during the calculation?) I corrected the test to consider the fact that the output should be a masked array.

Copy link
Member

@j-bowhay j-bowhay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having played with this a bit I think the best option might be to revert to 396e12c as the like_zscore branch in _zmap doesn't not preserve subclasses, presumably due to xp.where not preserving subclasses.

In [1]: import numpy as np

In [2]: class ExampleTensor(np.ndarray):
   ...:     def __new__(cls, input_array):
   ...:         return np.asarray(input_array).view(cls)
   ...:
   ...:     def __array_finalize__(self, obj) -> None:
   ...:         if obj is None: return
   ...:         # This attribute should be maintained!
   ...:         default_attributes = {"attr": 1}
   ...:         self.__dict__.update(default_attributes)  # another way to set attributes
   ...:

In [3]: a = ExampleTensor([0.5, 2.0, 2.5, 3])

In [4]: from scipy.stats import zmap

In [5]: zmap(a, a)
Out[5]: array([-1.60356745,  0.        ,  0.53452248,  1.06904497])

In [6]: b = ExampleTensor([0, 1, 2, 3, 4])

In [7]: zmap(a, b)
Out[7]: ExampleTensor([-1.06066017,  0.        ,  0.35355339,  0.70710678])

Sorry for the wild goose chase, I am just keen to keep the implementation as clean as possible where reasonable to do so.

@mdhaber
Copy link
Contributor Author

mdhaber commented Jul 7, 2024

Well, I can probably try using item assignment there instead? JAX is already skipped sometimes.

@j-bowhay
Copy link
Member

j-bowhay commented Jul 7, 2024

Well, I can probably try using item assignment there instead? JAX is already skipped sometimes.

Yes I think that would also work

@mdhaber
Copy link
Contributor Author

mdhaber commented Jul 7, 2024

OK, tried that so you can test more thoroughly. Also replaced the use of NumPy to work with axis. Ternary version provided as a comment ; )

Please comment on the differences the things the old code took extra care on, such as warnings (or lack thereof) and results when elements of a slice are not unique.

Copy link
Member

@j-bowhay j-bowhay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried zmap with some examples with nans and nan_policy="omit" and saw no issues. Also tried an a which was constant along axis and that worked fine so I think this is ok to proceed.

I assume we can now just move the contents of _zmap into zmap?

@mdhaber
Copy link
Contributor Author

mdhaber commented Jul 18, 2024

@melissawm any guess on whether the PosixPath('/home/circleci/repo/doc/source/_contents/hypothesis_bartlett.ipynb') are the same file error could be related to this PR? I'm not seeing it elsewhere, but I don't see how that could be affected by this PR. I just merged main, so I don't think it's related to gh-21195.

reading sources... [ 99%] tutorial/stats/discrete_planck
reading sources... [ 99%] tutorial/stats/discrete_poisson
reading sources... [ 99%] tutorial/stats/discrete_randint
reading sources... [ 99%] tutorial/stats/discrete_yulesimon
reading sources... [ 99%] tutorial/stats/discrete_zipf
reading sources... [ 99%] tutorial/stats/discrete_zipfian
reading sources... [ 99%] tutorial/stats/hypothesis_bartlett
/home/circleci/repo/doc/source/tutorial/stats/hypothesis_bartlett.md: Executing notebook using local CWD [mystnb]
/home/circleci/repo/doc/source/tutorial/stats/hypothesis_bartlett.md: Executed notebook in 9.00 seconds [mystnb]

Traceback (most recent call last):
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/sphinx/cmd/build.py", line 337, in build_main
    app.build(args.force_all, args.filenames)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/sphinx/application.py", line 378, in build
    self.builder.build_update()
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 297, in build_update
    self.build(to_build,
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 318, in build
    updated_docnames = set(self.read())
                           ^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 425, in read
    self._read_serial(docnames)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 477, in _read_serial
    self.read_doc(docname)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/sphinx/builders/__init__.py", line 536, in read_doc
    publisher.publish()
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/core.py", line 234, in publish
    self.document = self.reader.read(self.source, self.parser,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/sphinx/io.py", line 106, in read
    self.parse()
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/readers/__init__.py", line 76, in parse
    self.parser.parse(self.input, document)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/myst_nb/sphinx_.py", line 157, in parse
    mdit_renderer.render(mdit_tokens, mdit_parser.options, mdit_env)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/myst_parser/mdit_to_docutils/base.py", line 243, in render
    self._render_tokens(list(tokens))
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/myst_parser/mdit_to_docutils/base.py", line 222, in _render_tokens
    self.rules[f"render_{child.type}"](child)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/myst_nb/core/render.py", line 117, in render_nb_cell_markdown
    self.render_children(token)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/myst_parser/mdit_to_docutils/base.py", line 396, in render_children
    self.rules[f"render_{child.type}"](child)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/myst_parser/mdit_to_docutils/base.py", line 752, in render_fence
    return self.render_restructuredtext(token)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/myst_parser/mdit_to_docutils/base.py", line 1680, in render_restructuredtext
    MockRSTParser().parse(pseudosource, newdoc)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/myst_parser/mocking.py", line 552, in parse
    super().parse(inputstring, document)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/parsers/rst/__init__.py", line 184, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/parsers/rst/states.py", line 169, in run
    results = StateMachineWS.run(self, input_lines, input_offset,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/statemachine.py", line 233, in run
    context, next_state, result = self.check_line(
                                  ^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/statemachine.py", line 445, in check_line
    return method(match, context, next_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/parsers/rst/states.py", line 2357, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/parsers/rst/states.py", line 2369, in explicit_construct
    return method(self, expmatch)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/parsers/rst/states.py", line 2106, in directive
    return self.run_directive(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/docutils/parsers/rst/states.py", line 2156, in run_directive
    result = directive_instance.run()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/site-packages/jupyterlite_sphinx/jupyterlite_sphinx.py", line 368, in run
    shutil.copy(notebook, notebooks_dir)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/shutil.py", line 431, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/shutil.py", line 236, in copyfile
    raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
shutil.SameFileError: '/home/circleci/repo/doc/source/_contents/hypothesis_bartlett.ipynb' and PosixPath('/home/circleci/repo/doc/source/_contents/hypothesis_bartlett.ipynb') are the same file

Exception occurred:
  File "/home/circleci/.pyenv/versions/3.11.9/lib/python3.11/shutil.py", line 236, in copyfile
    raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
shutil.SameFileError: '/home/circleci/repo/doc/source/_contents/hypothesis_bartlett.ipynb' and PosixPath('/home/circleci/repo/doc/source/_contents/hypothesis_bartlett.ipynb') are the same file
The full traceback has been saved in /tmp/sphinx-err-ndbm02yd.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
make: *** [Makefile:114: html-build] Error 2
Skipping build

Exited with code exit status 1

@j-bowhay
Copy link
Member

The same failure showed up in #21209 so looks unrelated

@melissawm
Copy link
Member

I believe this has to do with a new release of jupyterlite-sphinx. @steppi is looking into it, thanks for the ping!

@melissawm
Copy link
Member

Docs CI should be functional again, cheers!

Copy link
Member

@j-bowhay j-bowhay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mdhaber

@j-bowhay j-bowhay merged commit d60b48d into scipy:main Jul 19, 2024
@j-bowhay j-bowhay added this to the 1.15.0 milestone Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
array types Items related to array API support and input array validation (see gh-18286) enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants