
Conversation

dschmitz89 (Contributor)

Reference issue

Towards #20544

What does this implement/fix?

Adds array API support for directional_stats.

Additional information

This is the first time I've looked into array API stuff, so any pointers are highly appreciated. I mostly checked what is available in the standard and adapted the existing tests based on the tests of other array-API-compatible functions. I do not have PyTorch installed locally, so I am mostly relying on CI for now.

@dschmitz89 requested a review from lucascolley May 26, 2024 05:54
@github-actions bot added the scipy.stats and enhancement labels May 26, 2024
dschmitz89 (Contributor, Author)

Looks like atan2 and linalg.vector_norm are only available in numpy 2. Do we need custom patches for all these noncompliant cases?

dschmitz89 (Contributor, Author)

I thought moveaxis was part of the standard: https://data-apis.org/array-api/2023.12/API_specification/generated/array_api.moveaxis.html#moveaxis

Have to take a deeper look into this array-api-strict stuff.

lucascolley (Member) commented May 26, 2024

moveaxis was only added in the most recent version of the spec - array-api-{compat, strict} haven't caught up yet. I think Matt made a private replacement recently somewhere.

EDIT:

# temporary substitute for xp.moveaxis, which is not yet in all backends
# or covered by array_api_compat.
def xp_moveaxis_to_end(
        x: Array,
        source: int,
        /, *,
        xp: ModuleType | None = None) -> Array:
    xp = array_namespace(x) if xp is None else xp
    axes = list(range(x.ndim))
    temp = axes.pop(source)
    axes = axes + [temp]
    return xp.permute_dims(x, axes)
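
For illustration, a minimal usage sketch of the helper above (assuming array_namespace from scipy._lib._array_api is in scope, as in that snippet; the shapes are placeholders):

import numpy as np

x = np.zeros((3, 4, 5))
xp = array_namespace(x)
# move axis 0 to the end: shape (3, 4, 5) -> (4, 5, 3)
y = xp_moveaxis_to_end(x, 0, xp=xp)
assert y.shape == (4, 5, 3)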

lucascolley (Member) commented May 26, 2024

For linalg.vector_norm with np<2, I think you can write a helper something like this and put it in _lib._array_api:

# maybe replace by `scipy.linalg` if/when converted
def xp_vector_norm(x, *, ..., xp=None):
    xp = array_namespace(x) if xp is None else xp

    # check for optional `linalg` extension
    if hasattr(xp, 'linalg'):
        return xp.linalg.vector_norm(...)
    else:
        # return (x @ x)**0.5
        # or to get the right behavior with nd, complex arrays
        return xp.sum(xp.conj(x) * x, axis=axis, keepdims=keepdims)**0.5
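
For context, a hedged sketch of how such a helper might then be called from directional_stats (illustrative only; the keyword arguments stand in for whatever the `...` above becomes):

# e.g. normalising the mean vector along the last axis (illustrative)
mean_direction = mean / xp_vector_norm(mean, axis=-1, keepdims=True, xp=xp)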

@lucascolley added the array types label (array API support and input array validation, see gh-18286) May 26, 2024
@lucascolley changed the title from "ENH: array API support for stats.directional_stats" to "ENH: stats: add array API support for directional_stats" May 26, 2024

mdhaber (Contributor) commented May 26, 2024

@lucascolley I'm not sure I understand the need to check SCIPY_ARRAY_API in xp_vector_norm? I don't know of another place we special case to use pure (array-api-compat) NumPy in functions depending on SCIPY_ARRAY_API.

Suppose vector_norm were a required part of the standard. Then in the function, even though vector_norm is not in NumPy < 2, we would still just write:

xp = array_namespace(x)
x = xp.asarray(x)  # this is still common because we still accept lists
xp.linalg.vector_norm(x)

Why is it different if linalg is not a required part of the standard?


In any case, I'd suggest implementing the vector norm in terms of standard operations instead of falling back to NumPy:

if hasattr(xp, 'linalg'):
    return xp.linalg.vector_norm(x, axis=axis, keepdims=keepdims)
else:
    # return (x @ x)**0.5
    # or to get the right behavior with nd, complex arrays
    return xp.sum(xp.conj(x) * x, axis=axis, keepdims=keepdims)**0.5

Even NumPy isn't careful to avoid premature over/underflow, so I don't think we need to be more careful. And if others need different norms, they can add the ord argument to this private function.

lucascolley (Member)

@mdhaber you're right, I've updated my suggestion.

fancidev (Contributor)

Two small comments:

1/ In the test case test_directional_stats_correctness, since the input values are copied from an external paper, it may be preferable to keep them as is so that one can easily verify that they agree with the source. Without deg2rad you could simply multiply by 2*pi/360 to convert them to radians (see the sketch after this comment).

2/ Regarding the lack of atan2: if you're referring to its use in a test case, you could use math.atan2, since in that test case atan2 happens to operate on scalars.
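
A minimal sketch of both points (the angle values here are placeholders, not the values from the paper):

import math
import numpy as np

# 1/ keep the tabulated angles in degrees and convert without deg2rad
angles_deg = np.asarray([50.0, 290.0, 300.0, 310.0, 330.0])  # placeholder values
angles_rad = angles_deg * 2 * math.pi / 360

# 2/ math.atan2 is enough when the operands are plain scalars
mean_angle = math.atan2(float(np.sin(angles_rad).sum()),
                        float(np.cos(angles_rad).sum()))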

@lucascolley force-pushed the directional_stats_array_api branch from 7b2c1b8 to ad9ab68 on July 1, 2024 10:51
lucascolley and others added 3 commits July 1, 2024 11:53
[skip cirrus] [skip circle]

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
@lucascolley added this to the 1.15.0 milestone Jul 1, 2024
@lucascolley marked this pull request as ready for review July 1, 2024 11:16
lucascolley (Member)

Some of the tests had ref and res the wrong way around so a little extra refactoring was needed. I just get one error locally now where there is a different error message for torch arrays.

@lucascolley requested a review from mdhaber July 1, 2024 12:56
Comment on lines 3032 to 3037
full_array = xp.asarray(np.tile(data, (2, 2, 2, 1)))
expected = xp.asarray([[[1., 0., 0.],
                        [1., 0., 0.]],
                       [[1., 0., 0.],
                        [1., 0., 0.]]],
                      dtype=xp.float64)
Member

Should we perform this test with the default floating type, i.e.

Suggested change
full_array = xp.asarray(np.tile(data, (2, 2, 2, 1)))
expected = xp.asarray([[[1., 0., 0.],
                        [1., 0., 0.]],
                       [[1., 0., 0.],
                        [1., 0., 0.]]],
                      dtype=xp.float64)

full_array = xp.asarray(np.tile(data, (2, 2, 2, 1)).tolist())
expected = xp.asarray([[[1., 0., 0.],
                        [1., 0., 0.]],
                       [[1., 0., 0.],
                        [1., 0., 0.]]])

?

Contributor

I'm happy with either, personally. I think there is value in doing some or most tests with default floating point type, but there is also some value in testing with non-default types. If that happens naturally by convenience in some tests, great.

Member

I'm only forcing the dtype of expected. For default torch input (float32), directional_stats returns float64. Is that okay?

Same thing for the next test function.

mdhaber (Contributor) commented Jul 1, 2024

Is that because of data-apis/array-api-compat#152? If not, what is converting it? vector_norm?

mdhaber (Contributor) commented Jul 1, 2024

For default torch input (float32), directional_stats returns float64. Is that okay?

I don't think so. I think it's just NumPy that converts to float64, and that's because of data-apis/array-api-compat#152 (and sum is used). For torch, the dtype is preserved as it should be, yet even if it is float32, repr(f"{res.mean_resultant_length}") prints lots of digits!

from array_api_compat import torch
from array_api_compat import numpy as np
from scipy import stats
rng = np.random.default_rng(2549824598234528)
x = np.astype(rng.random((3, 10)), np.float32)
y = torch.asarray(x)
assert y.dtype == torch.float32
res = stats.directional_stats(y)
assert res.mean_direction.dtype == torch.float32
assert res.mean_resultant_length.dtype == torch.float32
print(f"{res.mean_resultant_length}")
# 0.941432774066925

I guess that makes sense if Python converts it to a float before printing. I don't think I've noticed that before, though; it's uncommon for the result class to have __repr__ defined explicitly.

I said before that it's OK to let sum do its thing (return np.float64 with np.float32 input), but the problem is that it will change when data-apis/array-api-compat#152 is resolved. Might be better to explicitly provide the dtype to sum so it doesn't change later.
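
For illustration, a hedged sketch of what pinning the dtype could look like (the variable names here are placeholders, not the actual code in the PR):

# keep the accumulator dtype tied to the input so the result dtype does not
# change once data-apis/array-api-compat#152 is resolved
mean = xp.sum(samples, axis=axis, dtype=samples.dtype) / samples.shape[axis]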

lucascolley (Member) commented Jul 2, 2024

What should we change, then? I assume you are referring to xp.sum being called within our call to xp.mean? We don't call xp.sum directly for PyTorch or NumPy, as the function uses xp.linalg there.

mdhaber (Contributor) commented Jul 2, 2024

Well, then, not even NumPy converts to float64. You wrote that "For default torch input (float32), directional_stats returns float64. Is that okay?" and I was trying to explain why that might be the case, but it doesn't look like that happens. I guess I didn't actually test for NumPy before, but it doesn't seem to happen for NumPy either because - as you say - NumPy skips sum, so data-apis/array-api-compat#152 doesn't come up.

I think you only have to force the dtype of expected because data starts out as a float64 array, so Torch's input is a float64 tensor. If we use xp.asarray instead of np.array, it should work without specifying the dtype.

mdhaber (Contributor) left a comment

Not too much to say here. Thanks!


lucascolley (Member) left a comment

please let me know how we can fix the regex failure in CI


mdhaber (Contributor) left a comment

LGTM after these changes!

@mdhaber requested a review from j-bowhay July 5, 2024 06:43
Comment on lines 558 to 569
if SCIPY_ARRAY_API and hasattr(xp, 'linalg'):
    return xp.linalg.vector_norm(x, axis=axis, keepdims=keepdims, ord=ord)

else:
    if ord != 2:
        raise ValueError(
            "only the Euclidean norm (`ord=2`) is currently supported in "
            "`xp_vector_norm` for backends not implementing the `linalg` extension."
        )
    # return (x @ x)**0.5
    # or to get the right behavior with nd, complex arrays
    return xp.sum(xp.conj(x) * x, axis=axis, keepdims=keepdims)**0.5
Member

I don't quite fully understand this. If SCIPY_ARRAY_API=0 then xp.sum(xp.conj(x) * x, axis=axis, keepdims=keepdims)**0.5 is always used which doesn't seem preferable compared to np.linalg.norm?

Member

Right, I think I should rework the logic to

if SCIPY_ARRAY_API:
    if hasattr(xp, 'linalg'):
        ...
    else:
        ...
else:
    # use np.linalg.norm
    ...

Member

this should be fixed now

lucascolley (Member)

please squash merge!

j-bowhay (Member) left a comment

Thanks everyone!

dschmitz89 (Contributor, Author)

Thanks everyone for pushing this in my absence. :)

ev-br pushed a commit to ev-br/scipy that referenced this pull request Jul 14, 2024
Co-authored-by: Lucas Colley <lucas.colley8@gmail.com>
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Co-authored-by: Jake Bowhay <60778417+j-bowhay@users.noreply.github.com>