
Conversation

dschmitz89 (Contributor)

Reference issue

Towards #20544

What does this implement/fix?

Adds array API support for directional_stats.

Additional information

This is the first time I've looked into array API stuff, so any pointers are highly appreciated. I mostly checked what is available in the standard and adapted the existing tests based on the tests of other array-API-compatible functions. I do not have PyTorch installed locally, so I am mostly relying on CI for now.

@dschmitz89 requested a review from lucascolley May 26, 2024 05:54
@github-actions bot added the scipy.stats and enhancement labels May 26, 2024
dschmitz89 (Contributor, Author)

Looks like atan2 and linalg.vector_norm are only available in numpy 2. Do we need custom patches for all these noncompliant cases?

dschmitz89 (Contributor, Author)

I thought moveaxis was part of the standard: https://data-apis.org/array-api/2023.12/API_specification/generated/array_api.moveaxis.html#moveaxis

Have to take a deeper look into this array-api-strict stuff.

lucascolley (Member) commented May 26, 2024

moveaxis was only added in the most recent version of the spec - array-api-{compat, strict} haven't caught up yet. I think Matt made a private replacement recently somewhere.

EDIT:

# temporary substitute for xp.moveaxis, which is not yet in all backends
# or covered by array_api_compat.
def xp_moveaxis_to_end(
        x: Array,
        source: int,
        /, *,
        xp: ModuleType | None = None) -> Array:
    xp = array_namespace(x) if xp is None else xp
    axes = list(range(x.ndim))
    temp = axes.pop(source)
    axes = axes + [temp]
    return xp.permute_dims(x, axes)
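
For illustration, a minimal usage sketch of the helper above (assuming array_namespace from scipy._lib._array_api is in scope, as in that snippet; the shapes are placeholders):

import numpy as np

x = np.zeros((3, 4, 5))
xp = array_namespace(x)
# move axis 0 to the end: shape (3, 4, 5) -> (4, 5, 3)
y = xp_moveaxis_to_end(x, 0, xp=xp)
assert y.shape == (4, 5, 3)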

lucascolley (Member) commented May 26, 2024

For linalg.vector_norm with np<2, I think you can write a helper something like this and put it in _lib._array_api:

# maybe replace by `scipy.linalg` if/when converted
def xp_vector_norm(x, *, ..., xp=None):
    xp = array_namespace(x) if xp is None else xp

    # check for optional `linalg` extension
    if hasattr(xp, 'linalg'):
        return xp.linalg.vector_norm(...)
    else:
        # return (x @ x)**0.5
        # or to get the right behavior with nd, complex arrays
        return xp.sum(xp.conj(x) * x, axis=axis, keepdims=keepdims)**0.5
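
For context, a hedged sketch of how such a helper might then be called from directional_stats (illustrative only; the keyword arguments stand in for whatever the `...` above becomes):

# e.g. normalising the mean vector along the last axis (illustrative)
mean_direction = mean / xp_vector_norm(mean, axis=-1, keepdims=True, xp=xp)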

@lucascolley added the array types label (array API support and input array validation, see gh-18286) May 26, 2024
@lucascolley changed the title from "ENH: array API support for stats.directional_stats" to "ENH: stats: add array API support for directional_stats" May 26, 2024

mdhaber (Contributor) commented May 26, 2024

@lucascolley I'm not sure I understand the need to check SCIPY_ARRAY_API in xp_vector_norm? I don't know of another place we special case to use pure (array-api-compat) NumPy in functions depending on SCIPY_ARRAY_API.

Suppose vector_norm were a required part of the standard. Then in the function, even though vector_norm is not in NumPy < 2, we would still just write:

xp = array_namespace(x)
x = xp.asarray(x)  # this is still common because we still accept lists
xp.linalg.vector_norm(x)

Why is it different if linalg is not a required part of the standard?


In any case, I'd suggest implementing the vector norm in terms of standard operations instead of falling back to NumPy:

if hasattr(xp, 'linalg'):
    return xp.linalg.vector_norm(x, axis=axis, keepdims=keepdims)
else:
    # return (x @ x)**0.5
    # or to get the right behavior with nd, complex arrays
    return xp.sum(xp.conj(x) * x, axis=axis, keepdims=keepdims)**0.5

Even NumPy isn't careful to avoid premature over/underflow, so I don't think we need to be more careful. And if others need different norms, they can add the ord argument to this private function.

lucascolley (Member)

@mdhaber you're right, I've updated my suggestion.

fancidev (Contributor)

Two small comments:

1/ In the test case test_directional_stats_correctness, since the input values are copied from an external paper, it may be preferable to keep them as is so that one can easily verify that they agree with the source. Without deg2rad you could simply multiply by 2*pi/360 to convert them to radians (see the sketch after this comment).

2/ Regarding the lack of atan2: if you're referring to its use in a test case, you could use math.atan2, since in that test case atan2 happens to operate on scalars.
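
A minimal sketch of both points (the angle values here are placeholders, not the values from the paper):

import math
import numpy as np

# 1/ keep the tabulated angles in degrees and convert without deg2rad
angles_deg = np.asarray([50.0, 290.0, 300.0, 310.0, 330.0])  # placeholder values
angles_rad = angles_deg * 2 * math.pi / 360

# 2/ math.atan2 is enough when the operands are plain scalars
mean_angle = math.atan2(float(np.sin(angles_rad).sum()),
                        float(np.cos(angles_rad).sum()))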

@lucascolley force-pushed the directional_stats_array_api branch from 7b2c1b8 to ad9ab68 on July 1, 2024 10:51
lucascolley and others added 3 commits July 1, 2024 11:53
[skip cirrus] [skip circle]

Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
@lucascolley added this to the 1.15.0 milestone Jul 1, 2024
@lucascolley marked this pull request as ready for review July 1, 2024 11:16
lucascolley (Member)

Some of the tests had ref and res the wrong way around so a little extra refactoring was needed. I just get one error locally now where there is a different error message for torch arrays.

@lucascolley requested a review from mdhaber July 1, 2024 12:56
Comment on lines 3032 to 3037
full_array = xp.asarray(np.tile(data, (2, 2, 2, 1)))
expected = xp.asarray([[[1., 0., 0.],
                        [1., 0., 0.]],
                       [[1., 0., 0.],
                        [1., 0., 0.]]],
                      dtype=xp.float64)
Member

Should we perform this test with the default floating type, i.e.

Suggested change
full_array = xp.asarray(np.tile(data, (2, 2, 2, 1)))
expected = xp.asarray([[[1., 0., 0.],
                        [1., 0., 0.]],
                       [[1., 0., 0.],
                        [1., 0., 0.]]],
                      dtype=xp.float64)

full_array = xp.asarray(np.tile(data, (2, 2, 2, 1)).tolist())
expected = xp.asarray([[[1., 0., 0.],
                        [1., 0., 0.]],
                       [[1., 0., 0.],
                        [1., 0., 0.]]])

?

Contributor

I'm happy with either, personally. I think there is value in doing some or most tests with default floating point type, but there is also some value in testing with non-default types. If that happens naturally by convenience in some tests, great.

Member

I'm only forcing the dtype of expected. For default torch input (float32), directional_stats returns float64. Is that okay?

Same thing for the next test function.

mdhaber (Contributor) commented Jul 1, 2024

Is that because of data-apis/array-api-compat#152? If not, what is converting it? vector_norm?

mdhaber (Contributor) commented Jul 1, 2024

For default torch input (float32), directional_stats returns float64. Is that okay?

I don't think so. I think it's just NumPy that converts to float64, and that's because of data-apis/array-api-compat#152 (and sum is used). For torch, the dtype is preserved as it should be, yet even if it is float32, repr(f"{res.mean_resultant_length}") prints lots of digits!

from array_api_compat import torch
from array_api_compat import numpy as np
from scipy import stats
rng = np.random.default_rng(2549824598234528)
x = np.astype(rng.random((3, 10)), np.float32)
y = torch.asarray(x)
assert y.dtype == torch.float32
res = stats.directional_stats(y)
assert res.mean_direction.dtype == torch.float32
assert res.mean_resultant_length.dtype == torch.float32
print(f"{res.mean_resultant_length}")
# 0.941432774066925

I guess that makes sense if Python converts it to a float before printing. I don't think I've noticed that before, though; it's uncommon for the result class to have __repr__ defined explicitly.

I said before that it's OK to let sum do its thing (return np.float64 with np.float32 input), but the problem is that it will change when data-apis/array-api-compat#152 is resolved. Might be better to explicitly provide the dtype to sum so it doesn't change later.
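
For illustration, a hedged sketch of what pinning the dtype could look like (the variable names here are placeholders, not the actual code in the PR):

# keep the accumulator dtype tied to the input so the result dtype does not
# change once data-apis/array-api-compat#152 is resolved
mean = xp.sum(samples, axis=axis, dtype=samples.dtype) / samples.shape[axis]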

lucascolley (Member) commented Jul 2, 2024

What should we change, then? I assume you are referring to xp.sum being called within our call to xp.mean? We don't call xp.sum directly for PyTorch or NumPy, as the function uses xp.linalg there.

mdhaber (Contributor) commented Jul 2, 2024

Well, then, not even NumPy converts to float64. You wrote that "For default torch input (float32), directional_stats returns float64. Is that okay?" and I was trying to explain why that might be the case, but it doesn't look like that happens. I guess I didn't actually test for NumPy before, but it doesn't seem to happen for NumPy either because - as you say - NumPy skips sum, so data-apis/array-api-compat#152 doesn't come up.

I think you only have to force the dtype of expected because data starts out as a float64 array, so Torch's input is a float64 tensor. If we use xp.asarray instead of np.array, it should work without specifying the dtype.

mdhaber (Contributor) left a comment

Not too much to say here. Thanks!


lucascolley (Member) left a comment

please let me know how we can fix the regex failure in CI


mdhaber (Contributor) left a comment

LGTM after these changes!

@mdhaber requested a review from j-bowhay July 5, 2024 06:43
Comment on lines 558 to 569
if SCIPY_ARRAY_API and hasattr(xp, 'linalg'):
    return xp.linalg.vector_norm(x, axis=axis, keepdims=keepdims, ord=ord)

else:
    if ord != 2:
        raise ValueError(
            "only the Euclidean norm (`ord=2`) is currently supported in "
            "`xp_vector_norm` for backends not implementing the `linalg` extension."
        )
    # return (x @ x)**0.5
    # or to get the right behavior with nd, complex arrays
    return xp.sum(xp.conj(x) * x, axis=axis, keepdims=keepdims)**0.5
Member

I don't quite fully understand this. If SCIPY_ARRAY_API=0 then xp.sum(xp.conj(x) * x, axis=axis, keepdims=keepdims)**0.5 is always used which doesn't seem preferable compared to np.linalg.norm?

Member

Right, I think I should rework the logic to

if SCIPY_ARRAY_API:
    if hasattr(xp, 'linalg'):
        ...
    else:
        ...
else:
    # use np.linalg.norm
    ...

Member

this should be fixed now

lucascolley (Member)

please squash merge!

j-bowhay (Member) left a comment

Thanks everyone!

dschmitz89 (Contributor, Author)

Thanks everyone for pushing this in my absence. :)

ev-br pushed a commit to ev-br/scipy that referenced this pull request Jul 14, 2024
Co-authored-by: Lucas Colley <lucas.colley8@gmail.com>
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Co-authored-by: Jake Bowhay <60778417+j-bowhay@users.noreply.github.com>