
Conversation

@pentschev (Member) commented Oct 15, 2020

Fixes #4842, #6117, #6169, #6942.

This PR adds support for NEP-35. To describe the NEP briefly, it proposes adding a new keyword argument, like, to NumPy array creation functions such as np.array, allowing NumPy to dispatch array creation downstream to other libraries, such as CuPy and Dask itself. To demonstrate this, consider the following example:

>>> import numpy as np, cupy as cp, dask.array as da
>>>
>>> cp_ref = cp.array(0)
>>> da_ref = da.array(0)
>>>
>>> type(np.array(da.arange(5)))
<class 'numpy.ndarray'>
>>> type(np.array(da.arange(5), like=da_ref))
<class 'dask.array.core.Array'>
>>>
>>> type(np.array(cp.arange(5)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cupy/core/core.pyx", line 1245, in cupy.core.core.ndarray.__array__
TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
>>> type(np.array(cp.arange(5), like=cp_ref))
<class 'cupy.core.core.ndarray'>

NEP-35 is still experimental and requires NumPy 1.20.0+.

The changes here attempt to cover a large portion of Dask, adding some new tests that wouldn't pass before. This change mostly benefits downstream libraries such as CuPy and Sparse, which implement the __array_function__ protocol. It allows Dask to keep working with, for example, Dask arrays backed by CuPy in functions such as pad(..., mode="constant"), which wasn't possible before because the internal call to np.asarray could only create NumPy arrays.
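
To illustrate the behavior this enables, here is a small usage sketch (it assumes CuPy, a CUDA-capable GPU, and NumPy 1.20+; variable names are just for illustration):

import cupy as cp
import dask.array as da

# Dask array backed by CuPy chunks.
x = da.from_array(cp.arange(10), chunks=5)

# Before this PR, mode="constant" went through np.asarray and coerced the
# chunks to NumPy; with like= dispatch the chunks should stay CuPy.
padded = da.pad(x, 2, mode="constant", constant_values=0)
print(type(padded.compute()))  # expected: a CuPy ndarray, not numpy.ndarray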

Things that are not yet supported include (but are not limited to):

  • da.random.choice: it's a somewhat complex function that will take some time to get right, but it doesn't seem to have any technical limitations;
  • linalg functions: they require solve_triangular, which is a SciPy function and can't be dispatched with __array_function__, so they will probably need special-casing.

cc @quasiben @jakirkham @mrocklin @seberg @shoyer @rgommers @hameerabbasi @leofang

@quasiben (Member)

Yay tests pass 🎉 Thank you for continuing to push here @pentschev!

Is it fair to characterize most of the changes here as moving the type checks from isinstance(arr, (list, np.ndarray)) to is_arraylike(arr) and updating the *_safe function calls to use the like= argument:

asarray_safe(c, like=meta_from_array(array), dtype=result.dtype)

I think this is a large enough change that we should ask for an external reviewer if folks have time. Ideally we could merge tomorrow, but that might not be enough lead time. If we don't hear back, I think planning for early next week makes sense.

@pentschev (Member, Author) commented Feb 11, 2021

Is it fair to characterize most of the changes here as moving the type checks from isinstance(arr, (list, np.ndarray)) to is_arraylike(arr) and updating the *_safe function calls to use the like= argument:

asarray_safe(c, like=meta_from_array(array), dtype=result.dtype)

For the first part, no. Take, for example, https://github.com/dask/dask/pull/6738/files#diff-cc8bb1c7ea02a9582c31ac1580f3ab0674c35e0b6af060513ba9932ceb07a600R227: that's a place where support for non-array-likes is still necessary, so this has to be verified on a case-by-case basis.

For the second part, yes. The intent is that *array_safe should be backwards compatible too, much like what has been done with empty_like_safe and its counterparts, rather than having developers write np.array*(arr, like=some_arr) and then check whether it succeeds (NumPy 1.20+) or fails (older NumPy). I'm not currently aware of any cases where *array_safe doesn't reproduce backwards-compatible behavior, but if there are any, I'm confident we can fix them, so using *array_safe should be preferred over np.array*.
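
To make the intent concrete, here is a minimal sketch of what such a backwards-compatible helper could look like (an illustration only, not Dask's actual implementation, which handles more cases):

from distutils.version import LooseVersion

import numpy as np

_numpy_120 = LooseVersion(np.__version__) >= "1.20.0"


def asarray_safe(a, like=None, **kwargs):
    # On NumPy 1.20+, forward like= so that array creation is dispatched to
    # the library backing `like` (e.g. CuPy) via __array_function__.
    if like is not None and _numpy_120:
        return np.asarray(a, like=like, **kwargs)
    # On older NumPy, fall back to plain np.asarray, which always produces
    # a numpy.ndarray (the pre-NEP-35 behavior).
    return np.asarray(a, **kwargs)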

assert_eq(result, np.array([1], dtype=d.dtype))


@pytest.mark.skipif(np.__version__ < "1.20", reason="NEP-35 is not available")
@hameerabbasi (Contributor) commented Feb 11, 2021

I would move this into a global variable, perhaps in a settings-like module. That way it's easy to change the test.

Contributor

Specifically, create a variable named NEP35_AVAILABLE.

Member Author

Good point, this exists already in

_numpy_120 = LooseVersion(np.__version__) >= "1.20.0"
which should suffice.
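
For reference, a tiny sketch of gating a test on that flag (the dask.array.numpy_compat import path and the test name are my assumptions here):

import pytest

from dask.array.numpy_compat import _numpy_120  # assumed location of the flag


@pytest.mark.skipif(not _numpy_120, reason="NEP-35 is not available")
def test_creation_with_like():
    ...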

Member Author

Done in bae0dd6

@hameerabbasi (Contributor) commented Feb 11, 2021

I would suggest testing the protocol rather than the version. Something like:

def _test_nep35():
    class MyClass:
        def __array_function__(self, func, types, args, kwargs):
            return True
    try:
        return np.asarray([], like=MyClass())
    except TypeError:
        return False

NEP35_AVAILABLE = _test_nep35()

Member Author

I would personally prefer that the tests fail, to force us to update them should anything change, rather than having them skipped. Happy to change my mind should there be compelling arguments in favor of skipping. 🙂

@quasiben (Member)

I think @crusaderky may take a look at this -- if not, I'll plan to merge tomorrow morning

@jakirkham (Member)

I’ve been meaning to take a look at this as well. Will carve out some time tomorrow.

@quasiben (Member)

I'm going to merge now. If there are additional concerns I'm sure @pentschev would be happy to follow up with PRs/comments/etc.

@quasiben merged commit 5948ec3 into dask:master on Feb 18, 2021
@seberg commented Feb 18, 2021

Small heads-up: we missed some small memory leaks in np.empty and np.zeros when like= is used. Those should be gone in 1.20.2, but just in case someone notices...

@jakirkham (Member)

Sorry for the delay. I skimmed over the changes and they seem reasonable. No additional comments.

@jakirkham (Member)

Would it make sense to do a blog post on this? Like a short follow-on/update of the NEP 18 one?

@pentschev (Member, Author)

Thanks @seberg, is numpy/numpy@e907085 the fix you mentioned?

@seberg commented Feb 19, 2021

Yeah, it just leaks "shape" information, so that's a few dozen bytes per call (don't worry about the dtype; in the vast majority of cases those are singletons anyway).

@pentschev (Member, Author)

Thanks @seberg for catching and fixing that, it's my fault for having missed it! 🙁

@pentschev (Member, Author)

Would it make sense to do a blog post on this? Like a short follow-on/update of the NEP 18 one?

I'm not sure, possibly? This change is generally more transparent to users than NEP-18 was, so I'm not exactly sure how to approach it in a way that anybody not involved in library development would be compelled to read. I have no objections to discussing it; I just don't see a good way to write it right now.
