Handle literal in meta_from_array #6731

pentschev · 2020-10-12T22:10:39Z

pentschev · 2020-10-12T22:11:34Z

mathause · 2020-10-12T22:34:30Z

Thanks! This does seem to fix the issue (I tested locally but did not run the xarray test suite over it). Testing for the error message may be brittle, though (but I also don't know how to do this better).

pentschev · 2020-10-13T07:45:51Z

> Testing for the error message may be brittle, though (but I also don't know how to do this better).

I generally agree with that, but there seems to be no better way to handle such issues. Lacking a more concise way to do it, we already have similar handling in

dask/dask/array/utils.py

Lines 128 to 134 in c631d32

    
    except TypeError as e:
        if (
            "unexpected keyword argument" in str(e)
            or "is an invalid keyword for" in str(e)
            or "Did not understand the following kwargs" in str(e)
        ):
            raise

With the above said, I'm happy to change the proposed solution to something more resilient if anyone can suggest an alternative.
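The pattern under discussion, acting on an exception only when its message contains a known fragment, can be sketched in isolation. The names `message_matches`, `strict` and `call_dropping_kwargs` are hypothetical and exist only for this illustration; the fragment matched is the one CPython emits when a Python function receives an unknown keyword argument.

```python
def message_matches(exc, fragments):
    """Return True if any fragment occurs in the exception message."""
    return any(fragment in str(exc) for fragment in fragments)


def strict(a):
    # Hypothetical function that accepts no extra keyword arguments.
    return a


def call_dropping_kwargs(func, *args, **kwargs):
    """Call func; retry without kwargs only if the kwargs were the problem."""
    try:
        return func(*args, **kwargs)
    except TypeError as e:
        if message_matches(e, ["unexpected keyword argument"]):
            return func(*args)
        raise  # unrelated TypeError: do not mask it


result = call_dropping_kwargs(strict, 1, dtype="bool")
```

The brittleness raised above is visible here: the check depends on the wording of a message that the raising library is free to change.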

TomAugspurger · 2020-10-13T11:04:40Z

Agreed with everyone here that this seems brittle but I don't immediately see a better way to handle it.

mathause · 2020-10-13T18:35:42Z

I just tried the xarray tests with this branch and this fixes 4 of our 5 failures. I don't understand the remaining failure, yet.

jakirkham · 2020-10-14T03:41:04Z

What if we just special case when x is a str?

Admittedly I don't have a good handle on the use case here. More context would be very welcome.

pentschev · 2020-10-14T08:19:02Z

> I just tried the xarray tests with this branch and this fixes 4 of our 5 failures. I don't understand the remaining failure, yet.

Please keep us posted.

> What if we just special case when x is a str?

Who is x in your suggestion?

mathause · 2020-10-14T14:26:44Z

I found the problem - with this workaround the dtype of da.zeros_like(da.array("bar"), dtype=bool) becomes '<U3':

import numpy as np
import dask.array as da

bar = da.zeros_like(da.array("bar"), dtype=bool)

bar.dtype
# returns dtype('<U3')

bar.compute().dtype
# returns bool

For empty string and float it is correct

da.zeros_like(da.array(""), dtype=bool).dtype
# returns bool
da.zeros_like(da.array([1.2, 3.4]), dtype=bool).dtype
# returns bool

which is why 4 of the 5 tests pass. As for the use case - we try to compare

bar & np.array(False)

which then returns

TypeError: operand type(s) all returned NotImplemented from __array_ufunc__(<ufunc 'bitwise_and'>, '__call__', dask.array<zeros, shape=(), dtype=<U3, chunksize=(), chunktype=numpy.ndarray>, array(False)): 'Array', 'ndarray'

mathause · 2020-10-14T16:19:29Z

Do I understand correctly that meta.astype(dtype) is there because meta is not necessarily a numpy array? So we want to ensure it does not change its type? (Else we could just do meta=np.array([], dtype=dtype)?).

The problem stems from the check if meta == np.array("") in your code, because:

da.array("jjj")._meta == np.array("")
# -> array(False)
da.array(["jjj"])._meta == np.array("")
# -> array([], dtype=bool)
da.array("")._meta == np.array("")
# -> array(True)

Given you already test for str(e) in ["invalid literal", "could not convert string to float"] - is the additional test necessary? If yes you can probably make it more general by doing meta.dtype.kind == "U". Also you could combine the two tests like so

if any(...) and meta.dtype.kind == "U"

to ensure the error is raised.
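The difference between the two checks can be seen with plain NumPy, independent of dask. This is a minimal sketch, assuming only that NumPy is installed: the result of an element-wise == depends on the array's contents, while dtype.kind depends only on the dtype.

```python
import numpy as np

filled = np.array("jjj")
empty = np.array("")

# Element-wise comparison: the outcome depends on the contents.
filled_equal = bool(filled == empty)
empty_equal = bool(empty == empty)

# dtype.kind: identical for every unicode string array, whatever it holds.
kinds = [np.array(v).dtype.kind for v in ("jjj", "", ["jjj"])]
```

Because the kind is the same for all three inputs, a check on meta.dtype.kind does not care whether meta has already been reduced to an empty literal.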

jakirkham · 2020-10-14T17:39:10Z

> > What if we just special case when x is a str?
>
> Who is x in your suggestion?

x is the name of the variable meta_from_array takes

pentschev · 2020-10-14T18:36:48Z

> Do I understand correctly that meta.astype(dtype) is there because meta is not necessarily a numpy array? So we want to ensure it does not change its type? (Else we could just do meta=np.array([], dtype=dtype)?).

meta was introduced because we need to know what type the output array will have before computing it, and that may not be a NumPy array. The .astype(dtype) is part of the mechanism that ensures the dtype is also correct: each operation may have different requirements regarding dtype, and the cast is there to ensure meta has the correct one. In summary, meta tries to infer the output array's type, its dtype and its ndim.
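As a rough illustration of that idea (not dask's actual implementation), a zero-length stand-in can carry the array type, dtype and ndim without holding any data. The helper name `sketch_meta` is hypothetical, and the sketch assumes the input supports NumPy-style slicing and has ndim >= 1.

```python
import numpy as np


def sketch_meta(x, dtype=None):
    """Zero-length stand-in keeping x's array type and ndim (assumes ndim >= 1)."""
    meta = x[(slice(0, 0),) * x.ndim]  # slicing preserves the array type
    if dtype is not None and meta.dtype != dtype:
        meta = meta.astype(dtype)  # align the dtype with the expected output
    return meta


meta = sketch_meta(np.ones((3, 4)), dtype=bool)
```

Slicing instead of building a fresh np.array([]) is what lets a non-NumPy input keep its own array type.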

> Given you already test for str(e) in ["invalid literal", "could not convert string to float"] - is the additional test necessary?

For the case at hand you're correct, the test isn't strictly necessary. However, I've been taking the approach of not generalizing fixes: if this exception occurs in a completely unrelated case (e.g., a non-literal array that happens to raise the same error message), we won't just assume we're handling it correctly when we probably aren't. This helps prevent silent errors that are much harder to find in the future.

> If yes you can probably make it more general by doing meta.dtype.kind == "U".

Yes, I think this is a better approach than comparing against np.array(""), thanks for the suggestion. I just pushed a fix, could you try it out and let us know if that resolves all issues?

pentschev · 2020-10-14T18:39:43Z

> > > What if we just special case when x is a str?
> >
> > Who is x in your suggestion?
>
> x is the name of the variable meta_from_array takes

In the case above x isn't necessarily a str, but anything that happens to be converted into a NumPy literal. So we actually need to check the final meta for it, which I believe we now do successfully with 6026297.

mathause

Thanks a lot for all the detailed explanations!

I left some suggestions. Turns out this also needs to be done for (byte-)strings ("S").

mathause · 2020-10-14T20:08:50Z

dask/array/utils.py

+                        ]
+                    ]
+                )
+                and meta.dtype.kind == "U"


Suggested change

-                and meta.dtype.kind == "U"
+                and meta.dtype.kind in "SU"

to handle the case

da.zeros_like(da.array(b"u"), dtype=bool).compute()
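The two conditions discussed in this review, a known message fragment and a string-like dtype kind, can be combined as in the sketch below. The helper name `cast_meta` is hypothetical, and the fallback to an empty NumPy array of the requested dtype is an assumption, since the thread does not show that branch.

```python
import numpy as np

LITERAL_CAST_ERRORS = ("invalid literal", "could not convert string to float")


def cast_meta(meta, dtype):
    """Cast meta to dtype, tolerating failures caused by string literals only."""
    try:
        return meta.astype(dtype)
    except ValueError as e:
        is_known_message = any(s in str(e) for s in LITERAL_CAST_ERRORS)
        is_string_meta = meta.dtype.kind in "SU"  # bytes or unicode
        if is_known_message and is_string_meta:
            # Assumed fallback: an empty array carrying only the dtype.
            return np.array([], dtype=dtype)
        raise  # same message from a non-string array: surface it


fallback = cast_meta(np.array("bar"), float)
```

An object array holding "x" fails with the same message but has kind "O", so its error propagates instead of being silently replaced.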

mathause · 2020-10-14T20:17:03Z

dask/array/tests/test_array_utils.py

+    assert meta_from_array("").dtype == np.array("").dtype
+    assert meta_from_array("", dtype="bool").dtype == np.array([], dtype="bool").dtype
+    assert meta_from_array("", dtype="int").dtype == np.array([], dtype="int").dtype
+    assert meta_from_array("", dtype="float").dtype == np.array([], dtype="float").dtype


Could you also test "str", u"", and u"str" here?

pentschev · 2020-10-14

Are the filled string cases (e.g., "str" and u"str") actually happening in your tests? I'm asking because I would generally expect meta to be reduced to an empty array (an empty literal in this case); if that's not actually happening, we may have a different issue.

pentschev · 2020-10-14

Never mind, the test you're suggesting runs before meta gets reduced, so it makes sense.

mathause · 2020-10-14T20:26:01Z

dask/array/utils.py

+                any(
+                    [
+                        s in str(e)
+                        for s in [
+                            "invalid literal",
+                            "could not convert string to float",
+                        ]
+                    ]
+                )


Suggested change

-                any(
-                    [
-                        s in str(e)
-                        for s in [
-                            "invalid literal",
-                            "could not convert string to float",
-                        ]
-                    ]
-                )
+                any(
+                    s in str(e)
+                    for s in [
+                        "invalid literal",
+                        "could not convert string to float",
+                    ]
+                )
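The suggestion removes the inner list so that any() receives a generator expression. The result is the same, but the generator lets any() stop at the first match instead of evaluating every fragment first. A small sketch with a hypothetical `record` helper that logs which fragments were actually checked:

```python
checked = []


def record(fragment, message):
    """Log the fragment being tested, then test it."""
    checked.append(fragment)
    return fragment in message


fragments = ["invalid literal", "could not convert string to float"]
message = "invalid literal for int() with base 10: 'bar'"

# Generator expression: evaluation stops after the first True.
lazy = any(record(s, message) for s in fragments)
lazy_checks = list(checked)

checked.clear()

# List comprehension: every fragment is tested before any() runs.
eager = any([record(s, message) for s in fragments])
eager_checks = list(checked)
```

With only two or three fragments the difference is negligible in practice; the change is mainly about idiom and formatting.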

mathause · 2020-10-14T20:27:33Z

dask/array/utils.py

+                    [
+                        s in str(e)
+                        for s in [
+                            "unexpected keyword argument",
+                            "is an invalid keyword for",
+                            "Did not understand the following kwargs",
+                        ]


Suggested change

-                    [
-                        s in str(e)
-                        for s in [
-                            "unexpected keyword argument",
-                            "is an invalid keyword for",
-                            "Did not understand the following kwargs",
-                        ]
+                    s in str(e)
+                    for s in [
+                        "unexpected keyword argument",
+                        "is an invalid keyword for",
+                        "Did not understand the following kwargs",

pentschev · 2020-10-14T20:55:36Z

I addressed your suggestions @mathause, could you check if everything works now?

mathause · 2020-10-15T10:04:16Z

Now all xarray tests pass (locally) so this should be good to go from my side.

pentschev · 2020-10-15T13:00:31Z

Thanks @mathause for confirming. From my side it's complete too; I'd appreciate it if @TomAugspurger or @jakirkham have a chance to review/merge at some point.

TomAugspurger · 2020-10-15T19:18:59Z

Thanks all!

Handle literal in meta_from_array

e11e8af

pentschev mentioned this pull request Oct 12, 2020

Allow *_like array creation functions to respect input array type (eg: numpy/cupy/etc.) #6680

Merged

2 tasks

mathause mentioned this pull request Oct 12, 2020

workaround for failing dask zeros_like pydata/xarray#4505

Closed

3 tasks

Fix black formatting

2502451

Change string checking for better black formatting

ed6f1c7

Check for meta.dtype.kind == "U" in meta_from_array

6026297

mathause reviewed Oct 14, 2020


pentschev added 4 commits October 14, 2020 13:49

Simplified conditions in meta_from_array/compute_meta

e6d77b4

More meta literal tests

3a0703c

Check for bytestrings in meta_from_array

fda330b

Test meta_from_array for bytestrings

4c5dd9e

TomAugspurger merged commit c1382bd into dask:master Oct 15, 2020

kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020

Handle literal in meta_from_array (dask#6731)

1b66f80

pentschev deleted the meta-from-array-literal branch June 30, 2021 12:27
