Skip to content

Handle literal in meta_from_array #6731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Oct 15, 2020

Conversation

pentschev
Copy link
Member

Attempt at fixing #6680 (comment) .

@pentschev
Copy link
Member Author

cc @mathause

@mathause
Copy link

Thanks! This does seem to fix the issue (I tested locally but did not run the xarray test suite over it). Testing for the error message may be brittle, though (but I also don't know how to do this better).

@pentschev
Copy link
Member Author

Testing for the error message may be brittle, though (but I also don't know how to do this better).

I generally agree with that, but it seems that such issues have no better ways to be handled. Due to the lack of a more concise way to do it, we already have similar handling in

dask/dask/array/utils.py

Lines 128 to 134 in c631d32

except TypeError as e:
if (
"unexpected keyword argument" in str(e)
or "is an invalid keyword for" in str(e)
or "Did not understand the following kwargs" in str(e)
):
raise

With the above said, I'm happy to change the proposed solution for something more resilient if anyone can suggest an alternative.

@TomAugspurger
Copy link
Member

Agreed with everyone here that this seems brittle but I don't immediately see a better way to handle it.

@mathause
Copy link

I just tried the xarray tests with this branch and this fixes 4 of our 5 failures. I don't understand the remaining failure, yet.

@jakirkham
Copy link
Member

What if we just special case when x is a str?

Admittedly I don't have a good handle on the use case here. More context would be very welcome.

@pentschev
Copy link
Member Author

I just tried the xarray tests with this branch and this fixes 4 of our 5 failures. I don't understand the remaining failure, yet.

Please keep us posted.

What if we just special case when x is a str?

Who is x in your suggestion?

@mathause
Copy link

mathause commented Oct 14, 2020

I found the problem - with this workaround the dtype of da.zeros_like(da.array("bar"), dtype=bool) becomes '<U3':

import numpy as np
import dask.array as da

bar = da.zeros_like(da.array("bar"), dtype=bool)

bar.dtype
# returns dtype('<U3')

bar.compute().dtype
# returns bool

For empty string and float it is correct

da.zeros_like(da.array(""), dtype=bool).dtype
# returns bool
da.zeros_like(da.array([1.2, 3.4]), dtype=bool).dtype
# returns bool

which is why 4 of the 5 tests pass. As for the use case - we try to compare

bar & np.array(False)

which then returns

TypeError: operand type(s) all returned NotImplemented from __array_ufunc__(<ufunc 'bitwise_and'>, '__call__', dask.array<zeros, shape=(), dtype=<U3, chunksize=(), chunktype=numpy.ndarray>, array(False)): 'Array', 'ndarray'

@mathause
Copy link

Do I understand correctly the meta.astype(dtype) is there because meta is not necessarily a numpy array? So we want to ensure it does not change its type? (Else we could just do meta=np.array([], dtype=dtype)?).


The problem stems from if meta == np.array("") in your code because:

da.array("jjj")._meta == np.array("")
# -> array(False)
da.array(["jjj"])._meta == np.array("")
# -> array([], dtype=bool)
da.array("")._meta == np.array("")
# -> array(True)

Given you already test for str(e) in ["invalid literal", "could not convert string to float"] - is the additional test necessary? If yes you can probably make it more general by doing meta.dtype.kind == "U". Also you could combine the two tests like so

if any(...) and meta.dtype.kind == "U" 

to ensure the error is raised.

@jakirkham
Copy link
Member

What if we just special case when x is a str?

Who is x in your suggestion?

x is the name of the variable meta_from_array takes

@pentschev
Copy link
Member Author

Do I understand correctly the meta.astype(dtype) is there because meta is not necessarily a numpy array? So we want to ensure it does not change its type? (Else we could just do meta=np.array([], dtype=dtype)?).

meta was introduced because we need to ensure we know what type the output array will have before computing it, and that may not be a NumPy array. The .astype(dtype) is part of the mechanism to ensure the dtype is also correct, each operation may have different requirements regarding to dtype and that's there to ensure meta will have the correct one. In summary, meta tries to infer the array's output type, its dtype and ndim.

Given you already test for str(e) in ["invalid literal", "could not convert string to float"] - is the additional test necessary?

For the case at hand you're correct the test isn't strictly necessary. However, I've been taking the approach of not generalizing fixes, meaning if this exception exists in a completely unrelated case (e.g., a non-literal array that may happen to raise the same error message) we won't just assume we're handling it correctly when we probably aren't. This helps prevent silent errors that are much harder to be found in the future.

If yes you can probably make it more general by doing meta.dtype.kind == "U".

Yes, I think this is a better approach than comparing against np.array(""), thanks for the suggestion. I'll just pushed a fix, could you try it out and let us know if that resolves all issues?

@pentschev
Copy link
Member Author

What if we just special case when x is a str?

Who is x in your suggestion?

x is the name of the variable meta_from_array takes

In the case above x isn't necessarily a str, but anything that happens to be converted into a NumPy literal. So we actually need to check the final meta for it, which I believe we now do successfully with 6026297 .

Copy link

@mathause mathause left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for all the detailed explanations!

I left some suggestions. Turns out this also needs to be done for (byte-)strings ("S").

]
]
)
and meta.dtype.kind == "U"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and meta.dtype.kind == "U"
and meta.dtype.kind in "SU"

to handle the case

da.zeros_like(da.array(b"u"), dtype=bool).compute()

Comment on lines 51 to 54
assert meta_from_array("").dtype == np.array("").dtype
assert meta_from_array("", dtype="bool").dtype == np.array([], dtype="bool").dtype
assert meta_from_array("", dtype="int").dtype == np.array([], dtype="int").dtype
assert meta_from_array("", dtype="float").dtype == np.array([], dtype="float").dtype

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also test "str", u"", and u"str" here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the filled string cases (e.g., "str" and u"str") actually happening in your tests? I'm asking this because I would generally expect meta to be reduced to an empty array (empty literal in this case), if that's not actually happening we may have a different issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, the test you're suggesting is from before meta getting reduced, so it makes sense.

Comment on lines 108 to 116
any(
[
s in str(e)
for s in [
"invalid literal",
"could not convert string to float",
]
]
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
any(
[
s in str(e)
for s in [
"invalid literal",
"could not convert string to float",
]
]
)
any(
s in str(e)
for s in [
"invalid literal",
"could not convert string to float",
]
)

Comment on lines 147 to 153
[
s in str(e)
for s in [
"unexpected keyword argument",
"is an invalid keyword for",
"Did not understand the following kwargs",
]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[
s in str(e)
for s in [
"unexpected keyword argument",
"is an invalid keyword for",
"Did not understand the following kwargs",
]
s in str(e)
for s in [
"unexpected keyword argument",
"is an invalid keyword for",
"Did not understand the following kwargs",

@pentschev
Copy link
Member Author

I addressed your suggestions @mathause , could you check if everything works now?

@mathause
Copy link

Now all xarray tests pass (locally) so this should be good to go from my side.

@pentschev
Copy link
Member Author

Thanks @mathause for confirming. From my side it's complete too, appreciate it if @TomAugspurger or @jakirkham have chance to review/merge at some point.

@TomAugspurger TomAugspurger merged commit c1382bd into dask:master Oct 15, 2020
@TomAugspurger
Copy link
Member

Thanks all!

kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020
@pentschev pentschev deleted the meta-from-array-literal branch June 30, 2021 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants