-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
BUG: spatial.distance.correlation: fix complex input #21001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
use `cmath` instead of `math` for `sqrt` [skip cirrus] [skip circle]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this post that discusses cosine distance for complex vector. The author suggests something equivalent to 1 - np.dot(u,v)/(np.linalg.norm(u)*np.linalg.norm(v)) This definition appears to agree with the doc. In addition, for the input of the test case, this expression gives the same answer as WolframAlpha linked in the referenced issue. The test case gives a different answer. So it appears to me I guess the original cosine distance code was meant for real vector only. It happened to give an answer for complex vector, but that might not be the expected answer (if we expect an answer equal to WolframAlpha’s). For this reason, I suppose the referenced issue is more an “enhancement” than a “bug”. |
To handle complex input in both the |
In addition, even with complex inner product, subtracting the result from one seems unnatural to me. So is the last line that |
Indeed, what is the point of a distance function returning a complex number? If we were implementing functions that must be mathematical metrics, then we would have to ensure that the functions returned real values. We don't actually require that the distances be metrics, but it seems reasonable to still require that values returned are real. Things like premetrics and semimetrics (that drop some axioms required of a metric) still require that the distance is a real value. I think the two options that we have here are to simply not accept complex vectors (i.e. raise an error on complex input), or implement something that returns a real value. That "something" should not be something we invent ad hoc, but a formula that has some mathematical justification. |
Thanks Warren. We recently deprecated complex input for some of the interpolators. @h-vetinari @j-bowhay do you think a similar strategy sounds reasonable here? |
I found another MathOverflow post that discusses about the angle between complex vectors. One of the answers also mentions a review paper of various possible definitions. My take-away from these materials is that if a real result is expected, there appears to be only one option, which is to treat vectors in If the result is allowed to be complex, then there are a few more possibilities, each finding themselves useful in some places. Btw, the reference issue is about |
The errors observed in correlation(u, v, w=w, centered=False) |
Reference issue
Closes gh-20994
What does this implement/fix?
A regression was introduced for complex input when moving from
np.sqrt
tomath.sqrt
. This changes tocmath.sqrt
.Additional information
I had to include extra handling where the imaginary part is zero - not sure if this is the right way to do things.