cdist doesn't align DataFrame columns #9616

@MaxGhenis

Description

cdist currently treats DataFrames as plain arrays and ignores their column names. If you pass two DataFrames that have the same columns in a different order, the distances are silently computed across mismatched columns, which gives unexpected results. Aligning the columns by name would be more intuitive IMO.

import pandas as pd
from scipy.spatial.distance import cdist
df = pd.DataFrame({'a': [0, 0],
                   'b': [1, 1]})
# Zero distance from df to itself.
cdist(df, df)
# array([[0., 0.],
#        [0., 0.]])
# But switching the columns creates distance.
df2 = df[['b', 'a']]
cdist(df, df2)
# array([[1.41421356, 1.41421356],
#        [1.41421356, 1.41421356]])
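
Until cdist does something like this itself, one possible workaround is to reorder the second DataFrame's columns to match the first before calling it. The sketch below is only an illustration: `cdist_aligned` is a hypothetical helper name, not part of SciPy, and it assumes both inputs are DataFrames with unique column names.

```python
import pandas as pd
from scipy.spatial.distance import cdist


def cdist_aligned(xa, xb, **kwargs):
    """Hypothetical helper (not a SciPy function): align xb's columns to xa by name."""
    # Refuse to guess when the two frames do not share the same set of columns.
    if set(xa.columns) != set(xb.columns):
        raise ValueError("DataFrames must have the same set of columns")
    # Reorder xb's columns to xa's order, then hand plain arrays to cdist.
    return cdist(xa.to_numpy(), xb[xa.columns].to_numpy(), **kwargs)


df = pd.DataFrame({'a': [0, 0],
                   'b': [1, 1]})
df2 = df[['b', 'a']]

# With the columns aligned by name, df and df2 hold identical rows.
result = cdist_aligned(df, df2)
```

Raising on mismatched column sets is a design choice for this sketch; dropping to the shared columns would be another option, but it could hide mistakes.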

Scipy/Numpy/Python version information:

import sys, scipy, numpy; print(scipy.__version__, numpy.__version__, sys.version_info)
1.1.0 1.15.4 sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0)

Labels: scipy.spatial, wontfix (Not actionable, rejected or unplanned changes)
