Pandas 0.24 sparse restructure breaking _maybe_pandas_data() #4648

Background: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#sparse-data-structure-refactor

Summary: As of pandas 0.24, sparse columns in a DataFrame have dtypes named like Sparse[int64, 0], which xgboost does not recognize as numeric.

Detail: In core._maybe_pandas_data() there is a loop that checks whether each column's dtype.name is in a hardcoded set of dtype names ( https://github.com/dmlc/xgboost/blob/cd1526d3b1155432fca82e8b4895bc0b827b4d29/python-package/xgboost/core.py#L220 ). Because pandas changed sparse dtypes to have names that are not in this set, the check now fails on any sparse column. The underlying type that the check is looking for is now exposed as that column's dtype.subtype.
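A minimal sketch of the mismatch, assuming a recent pandas where SparseArray is available as pd.arrays.SparseArray (on 0.24 itself it is pd.SparseArray). ACCEPTED_DTYPE_NAMES is a hypothetical stand-in for the hardcoded set in core.py, not a copy of it:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the hardcoded set of dtype names in xgboost's core.py.
ACCEPTED_DTYPE_NAMES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64", "bool",
}

values = np.array([0, 1, 0, 2], dtype="int64")
df = pd.DataFrame({
    "dense": values,
    "sparse": pd.arrays.SparseArray(values, fill_value=0),
})

# Reproduce the name-based check for each column.
name_check = {}
for col in df.columns:
    dtype = df[col].dtype
    name_check[col] = dtype.name in ACCEPTED_DTYPE_NAMES
    print(col, dtype.name, name_check[col])

# The sparse dtype still carries the underlying NumPy dtype as .subtype.
sparse_dtype = df["sparse"].dtype
print(sparse_dtype.subtype.name, sparse_dtype.subtype.name in ACCEPTED_DTYPE_NAMES)
```

The dense column passes the name check, while the sparse column fails it even though its subtype name is in the set.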

Possible solutions:
1 - Check dtype.subtype.name instead of dtype.name when the dtype name starts with "Sparse"
2 - Instead of enumerating the acceptable dtypes, check each column with pd.api.types.is_numeric_dtype() and pd.api.types.is_bool_dtype()
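Both options can be sketched roughly as follows. This is an illustration only, not the actual xgboost code: the helper names and ACCEPTED_DTYPE_NAMES are hypothetical, and a recent pandas (with pd.arrays.SparseArray) is assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the hardcoded set of dtype names in core.py.
ACCEPTED_DTYPE_NAMES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64", "bool",
}


def supported_by_name(dtype):
    """Option 1: fall back to the subtype's name for Sparse dtypes."""
    name = dtype.name
    if name.startswith("Sparse"):
        name = dtype.subtype.name
    return name in ACCEPTED_DTYPE_NAMES


def supported_by_api(dtype):
    """Option 2: let pandas decide what counts as numeric or boolean."""
    return bool(
        pd.api.types.is_numeric_dtype(dtype)
        or pd.api.types.is_bool_dtype(dtype)
    )


values = np.array([0, 1, 0, 2], dtype="int64")
df = pd.DataFrame({
    "dense": values,
    "sparse": pd.arrays.SparseArray(values, fill_value=0),
    "text": ["a", "b", "c", "d"],
})

# Evaluate both checks on every column.
results = {
    col: (supported_by_name(df[col].dtype), supported_by_api(df[col].dtype))
    for col in df.columns
}
print(results)
```

One difference to keep in mind: option 2 is broader than the enumerated set, since is_numeric_dtype() also returns True for complex dtypes, so it may need an extra filter if those should still be rejected.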
