Closed
Description
Background: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#sparse-data-structure-refactor
Summary: As of pandas 0.24, sparse columns in a DataFrame have dtypes such as Sparse[int64, 0], which xgboost does not recognize as numeric.
Detail: In core._maybe_pandas_data() there is a loop that checks whether each column's dtype.name is in a hardcoded set of dtypes, PANDAS_DTYPE_MAPPER. A name such as Sparse[int64, 0] is not in that set, so the check fails for sparse columns.
xgboost/python-package/xgboost/core.py
Line 220 in cd1526d
PANDAS_DTYPE_MAPPER = {'int8': 'int', 'int16': 'int', 'int32': 'int', 'int64': 'int',
Possible solutions:
1 - Look up dtype.subtype.name instead of dtype.name when the name starts with "Sparse"
2 - Instead of enumerating the acceptable datatypes, check each column with pd.api.types.is_numeric_dtype() and pd.api.types.is_bool_dtype()
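A minimal sketch of both options, for illustration only. The helper names lookup_name and is_supported are hypothetical and are not part of xgboost; the sketch assumes a pandas version that provides pd.SparseDtype (0.24 or later) and does not touch xgboost itself.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def lookup_name(dtype):
    """Option 1 (hypothetical helper): pick the name to check against a
    hardcoded set of dtype names such as PANDAS_DTYPE_MAPPER."""
    name = dtype.name
    if name.startswith("Sparse"):
        # A sparse dtype exposes its underlying NumPy dtype as .subtype.
        return dtype.subtype.name
    return name


def is_supported(dtype):
    """Option 2 (hypothetical helper): ask pandas whether the dtype is
    numeric or boolean instead of enumerating names."""
    return bool(is_numeric_dtype(dtype) or is_bool_dtype(dtype))


# Small frame with one sparse column, one dense numeric column,
# and one non-numeric column.
df = pd.DataFrame({
    "sparse_int": pd.Series([0, 1, 0, 2], dtype=pd.SparseDtype("int64", 0)),
    "dense_float": pd.Series([0.5, 1.5, 2.5, 3.5]),
    "text": pd.Series(["a", "b", "c", "d"]),
})

names = {col: lookup_name(dt) for col, dt in df.dtypes.items()}
supported = {col: is_supported(dt) for col, dt in df.dtypes.items()}

print(names)
print(supported)
```

Option 1 keeps the existing mapper and only changes which name is looked up. Option 2 drops the name comparison entirely, so any mapping from dtype name to feature type would still need some other source.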
g3rfx