Problem with pandas String dtype  #1965

@nhat-le

Description

Problem: For string categorical features, CatBoost seems to accept pandas columns with the object dtype but not the string dtype (StringDtype).
For example, this code runs without error:

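import pandas as pd
from catboost import Pool

# Column stored with the object dtype: Pool is created successfully
Xtrain = pd.DataFrame(dict(id_=['abb', 'bcd']),
                      dtype='object')

ytrain = [1, 2]
Pool(Xtrain, ytrain, cat_features=[0])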

but this code raises TypeError: Cannot convert StringArray to numpy.ndarray

Xtrain = pd.DataFrame(dict(id_=['abb', 'bcd']),
                      dtype='string')

ytrain = [1,2]
Pool(Xtrain, ytrain, cat_features=[0])

catboost version: 1.0.3
Operating System: macOS
CPU: i7
GPU: None
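
A possible workaround, given that the object dtype works in the first example: cast any string-dtype columns to object before building the Pool. This is only a minimal sketch, not part of the CatBoost API. The helper name string_columns_to_object is hypothetical, and missing values (pd.NA) are left as-is, so they may still need separate handling.

```python
import pandas as pd


def string_columns_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with pandas string-dtype columns cast to object.

    Hypothetical helper for illustration; other columns are left unchanged.
    """
    # Collect the columns that use the pandas string extension dtype.
    string_cols = [
        col for col in df.columns
        if isinstance(df[col].dtype, pd.StringDtype)
    ]
    # astype with a per-column mapping returns a new DataFrame.
    return df.astype({col: object for col in string_cols})


# Same data as the failing example above.
Xtrain = pd.DataFrame(dict(id_=['abb', 'bcd']), dtype='string')
ytrain = [1, 2]

Xfixed = string_columns_to_object(Xtrain)

# Xfixed now has the same dtype as the working example, so it can be
# passed to Pool in the same way:
#   Pool(Xfixed, ytrain, cat_features=[0])
```

The original DataFrame is not modified, so the string dtype can still be used elsewhere in the pipeline.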
