Catboost crashes when attempting to train a multilabel classifier with embedding features #2249

@mgh1

Problem: embedding_features cannot be used with CatBoostClassifier when loss_function is MultiLogloss. Fitting a model for a multilabel classification task raises an exception.

Example code to reproduce:

import pandas as pd
import catboost

features_df = pd.DataFrame({
    'f0': [0, 1, 2],
    'embedding1': [[0.0, 1.1, 0.0], [1.0, 2.1, 0.0], [0.11, 0.2, 0.3]]
})
label = [[0, 0], [0, 1], [1, 0]]

pool = catboost.Pool(features_df, label=label, embedding_features=['embedding1'])

model = catboost.CatBoostClassifier(iterations=10, loss_function='MultiLogloss')
model.fit(pool)

CatBoost raises an exception with this error message:

CatBoostError: catboost/libs/data/target.h:315: Attempt to use multi-dimensional target as one-dimensional

catboost version: 1.1.1
Operating System: Linux
CPU: Yes
GPU: Yes
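
Possible workaround (a sketch only, not verified against 1.1.1): since the error complains about a multi-dimensional target, one option is to split the label matrix into one-dimensional targets and fit a separate binary model per label. The helper names `split_multilabel_target` and `fit_one_model_per_label` are hypothetical, and the sketch assumes that `loss_function='Logloss'` with a one-dimensional target accepts embedding features. Note that independent per-label models are not equivalent to a single MultiLogloss model.

```python
def split_multilabel_target(label):
    """Turn an (n_samples, n_labels) nested list into n_labels one-dimensional targets."""
    return [list(column) for column in zip(*label)]


def fit_one_model_per_label(features_df, label, embedding_features, iterations=10):
    """Fit an independent binary CatBoostClassifier for each label column.

    Assumption: a one-dimensional target with loss_function='Logloss'
    avoids the multi-dimensional target check reported above.
    """
    import catboost  # imported lazily so the helper above has no dependencies

    models = []
    for target in split_multilabel_target(label):
        pool = catboost.Pool(
            features_df, label=target, embedding_features=embedding_features
        )
        model = catboost.CatBoostClassifier(
            iterations=iterations, loss_function='Logloss'
        )
        model.fit(pool)
        models.append(model)
    return models
```

With the reproduction data, this would be called as `fit_one_model_per_label(features_df, label, ['embedding1'])`, giving one model per label column.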
