MultiLogloss does not work with text_features #1885

@mmrnustik

Problem: Fitting a model with text features using the MultiLogloss loss function fails, and the error message shown is cryptic.
catboost version: 1.0.0
Operating System: Debian GNU/Linux 8 (jessie)
CPU: Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz

import catboost
from catboost import CatBoostClassifier, Pool
from catboost.utils import eval_metric
from sklearn.datasets import make_multilabel_classification, make_classification
from sklearn.model_selection import train_test_split
import pandas as pd

print('catboost version:', catboost.__version__)

X, Y = make_multilabel_classification(n_samples=500, n_features=20, n_classes=5, random_state=0)
X = pd.DataFrame(X)
X['text'] = 'some random text'

X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
train_pool = Pool(X_train, Y_train, text_features=['text'])
test_pool = Pool(X_test, Y_test, text_features=['text'])

clf = CatBoostClassifier(
    loss_function='MultiLogloss',
    iterations=500,
    class_names=['A', 'B', 'C', 'D', 'E'],
)
clf.fit(train_pool, eval_set=test_pool, metric_period=10, verbose=50)

catboost version: 1.0.0
Traceback (most recent call last):
  File "catboost_MultiLogloss_error_minimal_example.py", line 25, in <module>
    clf.fit(train_pool, eval_set=test_pool, metric_period=10, verbose=50)
  File "/home/misa/.pyenv/versions/sat/lib/python3.8/site-packages/catboost/core.py", line 4717, in fit
    self._fit(X, y, cat_features, text_features, embedding_features, None, sample_weight, None, None, None, None, baseline, use_best_model,
  File "/home/misa/.pyenv/versions/sat/lib/python3.8/site-packages/catboost/core.py", line 2037, in _fit
    self._train(
  File "/home/misa/.pyenv/versions/sat/lib/python3.8/site-packages/catboost/core.py", line 1464, in _train
    self._object._train(train_pool, test_pool, params, allow_clear_pool, init_model._object if init_model else None)
  File "_catboost.pyx", line 4389, in _catboost._CatBoost._train
  File "_catboost.pyx", line 4438, in _catboost._CatBoost._train
_catboost.CatBoostError: catboost/libs/data/target.h:308: Attempt to use multidimintional target as one-dimensional
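
For context on the wording of the error, here is a minimal sketch that assumes nothing about CatBoost internals. A multi-label target such as the `Y` returned by `make_multilabel_classification` is a two-dimensional indicator matrix with one column per class, whereas a single-label target is a flat sequence of class labels. The helper `target_dimension` below is hypothetical and only illustrates that distinction; it is not how CatBoost performs its check.

```python
from typing import Sequence, Union

Row = Sequence[int]
Target = Union[Sequence[int], Sequence[Row]]


def target_dimension(target: Target) -> int:
    """Return 1 for a flat label sequence, or the column count of a 2-D indicator matrix.

    Hypothetical helper for illustration only; not part of CatBoost.
    """
    if len(target) == 0:
        raise ValueError("empty target")
    first = target[0]
    if isinstance(first, (list, tuple)):
        width = len(first)
        # Every row must carry one indicator per class.
        if any(len(row) != width for row in target):
            raise ValueError("ragged target rows")
        return width
    return 1


# Single-label target: one class index per sample.
single_label = [0, 3, 1, 4]

# Multi-label target: one 0/1 indicator per class (5 classes, as in the report).
multi_label = [
    [1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
]

print(target_dimension(single_label))
print(target_dimension(multi_label))
```

In the report, `Y_train` has the second shape (five indicator columns), which is presumably what the message means by a multidimensional target.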
