Status: Closed
Labels: bug (Something isn't working)
Description
Describe the bug
If training and test data are specified explicitly and the target column does not exist in test_data, then
setup(data=data, test_data=test_data, silent=True, target=target)
throws an error only at lower dimensionality. At higher dimensionality it hangs indefinitely instead.
On further exploration, a single feature, v22, turned out to cause the issue. The distribution of this categorical feature is:
pd.DataFrame(train.v22.value_counts()).shape  # (18210, 1)
That is, v22 has 18,210 distinct levels. After dropping this feature, setup() completed in 15 seconds.
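A quick way to spot features like v22 before calling setup() is to count the distinct levels of every non-numeric column. The sketch below is a hypothetical helper (not part of PyCaret), and its default threshold of 1,000 levels is an arbitrary assumption. The row count of pd.DataFrame(s.value_counts()) equals s.nunique(), because value_counts() returns one row per distinct non-null value.

```python
import pandas as pd


def high_cardinality_columns(df: pd.DataFrame, threshold: int = 1000) -> dict:
    """Return {column: n_distinct} for non-numeric columns whose number of
    distinct non-null values exceeds `threshold`.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    # Treat every non-numeric column (object, string, category) as categorical
    cat_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    counts = {col: int(df[col].nunique()) for col in cat_cols}
    return {col: n for col, n in counts.items() if n > threshold}


# Usage on a small synthetic frame (not the Kaggle data)
demo = pd.DataFrame({
    "ID": range(6),
    "v22": ["a", "b", "c", "d", "e", "f"],  # every row is a new level
    "v24": ["x", "x", "y", "y", "x", "y"],  # only two levels
})
print(high_cardinality_columns(demo, threshold=3))  # {'v22': 6}
```

Columns flagged this way could then be passed to ignore_features, which is what dropping v22 amounted to here.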
To Reproduce
#!/usr/bin/env python
# coding: utf-8
from pathlib import Path

import pandas as pd
from pycaret.classification import setup

# Source data: https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/data
# train.csv contains the 'target' column; test.csv does not.
train = pd.read_csv(Path('../../paribas/train.csv'))
test = pd.read_csv(Path('../../paribas/test.csv'))

# Full feature set: throws no error but hangs indefinitely
model = setup(data=train, test_data=test, target='target', ignore_features=['ID'],
              session_id=123, log_experiment=False, experiment_name='test1', silent=True)

# First 10 columns only: throws the expected error
train = train.iloc[:, 0:10]
test = test.iloc[:, 0:10]
model = setup(data=train, test_data=test, target='target', ignore_features=['ID'],
              session_id=123, log_experiment=False, experiment_name='test1', silent=True)
Expected behavior
When train and test data are defined explicitly, setup() should assert that both frames have the same structure and that the target column is present in both.
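The expected check could look roughly like the sketch below, run before any preprocessing starts. check_train_test_frames is a hypothetical name, not a PyCaret API, and it compares column names only (dtypes are not checked).

```python
import pandas as pd


def check_train_test_frames(train: pd.DataFrame, test: pd.DataFrame, target: str) -> None:
    """Raise ValueError unless `target` is in both frames and both frames
    have the same set of columns. Hypothetical pre-flight check."""
    # 1. The target column must exist in both frames
    missing = [name for name, df in (("data", train), ("test_data", test))
               if target not in df.columns]
    if missing:
        raise ValueError(f"target column {target!r} missing from: {', '.join(missing)}")
    # 2. Both frames must expose the same columns (order is ignored)
    only_train = sorted(set(train.columns) - set(test.columns))
    only_test = sorted(set(test.columns) - set(train.columns))
    if only_train or only_test:
        raise ValueError(
            f"column mismatch: only in data={only_train}, only in test_data={only_test}"
        )


# Usage: a test frame without the target, as with the Kaggle test.csv
train_demo = pd.DataFrame({"ID": [1, 2], "target": [0, 1], "v1": [0.5, 0.7]})
test_demo = pd.DataFrame({"ID": [3], "v1": [0.1]})
try:
    check_train_test_frames(train_demo, test_demo, target="target")
except ValueError as err:
    print(err)  # target column 'target' missing from: test_data
```

Failing fast on column names is cheap, so it would surface the problem before any expensive encoding of a high-cardinality column like v22 begins.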
Versions
PyCaret 2.3.1