[BUG]: Broken pipeline in catboost regressor #3433

@aquinteros

pycaret version checks

Issue Description

Hello! I searched and did not find this issue reported before; I apologize if I missed it.

I also apologize for my broken English.

The pipeline I am working on ran perfectly with the previous version of pycaret, but when I updated to pycaret 3.0.0 it stopped working.

To make it reproducible, here are the basic steps I followed:

I created a new venv with Python 3.10.10 locally in a folder and installed the following requirements.txt entries, in this order:
pandas
numpy
sqlalchemy
scikit-learn
seaborn
matplotlib
datetime
streamlit
pyodbc
sqlalchemy<2.0
mlflow
xgboost
pycaret[full]

When I ran my pipeline, it selected the CatBoost regressor as the best estimator by R2. I tuned the model, finalized it, and saved it to a "models" subfolder.

When I loaded the model with load_model() it loaded correctly, but when I ran predict_model() I got this error:

"ValueError: If estimator is not a Pipeline, you must run setup() first."

I repeated the steps above, but instead of using compare_models I used create_model("lightgbm"), and it worked fine with that algorithm, which makes me think the problem is specific to the CatBoost regressor.
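Since the error message says the estimator "is not a Pipeline", one quick check is to look at what load_model() actually returned. The helper below is only a diagnostic sketch: `describe_estimator` is a hypothetical name, not part of pycaret, and it guesses "pipeline-like" from the class names in the object's MRO rather than from pycaret's own check, which I have not inspected.

```python
def describe_estimator(obj):
    """Summarize a loaded object (hypothetical helper, not a pycaret API)."""
    cls = type(obj)
    # Assumption: any class named "Pipeline" in the MRO counts as pipeline-like.
    looks_like_pipeline = any(base.__name__ == "Pipeline" for base in cls.__mro__)
    # scikit-learn style pipelines expose `steps` as a list of (name, estimator) pairs.
    steps = getattr(obj, "steps", None) or []
    return {
        "type": f"{cls.__module__}.{cls.__qualname__}",
        "looks_like_pipeline": looks_like_pipeline,
        "step_names": [name for name, _ in steps],
    }


# Usage, assuming the model saved in the example in this report:
# from pycaret.regression import load_model
# print(describe_estimator(load_model('models/predictor_model')))
```

Comparing this summary for the saved CatBoost model and the saved LightGBM model might show whether the two are stored differently.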

Reproducible Example

# import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

from pycaret.regression import load_model, predict_model
from pycaret.regression import setup, compare_models, tune_model, plot_model, finalize_model, save_model, create_model, evaluate_model

# setup model
# NOTE: train, test and seed are not defined in this snippet
session = setup(
    data=train,
    target='Monto_EUR',
    log_experiment=True,
    use_gpu=False,
    session_id=seed,
    normalize=True,
    normalize_method='zscore',
)

# compare models
model = compare_models(exclude=['dummy','ada','en','lar','llar','lasso','rf','et'], sort='r2')

# tune model
model = tune_model(model, optimize='mape', n_iter=100, choose_better=True)

# finalize_model
model = finalize_model(model)

# save_model
save_model(model=model, model_name='models/predictor_model')

# load model
model = load_model('models/predictor_model')

# predict test
predictions = predict_model(estimator=model, data=test)

Expected Behavior

It should return a DataFrame with the prediction labels.

Actual Results

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13632\4067720024.py in ()
      1 # predict test
----> 2 predictions = predict_model(estimator=model, data=test)

c:\Users\alniquia\OneDrive - Telefonica\Documents\Projects\CalculadoraCostos\env\lib\site-packages\pycaret\regression\functional.py in predict_model(estimator, data, round, verbose)
   1925         experiment = _EXPERIMENT_CLASS()
   1926 
-> 1927     return experiment.predict_model(
   1928         estimator=estimator,
   1929         data=data,

c:\Users\alniquia\OneDrive - Telefonica\Documents\Projects\CalculadoraCostos\env\lib\site-packages\pycaret\regression\oop.py in predict_model(self, estimator, data, round, verbose)
   2219         """
   2220 
-> 2221         return super().predict_model(
   2222             estimator=estimator,
   2223             data=data,

c:\Users\alniquia\OneDrive - Telefonica\Documents\Projects\CalculadoraCostos\env\lib\site-packages\pycaret\internal\pycaret_experiment\supervised_experiment.py in predict_model(self, estimator, data, probability_threshold, encoded_labels, raw_score, round, verbose, ml_usecase, preprocess)
   4925             pipeline.steps = pipeline.steps[:-1]
   4926         elif not self._setup_ran:
-> 4927             raise ValueError(
   4928                 "If estimator is not a Pipeline, you must run setup() first."
   4929             )

ValueError: If estimator is not a Pipeline, you must run setup() first.

Installed Versions

System:
python: 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
executable: c:\Users\alniquia\OneDrive - Telefonica\Documents\Projects\CalculadoraCostos\env\Scripts\python.exe
machine: Windows-10-10.0.19044-SP0

PyCaret required dependencies:
pip: 23.0.1
setuptools: 60.10.0
pycaret: 3.0.0
IPython: 7.34.0
ipywidgets: 7.7.4
tqdm: 4.64.1
numpy: 1.23.5
pandas: 1.5.3
jinja2: 3.1.2
scipy: 1.9.3
joblib: 1.2.0
sklearn: 1.2.2
pyod: 1.0.9
imblearn: 0.10.1
category_encoders: 2.6.0
lightgbm: 3.3.5
numba: 0.56.4
requests: 2.28.2
matplotlib: 3.6.3
scikitplot: 0.3.7
yellowbrick: 1.5
plotly: 5.13.1
kaleido: 0.2.1
statsmodels: 0.13.5
sktime: 0.16.1
tbats: 1.1.2
pmdarima: 2.0.3
psutil: 5.9.4

PyCaret optional dependencies:
shap: 0.41.0
interpret: 0.3.2
umap: 0.5.3
pandas_profiling: 4.1.1
explainerdashboard: 0.4.2.1
autoviz: 0.1.58
fairlearn: 0.7.0
xgboost: 1.7.4
catboost: 1.1.1
kmodes: 0.12.2
mlxtend: 0.21.0
statsforecast: 1.5.0
tune_sklearn: 0.4.5
ray: 2.3.1
hyperopt: 0.2.7
optuna: 3.1.0
skopt: 0.9.0
mlflow: 1.30.0
gradio: 3.23.0
fastapi: 0.95.0
uvicorn: 0.21.1
m2cgen: 0.10.0
evidently: 0.2.7
fugue: 0.8.2
streamlit: 1.20.0
prophet: Not installed
