Description
pycaret version checks
- I have checked that this issue has not already been reported here.
- I have confirmed this bug exists on the latest version of pycaret.
- I have confirmed this bug exists on the master branch of pycaret (pip install -U git+https://github.com/pycaret/pycaret.git@master).
Issue Description
When creating an API with create_api, the pydantic data model is built from entries in experiment.X and experiment.y. For y, the example value is selected by key (self.y[0], see the traceback under Actual Results), which raises a KeyError when the label 0 is not present in the index of the (training) data. For the independent features the lookup is already positional and therefore correct (experiment.X.iloc[0]).
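The difference between the two lookups can be reproduced with pandas alone. The following is a minimal sketch, not PyCaret code; the series values and the customer IDs used as index labels are made up for illustration.

```python
import pandas as pd

# Hypothetical target series indexed by customer IDs; no label equals 0.
y = pd.Series(
    [16884.92, 1725.55, 4449.46],
    index=[48213, 10977, 77310],
    name="charges",
)

# Label-based lookup: with an integer index, y[0] searches for the label 0.
try:
    y[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional lookup: returns the first row regardless of the index labels.
first_value = y.iloc[0]

print(label_lookup_failed, first_value)
```

With a default RangeIndex both lookups return the same value, which is presumably why the problem only shows up once a custom integer index is set.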
The current workaround is to drop the index before training, which is undesirable when observations need to be identified later. It is also not very robust.
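A rough sketch of that workaround, again in plain pandas: the IDs are kept in a separate series (the names customer_ids, train_data and restored are illustrative, not part of PyCaret) so they can be re-attached by row label after training.

```python
import pandas as pd

# Hypothetical data indexed by customer ID, mirroring the example in this issue.
data = pd.DataFrame(
    {"age": [19, 18, 28], "charges": [16884.92, 1725.55, 4449.46]},
    index=pd.Index([48213, 10977, 77310], name="customer_id"),
)

# Keep a row-label -> customer ID mapping, then train on a plain 0..n-1 index.
customer_ids = data.index.to_series().reset_index(drop=True)
train_data = data.reset_index(drop=True)

# The label 0 exists again, so a label-based lookup on the target succeeds.
first_charge = train_data["charges"][0]

# Later, the IDs can be re-attached to row-aligned results.
restored = train_data.set_index(customer_ids)
print(restored)
```

This keeps the identifiers recoverable, but it pushes bookkeeping onto the caller, which is the robustness concern raised above.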
Reproducible Example
import random
from pycaret.datasets import get_data
from pycaret.regression import (
    compare_models,
    create_api,
    create_model,
    setup,
    tune_model,
)
data = get_data("insurance")
# let's say we have customer IDs that we want to use as the observation names:
customer_ids = random.sample(range(10000, 99999), len(data))
data["customer_id"] = customer_ids
data.set_index("customer_id", inplace=True)
s = setup(data, target="charges", session_id=123)
best = compare_models(include=["rf", "gbr"])
model = create_model(best)
tuned_model = tune_model(model)
create_api(tuned_model, "trained_models/my_first_api")
Expected Behavior
Creating an API should also be possible when the underlying data has an index that does not contain the label 0. This is already implemented correctly for the reference to the X data (.iloc[0]).
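One way the expected behavior could look is sketched below. The helper render_output_model_line is hypothetical and only modeled on the template line visible in the traceback under Actual Results; it is not PyCaret's actual implementation, and it assumes that switching the target lookup to .iloc[0] is the whole fix.

```python
import pandas as pd


def render_output_model_line(api_name: str, target: str, y: pd.Series) -> str:
    """Hypothetical helper: render the output-model line with a positional lookup."""
    # .iloc[0] takes the first row by position, so custom index labels do not matter.
    example_value = y.iloc[0]
    return (
        f'output_model = create_model("{api_name}_output", '
        f"{target}={example_value!r})"
    )


# Made-up target series with customer IDs as index labels (no label 0).
y = pd.Series([16884.92, 1725.55], index=[48213, 10977], name="charges")
line = render_output_model_line("my_first_api", "charges", y)
print(line)
```

The exact text of the rendered value depends on how the installed NumPy version represents scalars, so the printed line may differ between environments.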
Actual Results
A KeyError occurs when there is no item labeled `0` in `experiment.y`.
Output:
age sex bmi children smoker region charges
0 19 female 27.900 0 yes southwest 16884.92400
1 18 male 33.770 1 no southeast 1725.55230
2 28 male 33.000 3 no southeast 4449.46200
3 33 male 22.705 0 no northwest 21984.47061
4 32 male 28.880 0 no northwest 3866.85520
Description Value
0 Session id 123
1 Target charges
2 Target type Regression
3 Original data shape (1338, 7)
4 Transformed data shape (1338, 10)
5 Transformed train set shape (936, 10)
6 Transformed test set shape (402, 10)
7 Ordinal features 2
8 Numeric features 3
9 Categorical features 3
10 Preprocess True
11 Imputation type simple
12 Numeric imputation mean
13 Categorical imputation mode
14 Maximum one-hot encoding 25
15 Encoding method None
16 Fold Generator KFold
17 Fold Number 10
18 CPU Jobs -1
19 Use GPU False
20 Log Experiment False
21 Experiment Name reg-default-name
22 USI f227
Model MAE MSE RMSE R2 \
gbr Gradient Boosting Regressor 2701.9919 2.354866e+07 4832.9329 0.8320
rf Random Forest Regressor 2771.4583 2.541650e+07 5028.6343 0.8172
RMSLE MAPE TT (Sec)
gbr 0.4447 0.3137 0.265
rf 0.4690 0.3303 0.413
MAE MSE RMSE R2 RMSLE MAPE
Fold
0 2651.0179 2.031297e+07 4506.9911 0.8787 0.4506 0.3416
1 3047.5028 3.178944e+07 5638.2121 0.8152 0.4560 0.2993
2 2526.0336 2.221968e+07 4713.7761 0.7187 0.4909 0.3007
3 2975.2686 2.309014e+07 4805.2205 0.8072 0.4866 0.3810
4 2847.7050 2.716372e+07 5211.8830 0.7980 0.5103 0.3367
5 2580.5742 1.905227e+07 4364.8910 0.8774 0.3340 0.2437
6 2366.1844 1.924113e+07 4386.4713 0.8691 0.3504 0.2649
7 2671.5877 2.550159e+07 5049.9101 0.8598 0.4414 0.2748
8 2325.6224 1.856430e+07 4308.6309 0.8801 0.3889 0.2888
9 3028.4227 2.855132e+07 5343.3434 0.8161 0.5379 0.4058
Mean 2701.9919 2.354866e+07 4832.9329 0.8320 0.4447 0.3137
Std 250.0988 4.311402e+06 437.5116 0.0488 0.0643 0.0491
Processing: 0%| | 0/7 [00:00<?, ?it/s]Fitting 10 folds for each of 10 candidates, totalling 100 fits
Original model was better than the tuned model, hence it will be returned. NOTE: The display metrics are for the tuned model (not the original one).
MAE MSE RMSE R2 RMSLE MAPE
Fold
0 3385.2881 2.980593e+07 5459.4806 0.8220 0.5659 0.4748
1 3609.4719 3.409537e+07 5839.1241 0.8018 0.5114 0.4118
2 3710.7020 3.587883e+07 5989.8940 0.5457 0.7977 0.5374
3 3845.1139 3.313262e+07 5756.0945 0.7233 0.7606 0.5996
4 3801.3662 3.986850e+07 6314.1507 0.7035 0.6453 0.4905
5 3617.8384 3.086773e+07 5555.8735 0.8014 0.5108 0.3681
6 3353.2437 2.737254e+07 5231.8769 0.8137 0.5943 0.4230
7 3171.2929 2.978108e+07 5457.2044 0.8362 0.7103 0.3373
8 3010.7330 2.390871e+07 4889.6532 0.8456 0.5965 0.5119
9 3850.3856 4.028390e+07 6346.9600 0.7405 0.7465 0.5663
Mean 3535.5436 3.249952e+07 5684.0312 0.7634 0.6439 0.4721
Std 277.9305 4.966506e+06 437.3902 0.0860 0.0991 0.0814
Traceback (most recent call last):
File "/home/ubuntu/.pyenv/versions/3.10.13/envs/pycaret-development/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2263, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2273, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/projects/pycaret/trained_models/delete-me-experiment.py", line 26, in <module>
create_api(tuned_model, "trained_models/my_first_api")
File "/home/ubuntu/projects/pycaret/pycaret/utils/generic.py", line 965, in wrapper
return func(*args, **kwargs)
File "/home/ubuntu/projects/pycaret/pycaret/regression/functional.py", line 2885, in create_api
return _CURRENT_EXPERIMENT.create_api(
File "/home/ubuntu/projects/pycaret/pycaret/internal/pycaret_experiment/tabular_experiment.py", line 2588, in create_api
output_model = create_model("{api_name}_output", {target}={repr(self.y[0])})
File "/home/ubuntu/.pyenv/versions/3.10.13/envs/pycaret-development/lib/python3.10/site-packages/pandas/core/series.py", line 981, in __getitem__
return self._get_value(key)
File "/home/ubuntu/.pyenv/versions/3.10.13/envs/pycaret-development/lib/python3.10/site-packages/pandas/core/series.py", line 1089, in _get_value
loc = self.index.get_loc(label)
File "/home/ubuntu/.pyenv/versions/3.10.13/envs/pycaret-development/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
raise KeyError(key) from err
KeyError: 0
### Installed Versions
<details>
System:
python: 3.10.13 (main, Nov 7 2023, 10:19:12) [GCC 9.4.0]
executable: /home/ubuntu/.pyenv/versions/pycaret-development/bin/python
machine: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
PyCaret required dependencies:
pip: 23.3.1
setuptools: 65.5.0
pycaret: 3.1.0
IPython: 8.17.2
ipywidgets: 8.1.1
tqdm: 4.66.1
numpy: 1.23.5
pandas: 1.5.3
jinja2: 3.1.2
scipy: 1.10.1
joblib: 1.3.2
sklearn: 1.2.2
pyod: 1.1.1
imblearn: 0.11.0
category_encoders: 2.6.3
lightgbm: 4.1.0
numba: 0.58.1
requests: 2.31.0
matplotlib: 3.6.0
scikitplot: 0.3.7
yellowbrick: 1.5
plotly: 5.18.0
plotly-resampler: Not installed
kaleido: 0.2.1
schemdraw: 0.15
statsmodels: 0.14.0
sktime: 0.21.1
tbats: 1.1.3
pmdarima: 2.0.4
psutil: 5.9.6
markupsafe: 2.1.3
pickle5: Not installed
cloudpickle: 2.2.1
deprecation: 2.1.0
xxhash: 3.4.1
wurlitzer: 3.0.3
PyCaret optional dependencies:
shap: 0.43.0
interpret: 0.4.4
umap: 0.5.4
ydata_profiling: 4.6.0
explainerdashboard: 0.4.3
autoviz: Not installed
fairlearn: 0.7.0
deepchecks: Not installed
xgboost: 2.0.1
catboost: 1.2.2
kmodes: 0.12.2
mlxtend: 0.23.0
statsforecast: 1.5.0
tune_sklearn: 0.5.0
ray: 2.8.0
hyperopt: 0.2.7
optuna: 3.4.0
skopt: 0.9.0
mlflow: 1.30.1
gradio: 3.50.2
fastapi: 0.104.1
uvicorn: 0.24.0.post1
m2cgen: 0.10.0
evidently: 0.2.8
fugue: 0.8.6
streamlit: Not installed
prophet: Not installed
</details>